ONTOLOGY DESIGN PATTERNS FOR THE FORMALISATION OF BIOLOGICAL ONTOLOGIES
A report submitted to the University of Manchester for the degree of Master of Philosophy in the Faculty of Engineering and Physical Sciences
2005
By Mikel Egaña Aranguren
Department of Computer Science
Contents
Abstract
1 Introduction
1.1 Context
1.2 Research hypothesis
1.3 Report outline
2 Background
2.1 Ontologies
2.1.1 Introduction to ontologies
2.1.2 Web Ontology Language
2.1.2.1 Introduction to Web Ontology Language
2.1.2.2 Summary of Web Ontology Language technical properties
2.2 Bio-ontologies
2.2.1 Open Biomedical Ontologies
2.2.1.1 The Gene Ontology
2.2.1.2 Other OBO ontologies
2.2.2 Other biomedical ontologies outside OBO
3 Formalising knowledge in bio-ontologies: rationale and previous work
3.1 The need for formalised bio-ontologies: advantages of OWL DL and problems of OBO
3.1.1 Integration of different ontologies
3.1.2 Automatic maintenance of multiple inheritance
3.1.3 Not clear semantics: lack of expressivity and formality
3.2 Gene Ontology Next Generation (GONG) and Biological Ontology Next Generation (BONG)
4 Formalising knowledge in bio-ontologies: Ontology Design Patterns
4.1 Introduction to Ontology Design Patterns
4.2 Documenting Ontology Design Patterns
4.2.1 Description template of Software Design Patterns
4.2.2 Description template of Ontology Design Patterns
4.3 Ontology Design Patterns explored so far
4.3.1 Extensional ODPs
4.3.1.1 N-ary Relationships
4.3.1.2 Exception
4.3.2 Good practice ODPs
4.3.2.1 Normalisation
4.3.2.2 Value Partition
4.3.2.3 Upper Level Ontology
4.3.3 Modelling ODPs
4.3.3.1 List
4.3.3.2 Adapted SEP triples
5 Conclusion
5.1 Research hypothesis revisited and extended: research aims, objectives and questions
5.2 Contributions
5.2.1 GONG and BONG
5.2.2 Integration of ODPs in BONG
5.2.3 ODPs catalog
5.2.3.1 Properties of the biological knowledge domain
5.2.3.2 Ontological constructs for ODPs
5.2.4 Documenting ODPs
5.2.5 Improved bio-ontologies
5.3 Evaluation
5.4 Research plan
Bibliography
List of Figures
1.1 Simplified small example ontology
1.2 Ontology Design Pattern applied to the simplified small example ontology of Figure 1.1
3.1 Position of the term polarisome in GO
3.2 Functional and chemical classification in metabolism for the term acetylcholine biosynthesis
4.1 Simple mapping of OWL to UML
5.1 Research plan
Abstract
Bioinformatics manages the information that has been gathered in databases since the
advent of the molecular biology technological revolution. Successful research is
based on interpretations of that information that can be accessed and managed computationally and efficiently, which is a difficult task considering that there are too many
Bioinformatics resources and that they are not integrated efficiently. One attempt to
solve that problem is to use ontologies. Ontologies are computational formalisations
of the knowledge about a given domain, allowing computers to manage the information
at a semantic level.
The most successful ontologies applied in Bioinformatics are the ones in the Open
Biomedical Ontologies (OBO) project. Most of the OBO ontologies are very simple
and intuitive but lack formality and expressivity.
The Web Ontology Language (OWL) is an official proposal for implementing ontologies in the semantic web. Its OWL DL variant is grounded in a sound formalism
(Description Logics) and it is very expressive. Implicit knowledge can be made explicit and the consistency of the ontology can be checked automatically in OWL DL
ontologies. There are techniques that aid in the building of OWL DL ontologies, such as
the application of Ontology Design Patterns (ODPs), which are formalised abstractions
of modelling solutions that can be applied to different ontologies.
The hypothesis of this research is that by providing biologists with tools and methods
like ODPs, the migration from the OBO language to OWL DL and the creation of
OWL DL ontologies can be done with ease. This will produce more maintainable
and expressive ontologies that support more complex queries and represent biological knowledge with higher fidelity. This research aims to explore the
application, documentation and usage of ODPs in the context of previous successful
attempts to formalise GO such as the Gene Ontology Next Generation project.
This document gives an overview of the research field, explores the preliminary
work in ODPs and provides the research plan for the following two years.
Chapter 1
Introduction
This chapter gives an overview of the problem that this research will try to solve and
how it will be solved. The problem and its context are explained in section 1.1. The
proposed solution is given in section 1.2 and finally the structure of the whole report is
explained in section 1.3.
1.1 Context
Bioinformatics is the discipline that deals with the information and knowledge created
around the technical revolution that has been happening in molecular biology in the last
25 years. As a consequence of that revolution, large amounts of complex information
are being created and stored. This information is very heterogeneous, including data
that range from plain DNA sequences to complex protein 3D structures. All those data
are annotated, including information about origin, reliability, structural interpretation,
possible drug targets, etc.
The first Bioinformatics resources that stored those data were built on flat files, and
later on relational databases. These storage methods are still used nowadays and, as a
result, massive hyperlinking is the main means of integrating resources. Around 700 databases are on offer to the biologist [Gal05], and the number of resources is even larger when
analysis tools are taken into account. Despite the growing amount of available information [Tho03], the knowledge (useful, describable and computationally manageable
information) related to it is not growing at the same pace, so the biologist is usually left
alone in a sea of unmanageable and thus less useful data [DBD+04, SK05, MTES05]
because of the large number of available resources and different formats.
One way of tackling that problem is by making computers able to logically manage
the semantics of that information; resources can be better integrated, many tedious and
error-prone processes can be automated and new knowledge regarding the field can be
unleashed in an automatic and more efficient way [BMM05].
The biggest example of this strategy is the semantic web¹ [BLHL01]. The semantic
web is a means to build a World Wide Web where the semantic content is accessible to
computers, not just to human users. Ontologies are one of the main components of the semantic
web. They are models that represent knowledge about a domain
in a computable way. In the semantic web, ontologies are the mechanism for providing
a vocabulary that will describe data held in a common data model. The vocabulary and
the semantics provided by the ontology both facilitate machine processing. Ontologies
are usually collections of classes, each class being a group of individuals, where the
classes are linked by different logical relationships, creating a structure like the one
shown in Figure 1.1. The semantic meaning of a class is given by its position in the
structure: the lines are logical relationships that connect that class with others; a class
can be a subclass of another class, a part of another class, or linked to it by other logical relationships.
One of the main aims for an ontology is to create a shared understanding. All the
people using an ontology to annotate and query resources use the same terms in the
same manner. This shared understanding can be extended to computers. Whilst not
having the same understanding as a human, the computer can make inferences about
the symbols themselves. By enabling a computer to do more sophisticated processing,
it is possible to gain more added value from the process of annotating data with terms
from an ontology.
Ontologies can be created in different Knowledge Representation (KR) languages.
These languages differ in their expressivity: the more expressive a language is, the
more complex the knowledge represented by the ontology can be. Expressivity comes
at a cost: the more expressive a KR language is, the less tractable it is by computational
methods. The KR language can reach a point in expressivity where the logic that it
maps to is said to be undecidable; as a consequence the computational tools are not
guaranteed to give a result when operating with the ontology.
One of the most widely used KR languages is the Web Ontology Language² (OWL).
OWL is divided into three variants, depending on the expressivity: OWL Lite, OWL DL
and OWL Full. OWL DL, which is the focus of this research, maps to Description
Logics (DL). DL languages are a decidable fragment of First Order Logic (FOL). The
Description Logics that relate to OWL DL are very expressive and formal, allowing
reasoning: a program called a reasoner can compute the structure of the ontology and
check the logical consistency of that ontology, amongst other things.
1 http://www.w3.org/2001/sw/
2 http://www.w3.org/2004/OWL/
Figure 1.1:
Simplified small example ontology. This invented trivial ontology represents knowledge
about a hypothetical society where all the women are mothers and lawyers, whereas all the
men are fathers but they are unemployed. There are different relationships, each of them with
different logical characteristics: Is A, Part Of and Has Parent. For example Is A means that
a class is a more specific subclass of another class (mother is a type of person), whereas Part
Of means that a class is a constitutive element of some other class (this society is built upon
infrastructures and people, but an infrastructure is not a type of society). Has Parent only relates
people to their parents, so a child will have two Has Parent relationships. These relationships
link classes, creating the semantic meaning of a class by the position it has in the structure: child
is defined by being a type of person and having two parents: a lawyer mother and an unemployed father. The string of characters that forms each class-name is completely meaningless
except for human users: a child is a child because of its position in the structure (anything that
has a lawyer mother, an unemployed father and is a person), not because of the string child.
Therefore the class child represents the group of individuals that fulfil those conditions.
The biggest example of ontology usage in Bioinformatics is the Gene Ontology³
(GO), which is part of the Open Biomedical Ontologies⁴ (OBO) project. GO has three
independent subontologies, describing the molecular function, cellular component and
biological process of gene products held in other biological databases. The structure of
GO is simple: the terms⁵ are related by two types of relationships, IsA and PartOf,
creating a tree-like structure. GO allows for the annotation of gene products with GO
terms and acts as a semantic integrating system: given a gene product, a query can be
run against GO to obtain the terms that are annotated to that gene product. The other terms
related via IsA or PartOf relationships to those terms can be accessed, including their
annotations, giving an overall picture of the semantic relationships of the gene product
with other biological entities and processes.
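The query described in this paragraph can be sketched as a small graph traversal. The following Python fragment is only an illustration under stated assumptions: the term graph, the gene product name geneProductX and its annotation are invented for the example and do not reproduce the actual content of GO, and related_terms is a hypothetical helper rather than part of any GO tool.

```python
from collections import deque

# Hypothetical toy graph: child term -> list of (relationship, parent term).
# The relationships are illustrative only, not taken from GO.
PARENTS = {
    "nucleus": [("IsA", "organelle")],
    "organelle": [("PartOf", "cell")],
    "cell": [("IsA", "cellular component")],
    "cellular component": [],
}

# Hypothetical annotations: gene product -> directly annotated terms.
ANNOTATIONS = {"geneProductX": ["nucleus"]}


def related_terms(gene_product):
    """Return the annotated terms plus every term reachable via IsA or PartOf."""
    seen = set()
    queue = deque(ANNOTATIONS.get(gene_product, []))
    while queue:
        term = queue.popleft()
        if term in seen:
            continue  # a term with several children can be reached more than once
        seen.add(term)
        for _relationship, parent in PARENTS.get(term, []):
            queue.append(parent)
    return seen


print(sorted(related_terms("geneProductX")))
```

The traversal ignores the relationship type and simply follows every link upwards; a real query would probably treat IsA and PartOf differently.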
GO has a wide and active community of curators and it is highly valued by
molecular biologists. This is partly due to its intuitive structure and simple relationships. However, this lack of formality has drawbacks: the ontology is very difficult
to maintain and it offers an over-simplified representation of the current biological
knowledge. Migrating GO to OWL DL can help solve those problems. OWL DL
gives the possibility of a more expressive modelling, closer to real biological knowledge. Expressive models that capture biological knowledge with higher fidelity support more
complex queries, so more complex knowledge can be retrieved from Bioinformatics resources. Reasoning, which
comes from the formality of OWL DL, plays an important role in the proposed system:
reasoning can make implicit knowledge explicit (reasoning can compute the whole ontology structure from the implicit assertions) or it can be used to query the systems
in more sophisticated ways. Reasoning can also be used to maintain big ontologies
computationally with minimal human intervention.
The biologists are the ones who can do the migration from OBO to OWL DL because they are the knowledge domain experts and they can exploit the full potential of
the OWL DL expressivity. However, the biologists must be presented with easy and
simple tools that help them in the task. The Gene Ontology
Next Generation⁶ project (GONG) has already demonstrated how the
migration of GO to OWL DL can be done in an easy manner. GONG’s ready-to-use methodology provides the
biologists with simple semantic scissors to dissect GO: regular expressions. Another
technique is the application of Ontology Design Patterns. The notion of Ontology
Design Patterns comes from object-oriented programming, where they are known
simply as Design Patterns [GHJV95]. Design Patterns can be briefly defined as abstractions of solutions to modelling problems: when designing systems, there are
problems that appear again and again; a Design Pattern is a formalised way of solving
one of those problems. Instead of trying to solve the problem from scratch, the programmer can simply use or adapt a Design Pattern, which is a solution proven to be efficient many times
before, making development a faster and more reliable process. The same concept can
be applied to ontology engineering: an Ontology Design Pattern that solves a given
problem can be applied every time that the problem appears in different ontologies.
See Figure 1.2 for an Ontology Design Pattern example, called Value Partition.
3 http://www.geneontology.org/
4 http://obo.sourceforge.net/
5 GO curators use the word term to refer to classes in the ontology. When referring to GO and other
OBO ontologies in this document, the word term will be used instead of class.
6 http://gong.man.ac.uk/
The benefit of using Ontology Design Patterns is that they help to produce better
structured ontologies that capture biological knowledge with higher fidelity. They provide the biologists with an abstraction of the underlying semantics to
easily use when creating ontologies in OWL DL, very much like a semantic Swiss Army
knife. Ontology Design Patterns can be formally defined and plugged into the GONG
workflow that has been developed as part of this work to be used as an off-the-shelf semantic modelling tool, helping biologists to express complex biological knowledge
in OWL DL ontologies.
Figure 1.2:
Ontology Design Pattern applied to the simplified small example ontology of Figure 1.1.
This Ontology Design Pattern is called Value Partition and it is used to model classes that can
only have certain attributes. In the example’s hypothetical simplified society the occupation
can only be Teacher, Lawyer or Unemployed, so a new class is defined as being the union of
the three. This new term is not a physical part of the society; it is something that describes certain
elements of the society, so it is a modifier. Using this Ontology Design Pattern the ontology
has become more expressive: there is a new condition for being considered mother or father,
and the elements of the society and their attributes have been decoupled, producing a cleaner
ontology: other attributes can easily be built using other Value Partitions and added.
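The idea behind the Value Partition can be sketched with plain sets. The Python fragment below is a minimal illustration, not the OWL encoding of the pattern: classes are treated as finite sets of invented individuals, and is_value_partition is a hypothetical helper. The caption only states that the new class is the union of the three values; the additional requirement that the values be pairwise disjoint is an assumption made here.

```python
from itertools import combinations


def is_value_partition(parent, values):
    """True if the value classes are pairwise disjoint and their union equals parent."""
    pairwise_disjoint = all(a.isdisjoint(b) for a, b in combinations(values, 2))
    covering = set().union(*values) == parent
    return pairwise_disjoint and covering


# Invented individuals, named after the occupation value they stand for.
teacher = {"occupation_of_ana"}
lawyer = {"occupation_of_eva"}
unemployed = {"occupation_of_jon"}
occupation = teacher | lawyer | unemployed

print(is_value_partition(occupation, [teacher, lawyer, unemployed]))
```

Another attribute would be modelled as a further, separate partition, which is what keeps the elements of the society decoupled from their attributes.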
1.2 Research hypothesis
There is a vast amount of knowledge captured in bio-ontologies that are semantically
weak. There is a representation language (OWL DL) and associated reasoning tools
that could exploit more richly structured ontologies to expand the capabilities of bio-ontologies. The principal research question of this work is how to allow a biologist
to migrate from the former to the latter with ease. The hypothesis of this research is as
follows:
A usable migration methodology and tool from OBO to OWL DL,
incorporating Ontology Design Patterns, will enable biologists to produce richer ontologies with greater analysis capabilities. This richer
representation of biological knowledge will capture the domain with
higher fidelity and facilitate analysis of data via more detailed, precise
queries.
1.3 Report outline
Chapter 2 gives an explanation of the basic concepts that relate to this research. Chapter 3 gives a review of the work done in the field, analysing the need for formalisation
of bio-ontologies and giving a detailed explanation of the GONG project. Chapter 4
gives an analysis of Ontology Design Patterns, including a proposed documentation
and classification scheme and some examples of Ontology Design Patterns applied
to bio-ontologies. Finally, Chapter 5 summarises the expected development of the research: the future improvements regarding Ontology Design Patterns, the application
of Ontology Design Patterns to real problems, the result evaluation criteria and the
research plan for the following two years.
Chapter 2
Background
This chapter explores all the necessary background information about bio-ontologies.
It starts by explaining ontologies and the Web Ontology Language in section 2.1. In
section 2.2 an analysis of current bio-ontologies is provided.
2.1 Ontologies
2.1.1 Introduction to ontologies
The term ontology has been borrowed by computer science from philosophy, where it
can refer both to the branch of metaphysics concerned with the nature and relations
of beings and to a particular theory about the nature of being or the kinds of existents
[McG01]. In computer science it describes a more concrete construct: a model that
semantically captures the knowledge about a domain. The classical definition of an
ontology is a specification of a conceptualisation [Gru93]. A more technical definition
of an ontology considers it an engineering artifact to describe a certain reality, plus
a set of explicit assumptions regarding the intended meaning of the vocabulary words
[Gua98]. In biology, ontologies allow scientists to specify, to any degree of resolution,
how data, terminology (i.e. controlled vocabularies), concepts and ideas all relate to
each other [NMW04]. The term ontology is overloaded and it is a subject of controversy: different things that range from thesauri to knowledge bases are considered to
be ontologies.
Ontologies play a pivotal role in the semantic web as vehicles for knowledge representation. They are used for different functions, such as web agents [Hen01] and
web services [GHS04], GRID technology [SRG03], e-commerce [Kwo03], data mining and text mining [LZ04, KOTT03] and computer security [LT04], amongst others.
It is likely that as the tools and languages to build the ontologies get more sophisticated
and robust [Mus05] new uses of ontologies will arise, unpredictable from today’s point
of view, as happened with technologies such as HTML and HTTP or databases.
2.1.2 Web Ontology Language
2.1.2.1 Introduction to Web Ontology Language
During the initial development of semantic web technologies there has been an evolution from data exchange standards like XML¹ (eXtensible Markup Language) to exchange languages with more semantics like RDF² (Resource Description Framework).
OWL (Web Ontology Language) [AvH04] is the next layer in semantic expressivity
ahead of RDF [WGA05].
OWL is a W3C³ official proposal⁴ for a semantic exchange language in the semantic web. The origin of OWL can be traced back to two different languages: DAML
(Darpa Agent Markup Language) and OIL (Ontology Inference Layer). DAML was
a project in the US funded by DARPA (Defense Advanced Research Projects
Agency) that included the markup language and some tools. OIL was primarily based
in Europe, funded by the European Union’s Information Society Technologies Program. Whereas DAML was less formal and based on the notion of frames, OIL was
based on more formal DLs. The efforts converged in DAML+OIL, incorporating the
best of both, which would become OWL [HPSvH03]. OWL was designed to fulfil the
following aims [ZM03]:
• OWL ontologies should be suitable for sharing; they should be public, so that
different systems on the web can refer to the same ontology.
• OWL ontologies should be able to evolve and a given resource should be able to
point to the version of the ontology which is being used.
• OWL should allow ontologies to interoperate between each other when the same
concepts are represented in different ontologies, allowing a web of ontologies.
• It should be possible to detect inconsistencies between different ontologies that
are contradictory.
• OWL aims to strike a balance between expressivity and computational tractability, which is what makes reasoning possible. The more expressive a language is, the less computationally tractable it becomes.
• OWL should be easy to use and intuitive.
• OWL should be compatible with other standards like XML or UML (Unified
Modelling Language).
• OWL should be compatible with internationalisation (use in different languages).
1 http://www.w3.org/XML/
2 http://www.w3.org/RDF/
3 The W3C (http://www.w3c.org) is a consortium for open web standards. It is led by Tim
Berners-Lee, the creator of HTTP and HTML and the idea of the semantic web.
4 http://www.w3c.org/2004/OWL/
There are different ontology editors that can manage OWL. The one used as a platform
for this research is Protégé⁵ which provides a flexible plugin architecture and plenty of
different functionalities.⁶
2.1.2.2 Summary of Web Ontology Language technical properties
OWL ontologies fall into three different species:
OWL-Lite is the simplest type: only simple class hierarchies and simple restrictions
are allowed. OWL-Lite maps to the DL SHIF(D).
OWL-DL maps to the DL⁷ SHOIN(D). Automated reasoning can be applied to
OWL DL. It is more expressive than OWL-Lite, especially with regard to class constructors. This is the type of OWL which will be the basis of this research.
OWL-Full is the most expressive OWL type; computational tractability is not guaranteed and complete reasoning cannot be guaranteed. OWL-Full is the union of RDF(S)⁸ and OWL DL.
5 http://protege.stanford.edu
6 http://www.co-ode.org/downloads/
7 http://dl.kr.org
8 http://www.w3.org/TR/rdf-schema/
In OWL ontologies there are three main elements [Hor04]:
1.- Individuals: the actual objects of the knowledge domain. They are analogous
to instances in frame-based systems or object-oriented programming.
2.- Properties: binary relations on individuals. Properties are interpreted as sets of
pairs of individuals. Properties can be of different types:
• Object properties link individuals to individuals.
• Datatype properties link individuals to values of data (integers, for example).
• Annotation properties are used to add extra information like comments from the
ontology maintainer, authors, cross references, etc.
Object properties link individuals from a certain domain to individuals of a certain
range and they can have inverse properties (the inverse of a property that links individual A to individual B will link the individual B to individual A). Object properties
can have the following characteristics:
• Functional: in a functional property, a given individual can be related via the
property to at most one individual.
• Inverse functional: in an inverse functional property the inverse property is functional. Thus, at most one individual can be related via the property to a
given individual.
• Transitive: a transitive property states that if A is related to B and B is related to
C, A is related to C by the same relationship.
• Symmetric: in a symmetric property, if A is related to B then B is related to A.
Thus, the property is the inverse of itself.
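Since properties are interpreted as sets of pairs of individuals, the four characteristics above can be illustrated as checks over such a set. The Python sketch below is a simplification made for illustration only: it assumes a closed, fully known set of pairs in which different names denote different individuals, whereas an OWL reasoner treats these characteristics as axioms from which new facts are inferred. All function names are hypothetical.

```python
def is_functional(pairs):
    """Each subject is related to at most one object."""
    targets = {}
    for subject, obj in pairs:
        # setdefault stores the first object seen for a subject and returns it afterwards
        if targets.setdefault(subject, obj) != obj:
            return False
    return True


def inverse(pairs):
    """Swap every pair: (A, B) becomes (B, A)."""
    return {(obj, subject) for subject, obj in pairs}


def is_inverse_functional(pairs):
    """The inverse property is functional."""
    return is_functional(inverse(pairs))


def is_symmetric(pairs):
    """The property equals its own inverse."""
    return inverse(pairs) == set(pairs)


def is_transitive(pairs):
    """Whenever (A, B) and (B, C) are present, (A, C) is present too."""
    pairs = set(pairs)
    return all((a, d) in pairs for a, b in pairs for c, d in pairs if b == c)


print(is_transitive({("a", "b"), ("b", "c"), ("a", "c")}))
```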
3.- Classes: classes are interpreted as sets that contain individuals. The conditions for
class membership of the individuals are stated precisely using restrictions. Restrictions
are anonymous classes of individuals that have certain relationships to other individuals
of the filler class. There are different kinds of restrictions:
• Existential restrictions (∃) state that there is at least one relationship along the
restricted property to an individual of the filler class.
• Universal restrictions (∀) state that every relationship along
the restricted property, if there is any, is to an individual of the filler class, and not to individuals of other classes.
• Cardinality restrictions state the minimum (≥), maximum (≤) or exact (=) number of relationships along the restricted property.
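The three kinds of restriction can be illustrated by checking them against an explicit list of relationships. The Python sketch below assumes a closed set of (subject, property, object) triples with invented individuals, which is a simplification: a reasoner works under the open world assumption and would not conclude that a restriction is satisfied merely because no counterexample is recorded. The helper names are hypothetical.

```python
def related(triples, subject, prop):
    """Individuals that `subject` is linked to along `prop`."""
    return {o for s, p, o in triples if s == subject and p == prop}


def satisfies_existential(triples, subject, prop, filler):
    """At least one relationship along prop reaches a member of filler."""
    return any(o in filler for o in related(triples, subject, prop))


def satisfies_universal(triples, subject, prop, filler):
    """Every relationship along prop reaches a member of filler (true if there are none)."""
    return all(o in filler for o in related(triples, subject, prop))


def satisfies_cardinality(triples, subject, prop, minimum=0, maximum=None):
    """The number of relationships along prop lies between minimum and maximum."""
    count = len(related(triples, subject, prop))
    return count >= minimum and (maximum is None or count <= maximum)


# Invented data loosely following the child example of Figure 1.1.
TRIPLES = {
    ("child", "hasParent", "mother"),
    ("child", "hasParent", "father"),
}
LAWYER = {"mother"}
PERSON = {"mother", "father", "child"}

print(satisfies_existential(TRIPLES, "child", "hasParent", LAWYER))
```

An individual with no relationships along the property satisfies the universal restriction vacuously, which is why existential and universal restrictions are often used together.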
The conditions (restrictions) for class membership can be necessary (⊑) or necessary
and sufficient (≡). The necessary conditions assert what is needed for an individual
to be a member of a class, but they are not enough to define that membership: for example, a necessary condition for being considered a member of the class human is to
be a biped, but not all bipeds are humans. Necessary and sufficient conditions define class membership: following the given example, having a very developed neocortex is enough to
infer membership of the class human; anything that has a very developed neocortex is human. Classes that only have necessary conditions are called primitive classes.
Classes that have necessary and sufficient conditions are called defined classes. Both
types of conditions can be combined when building a class. Classes can also be
built by combining other classes with logical operators like union (⊔), intersection (⊓) and
complement (¬). Logical operators can also be included in restrictions, building complex expressions. Individuals can belong to more than one class. It can be explicitly
stated that two classes are disjoint (an individual cannot belong to both classes).
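Because classes are interpreted as sets of individuals, the difference between the two kinds of condition and the meaning of the logical operators can be illustrated with ordinary set operations. The Python sketch below assumes small, fully enumerated sets of invented individuals; a reasoner works with class descriptions rather than enumerated members, so this is only an extensional illustration with hypothetical helper names.

```python
def is_necessary(class_members, condition_members):
    """Necessary condition: every member of the class meets the condition."""
    return class_members <= condition_members


def is_necessary_and_sufficient(class_members, condition_members):
    """Necessary and sufficient: meeting the condition is the same as being a member."""
    return class_members == condition_members


# Invented individuals for the human / biped / neocortex example.
DOMAIN = {"ana", "jon", "ostrich", "dog"}
HUMAN = {"ana", "jon"}
BIPED = {"ana", "jon", "ostrich"}
DEVELOPED_NEOCORTEX = {"ana", "jon"}

union = HUMAN | BIPED          # union of two classes
intersection = HUMAN & BIPED   # intersection of two classes
complement = DOMAIN - HUMAN    # complement, relative to the whole domain
disjoint = HUMAN.isdisjoint(complement)

print(is_necessary(HUMAN, BIPED), is_necessary_and_sufficient(HUMAN, BIPED))
```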
OWL works with an open world assumption: unless the contrary is explicitly
stated, the fact that something has not been found does not mean that it is false.
Databases, for example, work with a closed world assumption: if one item has just
one value for a given attribute, unless another value is found it will be assumed that
that item has got only that value. In an OWL ontology, for example, an individual can
belong to two different classes unless the contrary is explicitly stated by the ontologist
by making the classes disjoint.
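The contrast between the two assumptions can be sketched as two ways of answering the same question from the same stored facts. The Python fragment below is a toy illustration with invented facts and hypothetical function names; in particular, representing explicit negative knowledge as a second set is an assumption of the sketch, not the way OWL expresses it.

```python
def closed_world_holds(facts, statement):
    """Closed world: anything that is not recorded is taken to be false."""
    return statement in facts


def open_world_holds(facts, negated_facts, statement):
    """Open world: True or False only when explicitly known, otherwise None (unknown)."""
    if statement in facts:
        return True
    if statement in negated_facts:
        return False
    return None


# Invented facts: what is asserted, and what is explicitly stated to be false.
FACTS = {("ana", "memberOf", "Mother")}
NEGATED = {("ana", "memberOf", "Father")}

question = ("ana", "memberOf", "Lawyer")
print(closed_world_holds(FACTS, question), open_world_holds(FACTS, NEGATED, question))
```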
2.2 Bio-ontologies
The mentioned transition from data to semantics that is happening in the semantic
web is also happening in Bioinformatics: the discipline does not
only deal with data gathering; computing tools also interpret the data and deal with the
knowledge related to those data [NMW04], following the path of the transition from
XML to RDF and to OWL. Bioinformatics is a suitable discipline for that transition
because it is a knowledge based discipline [SGB00] and plenty of biologists are willing
to annotate that knowledge. The new semantic level will bring [NMW04]:
• Integrating heterogeneous data.
• Using logic to unleash new hypotheses.
• More expressive models of nature.
• Annotating discoveries formally so sharing them becomes more efficient.
Ontologies are used to reach that semantic level as they are not just controlled vocabularies: ontologies relate concepts through expressive relationships [SK02]. The aim of
ontologies in biology is to express the complex knowledge related to biology in a way
that is computationally tractable. Ontologies are widely used in the area of Bioinformatics [SWLG04] and they can be classified with respect to the function they fulfil:
Task oriented ontologies are designed for concrete tasks such as data mining and
text mining [KSK02, CY03, MBH+05], web services [OGA+05] or resource integration. In resource integration, ontologies are used to integrate databases at different
levels [Jac04]: to tackle the problem of semantic heterogeneity of database entries (e.g.
Gene Ontology, see section 2.2.1.1) or of database schemas (e.g. TAMBIS [SGP+03] or
SEMEDA [KPL03]).
Domain oriented ontologies capture the knowledge of a concrete domain of knowledge. The ontology, apart from being queried, can be used as the centre for other technologies. Many OBO ontologies (see section 2.2.1) and other examples like PhosphaBase [WMS+05] fall in this category.
Generic ontologies are high level ontologies with general concepts that are used to
integrate different ontologies. They are also known as Upper Level Ontologies (see
section 4.3.2.3).
2.2.1 Open Biomedical Ontologies
The Open Biomedical Ontologies organisation⁹ (OBO) offers a platform for biomedical ontologies that satisfy the following criteria:
• OBO ontologies must be open (no restriction in use).
• OBO ontologies must be implemented in standard ways (languages like OWL).
• OBO ontologies must be orthogonal to each other (independent).
• OBO ontologies must have a unique identifier prefix.
• Definitions of the concepts of the OBO ontologies must be given.
9 http://obo.sourceforge.net/
2.2.1.1 The Gene Ontology
The Gene Ontology10 (GO) [Lew05, Con00, BSG+ 04] provides an ontology that describes attributes of the gene products of an abstract and non pathological eucaryotic
cell. GO offers a way of dealing with the semantic heterogeneity of gene product annotations in different databases: the annotations on different databases point to the same
GO term. The Gene Ontology is the responsibility of the GO consortium, a joint project
formed by different organism databases11 that was started by FlyBase [Con99], Mouse
Genome Informatics (MGI) [Bla00] and the Saccharomyces Genome Database (SGD)
[KSR+ 04].
The main components of GO are the terms and the relationships that connect those
terms (see below). Each term has a unique identifier apart from the term name, like
GO:0005488 for the term binding.12 GO is divided into three independent ontologies:
molecular function, biological process and cellular component. Molecular function
describes basic and concrete molecular roles of gene products (e.g. thioredoxin-disulfide reductase activity GO:0004791). Each biological process is made
of different molecular functions and it describes a higher level role (e.g. development
GO:0007275). The cellular component ontology represents the structure of eucaryotic
cells (e.g. organelle GO:0043226).
The whole ontology is implemented using Directed Acyclic Graphs (DAGs): multiple parent-child relationships are allowed in the structure, but cycles (a term being a
child of itself) are prohibited. The top of the hierarchy is populated by general terms
and as we move deeper (more terms in the path) the terms become more specialised.
The terms on the edge of the path are called leaves and terms in the path itself are
called nodes. There are two types of relationships in GO:
IsA: it is also known as the subsumption relationship; one term subsumes the other. It
can be described as a term being a subclass of a bigger class: autosome GO:0030849
is a subclass of chromosome GO:0005694, therefore autosome GO:0030849 IsA
chromosome GO:0005694. Officially, the IsA relationship does not mean an Instance
Of 13 [WA03]. The IsA relationship is transitive.
PartOf: this relationship means that a child is a structural component (in the cellular component ontology) or a sub-process (in the biological process ontology) of its
parent [Con01]. This relationship is also transitive.
10 http://www.geneontology.org/
11 http://www.geneontology.org/GO.consortiumlist.html
12 In this document OBO terms and identifiers are put together the first time the term is introduced (for example binding GO:0005488). In further uses of the term through the document the identifier is left out for clarity.
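The DAG structure and the transitivity of both relationships can be illustrated with a minimal Python sketch. It is not how GO or its tools are implemented: the data layout and the function names are assumptions made for illustration, and the edge from chromosome to cellular component is a simplified, hypothetical direct edge added only to show transitivity.

```python
from collections import defaultdict

# Toy fragment of GO as (child, relationship, parent) triples.
# The first two edges come from examples in the text; the third is a
# simplified, hypothetical direct edge used only to show transitivity.
EDGES = [
    ("autosome", "IsA", "chromosome"),
    ("replication fork", "PartOf", "chromosome"),
    ("chromosome", "IsA", "cellular component"),
]


def build_parents(edges):
    """Index the edges as parents[relationship][child] -> set of parent terms."""
    parents = defaultdict(lambda: defaultdict(set))
    for child, relationship, parent in edges:
        parents[relationship][child].add(parent)
    return parents


def ancestors(parents, term, relationship):
    """Terms reachable from `term` by following one relationship upwards."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(relationship, {}).get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_acyclic(parents):
    """True if no term is its own ancestor along any single relationship."""
    return all(
        child not in ancestors(parents, child, relationship)
        for relationship, children in parents.items()
        for child in children
    )


PARENTS = build_parents(EDGES)
print(ancestors(PARENTS, "autosome", "IsA"))
print(is_acyclic(PARENTS))
```

A term may have several parents (multiple inheritance), which is why each child maps to a set of parents rather than to a single one.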
An important assumption behind GO is the true path rule: starting from a leaf all
the relationships that go up in the tree along its path must be biologically true. Another important aspect of GO organisation is the use of the word sensu: it is used
when a term can have different meanings [LM04]. For example, the term cell wall
GO:0005618 can be used to refer to bacteria, fungi, and plants. In biology the same
word is used to refer to the three types of cell wall but the cell walls have different properties. Therefore the word sensu is added to the term, meaning in the sense
of or as described in: the term cell wall GO:0005618 has three children: cell
wall (sensu Bacteria) GO:0009274, cell wall (sensu Fungi) GO:0009277
and cell wall (sensu Magnoliophyta) GO:0009505.
GO can be explored using various tools, the most common one being the AmiGO
web interface.14 DAG-EDIT, which is a standalone program written in Java, is another popular tool for editing and exploring ontologies in DAGs.15 GO ontologies
can be obtained in different ways, including OBO format, flat files, XML, MySQL
tables, etc. Apart from ontologies, other resources are available. Slims are high
level slimmed down ontologies for analysing gene group annotations [Con04]. Annotations of other databases to GO are available in a list.16 The databases that include GO annotations are: SGD (Saccharomyces cerevisiae), FlyBase (Drosophila
melanogaster), TAIR (Arabidopsis thaliana), WormBase (Caenorhabditis elegans),
RGD (Rattus norvegicus), Gramene (Oryza sativa), ZFIN (Danio rerio), DictyBase
(Dictyostelium discoideum), TIGR, Sanger GeneDB, GenBank and UniProt. Every
GO annotation needs an evidence code17 that states where the evidence for the annotation came from (e.g. inferred from direct assay, inferred from electronic annotation,
etc.). Mappings of GO to other external systems (e.g. Enzyme Commission numbers,
SWISS-PROT keywords) are also available;18 recently GO has been mapped to the
UMLS [LM04].
13 As the GO users guide says, clogs are a subclass or is-a of shoes, while the shoes I have on my feet now are an instance of shoes.
14 http://www.godatabase.org
15 See the following web page for a list of all the GO related tools, some of which are mentioned further on in the document: http://www.geneontology.org/GO.tools.html DAG-EDIT can be downloaded from http://sourceforge.net/projects/geneontology
16 http://www.geneontology.org/GO.current.annotations.shtml
The growth and success of GO has been spectacular in recent years because of its
openness, community involvement, intuitive structure and other reasons pointed to in
[BSG+ 04]. It is a very dynamic project: full-time curators incorporate the large number
of change requests from the community, supervised by each organism’s database staff.
Plenty of new resources include GO annotations.19 As a consequence, its functionality
has been augmented to include, amongst others:
• The Gene Ontology Annotation project (GOA): assigns GO codes to other database
annotations [CBM+ 04, CBB+ 03].
• The Gene Ontology Annotation Tool (GOAT): closely related to GONG, this
project aims to create a tool that helps to create consistent annotations when using
GO terms [BMWS03].
• Automated [HGL03, RCSA02, XWL+ 02, GLH04, KSDS03, CLT+ 05] or integrated [JSH+ 03] gene annotation.
• Use of semantic similarity for sequence searching [Zeh03, LSBG03].
• Categorisation of gene groups [JMFH04, ZSKS04, ZFW+ 03, BS04, ASDUD04];
given a large set of genes a node or nodes on GO are used to summarise their
function [JM04].
• Categorisation of gene expression [DSD+ 03, KBBD04, VEF+ 04, LHK04, RWBB04,
JSA+ 04, YWCS05, BWG+ 04, CGGG+ 05, MBR+ 04] and statistical genomics
[Car03]. For an up-to-date detailed overview of the tools for the analysis of gene
expression based on GO see [KD05].
• Prediction of protein function by coupling machine learning with GO [LHMK03,
KFD+ 03, DTSC04], prediction of subcellular location of a given protein [CC04]
or prediction of functional modules in bacterial genomes [WSM+ 05].
17 http://www.geneontology.org/GO.evidence.html
18 http://www.geneontology.org/GO.indices.html
19 http://www.geneontology.org/GO.annotation.html
2.2.1.2 Other OBO ontologies
There is a growing number of bio-ontologies in OBO. One of the most important ones
is the Cell Type ontology [BRA05], which covers procaryotic cells, cells of animals,
plants or fungi, and either in vitro or in vivo cells.
Another OBO ontology that should be mentioned is MGED (Microarray Gene Expression Data) [SK05], which describes the data generated by microarrays. It is one of
the few OBO ontologies implemented in OWL.
2.2.2 Other biomedical ontologies outside OBO
As OBO is a relatively recent development, there are other biomedical ontologies that
were developed before OBO was established or were simply developed outside
of OBO:
• OpenGalen20 is an ontology used for medical information management.
• BioPAX21 describes biological pathways and it is implemented in OWL.
• Ecocyc22 is one of the oldest bio-ontologies and describes the whole metabolism
of Escherichia coli.
20 http://www.opengalen.org/
21 http://www.biopax.org/
22 http://ecocyc.org/
Chapter 3
Formalising knowledge in
bio-ontologies: rationale and previous
work
This chapter gives the reasons why bio-ontologies need formalisation and reviews
previous work on the problem. The reasons and a literature review are given in section 3.1. In section 3.2 the Gene Ontology
Next Generation (GONG) project and the related Biological Ontology Next Generation (BONG) Protégé plugin (developed by the author during the first year of work) are
presented. Part of the information gathered herein was collected during the author’s
visit to the EBI1 (European Bioinformatics Institute) funded by the Semantic Mining
Network of Excellence.
3.1 The need for formalised bio-ontologies: advantages
of OWL DL and problems of OBO
This research aims to analyse as many bio-ontologies as possible. However,
GO, the most widely used bio-ontology, is analysed almost exclusively. More bio-ontologies will be included in further developments, and, nonetheless, the analyses,
conclusions and Ontology Design Patterns regarding GO can be extrapolated to other
bio-ontologies.
1 http://www.ebi.ac.uk/
There is a clear trend in current bio-ontologies towards a more expressive and formal Knowledge Representation language: there are new biomedical ontologies implemented in OWL DL [FSP+ 04] or being transformed into OWL DL [GZB05]. In
the case of OBO ontologies there are some ontologies in OWL (MGED2 and NCI
thesaurus3 ) and other ontologies like the Sequence Ontology4 that present subrelationships and three relationship attributes (Is cyclic, Is transitive and Is symmetric).
DAG-Edit also allows the use of the properties InverseOf and DisjointFrom. The tools
[LHP03] and reasoners for OWL DL such as RACER [VR03] or FACT [IR98] are
becoming more efficient and robust, making OWL DL available for more users in the
biomedical domain [SH05].
Despite the mentioned trend towards OWL, GO still presents a very simple and
intuitive structure to the biologists: just IsA and PartOf relationships are allowed. The
rest of the expressivity needed for modelling gene products’ attributes is reached by
a mixture of curational guidelines, embedding content in the terms and other more or
less explicit work-arounds like overloading of the PartOf relationship.
Amongst the reasons for GO remaining in its current format is the reluctance of
biologists to adopt any new technology that, regardless of its quality, represents a big
change. This has happened in the GO consortium [Ire], where, for example, the developers have been discussing for more than a year whether the relationship regulates
should be included in the ontology. It appears to be a straightforward decision from the
ontology engineering point of view, but biologists simply reject anything that is new
and not absolutely certain to work. This attitude is grounded in the fact
that bio-ontologists must provide ontologies that work [GW04] and have to be continuously up to date with the databases, so other considerations are left to a second
level [SK05]. Another related problem is the need to offer biologists simple interfaces
to any new, complex and expressive language like OWL DL [Har]. It has already been
pointed out in the literature that the priority in the GO working group [SWSK03] and other
bio-ontology developers [SK05] is to add as much knowledge as fast as possible to the
ontology, leaving the consideration of formal principles to a second level. Thus, GO
has become a victim of its own success: its simple structure has made it the preferred
ontology for many biological databases, but its simplicity and lack of formality make
it very hard to maintain manually. Apart from being difficult to maintain manually,
GO has plenty of inconsistencies in the way it represents the domain knowledge and it
is not very expressive, being an opaque resource for other systems to interact with in
a computational and more sophisticated way.
2 http://obo.sourceforge.net/cgi-bin/detail.cgi?mged
3 http://obo.sourceforge.net/cgi-bin/detail.cgi?ncithesaurus
4 http://obo.sourceforge.net/cgi-bin/detail.cgi?sequence
3.1.1 Integration of different ontologies
GO includes other ontologies in itself: GO terms are generally syntactically formed
by combining certain constant words [OCAM+ 04, SK04a] and a big proportion of GO
terms is made up by including terms of other ontologies. For example, all the GO terms
having some kind of cell in the term name include terms from the Cell Type ontology
(CL):
GO: fat cell differentiation (GO:0045444)
CL: fat cell (CL:0000136)
Since September 2005 there has been an ongoing effort to synchronise both ontologies; there are cells appearing in GO that do not appear in CL or they appear with a
different name and vice versa. The strategy followed to achieve the aim was to syntactically parse GO in search for CL terms, using the BONG plugin (see section 3.2)
and OBOL [Mun05]. This strategy is rather ad hoc and does not tackle the problem
of really integrating different ontologies: semantic integration achieved by syntactic
parsing is not a scalable and sound solution. OWL DL offers a technology grounded
in the mentality of the semantic web: OWL DL ontologies can import other OWL ontologies, either locally or via HTTP, very much like importing programming libraries.
Thus the reuse of ontologies is done at a semantic level the whole time and without
having to develop parsing tools.
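The limits of the syntactic strategy can be seen in a small Python sketch. It is only an illustration of the idea of searching GO term names for CL term names, not the BONG or OBOL implementation; the function name and the tiny term lists are assumptions, although the term names themselves appear in this report.

```python
import re

# Term names taken from the text; a real run would load the full ontologies.
CL_NAMES = {"fat cell", "brown fat cell"}
GO_NAMES = ["fat cell differentiation", "brown adipocyte differentiation"]


def cl_mentions(go_name, cl_names):
    """Return the CL term names that occur as whole words inside a GO term name."""
    return sorted(
        name
        for name in cl_names
        if re.search(r"\b" + re.escape(name) + r"\b", go_name)
    )


for go_name in GO_NAMES:
    print(go_name, "->", cl_mentions(go_name, CL_NAMES))
```

The second GO term names the same kind of cell as brown fat cell but with a different word, so a purely syntactic search finds nothing for it. Cases like this have to be handled by hand or by an explicit mapping.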
For OBO ontologies to be reused efficiently, clear upper-level semantics must be stated first. One attempt towards that aim is the use of a set
of established relationships with well defined semantics in the OBO relationship types
ontology5 [SCK+ 05]. Another attempt is the establishment of a well defined Upper
Level Ontology (see section 4.3.2.3).
3.1.2 Automatic maintenance of multiple inheritance
GO has around 18,000 terms, and it is impossible to maintain an exhaustive structure
with the curational methods used now: the curators try to manually find any relationship that should be included with any new term with the aid of term definitions. The
GO curators themselves recognised the utility of a tool that could find the needed relationships automatically [Ire, Har]. If the ontology is implemented in OWL
DL this can be done using the reasoner, especially if the ontology is properly normalised
(see section 4.3.2.1).
5 http://obo.sourceforge.net/cgi-bin/detail.cgi?relationship
3.1.3 Not clear semantics: lack of expressivity and formality
As a consequence of not using a formal and expressive language, plenty of semantics
are reduced to the level of syntactics and stated as editorial guidelines outside the
ontology or as parts of the term names or definitions, if stated at all. In this way
the computational tools are unable to meaningfully access the ontology and therefore
plenty of automated tasks cannot be accomplished [BB05], including maintenance
[SK04a], consistency checking and new knowledge discovery [YKNA03, Ait05].
Inconsistencies in the use of sensu
The word sensu is added to a term to express as described in taxon. For example cell
wall biosynthesis (sensu Fungi) GO:0009272 means the cell wall biosynthesis understood in the way it has been described in the Fungi. This means that genes
of other taxa apart from Fungi can be annotated to cell wall biosynthesis (sensu
Fungi). But sensu is in practice also understood as appearing in taxon, so for
example to respect the true path rule the hierarchy of taxa narrows down as the GO
hierarchy approaches its leaves, mixing both meanings of sensu. Other problems with
sensu have been pointed out in [SK04a] and [SKK04] (with respect to the IsA relationship).
Proliferation of terms
As the GO language is not expressive enough, plenty of semantics must be stated by
adding new terms or adding syntactic elements to already existing terms, and as a consequence there is an unnecessary proliferation of terms. For example there are plenty of
terms with the token during within them, to refer to processes that act within another
process, e.g. cellular morphogenesis during differentiation GO:0000904
and ethanol biosynthesis during fermentation GO:0043458 [Ire].
Different levels of granularity
The importance of levels of granularity in biomedical ontologies has already been
pointed out [AJT05, GCB04], as have the problems in GO derived from the mixture of different
levels of granularity [KSN04]. There are two main problems: the different levels
of organisation or granularity are not explicitly stated in GO, and they are mixed. For
example the highest level of organisation in GO is the cell level, but terms that refer to
the metacellular level can be found in GO (e.g. organ development GO:0048513).
Overload of the IsA relationship
The official definition of the IsA relationship in GO states that if A IsA B, every instance
of A is an instance of B. But IsA is also used to denote KindOf leading to confusing
situations as pointed in [AWB04], [SWSK03] and [SKK04].
Overload of the PartOf relationship
There has been considerable research regarding mereology and biomedical ontologies [RR00, AWB04] as the partonomy relationship can have different semantics or
types of PartOf [Ode94]. In GO PartOf is used as a wildcard relationship when
other relationships with different semantics should be used, as addressed before by
[SDSH05, SWSK03], in the context of all the OBO ontologies by [SCK+ 05] and as
a general phenomenon in ontology engineering by [CSF03]. As PartOf holds other
relationships within, for example location, any term that is PartOf two different GO
terms will have the same location as the ancestors of those terms along the PartOf
relationship, and that can create conflicts depending on the terms [Har]. For example polarisome GO:0000133 is part of cell cortex GO:0005938 and part of
site of polarized growth GO:0030427, so it must be deduced that polarisome
is located in both. That is only partially true, because the site of polarized growth
in which it is located encloses only a small portion of the whole cell cortex (see Figure
3.1). This is trivially fixed in a curator guideline that reads as follows:6
The part-of relationship used in GO is usually (...) necessarily is-part, [meaning]
that wherever the child exists, it is as part of the parent. To give a biological
example, replication fork is part of chromosome, so whenever replication fork
occurs, it is as part-of chromosome, but chromosome does not necessarily have
part replication fork
Whenever polarisome occurs, it is as part of cell cortex, but cell cortex does
not necessarily have part polarisome, so it could be that cell cortex has as a part
polarisome only in the site of polarized growth, allowing for proper assumptions regarding location by the human users. In any case there is no semantic
statement regarding location in the model, hence the claim that the editorial guideline
quoted is a trivial solution, because the semantic definition of PartOf remains the same,
encompassing all other relationships. If the model is queried for the location of
polarisome it will be (wrongly) deduced to be located both in the whole of the cell cortex and in the site of polarized growth. The problem can be fixed by entering a more
specific term with the location as a child of polarisome, for example polarisome on
site of polarized growth (a technique often used in GO). However this solution is
bad for two reasons: it leads to an unnecessary proliferation of terms and still there is
no semantic way of modelling location [SH04]: it would be avoiding the problem
but not solving it (see section 4.3.3.2).
6 http://geneontology.org/GO.usage.shtml
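The wrong deduction described above can be reproduced with a short Python sketch. It assumes a hypothetical query function that, lacking any explicit location relationship, treats every PartOf ancestor as a location; only the two PartOf edges come from the text.

```python
# PartOf edges quoted in the text (child -> set of parents).
PART_OF = {"polarisome": {"cell cortex", "site of polarized growth"}}


def naive_locations(term, part_of):
    """Treat every PartOf ancestor as a location of `term`.

    This is what a query is forced to do when PartOf is the only
    relationship available to carry location information.
    """
    seen, stack = set(), [term]
    while stack:
        for parent in part_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(naive_locations("polarisome", PART_OF))
```

Nothing in the data says that polarisome occupies only the portion of the cell cortex enclosed by the site of polarized growth, so the query returns both terms as locations with equal weight.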
3.2 Gene Ontology Next Generation (GONG) and Biological Ontology Next Generation (BONG)
The Gene Ontology Next Generation project7 (GONG) offers a simple workflow to
migrate and improve parts of GO into OWL DL [WSGA03]. The process relies on
dissecting GO terms with regular expressions defined by the user and extracting new
semantic content that can be combined with other ontologies. Once combined with
other ontologies and translated into OWL DL, the ontology can be sent to a reasoner
and the reasoner will point out new relationships that should be added back to GO. GONG
demonstrates the advantages of automatic maintenance of ontologies: a human curator
cannot create and maintain all the necessary subsumption relationships in an ontology
with more than 18,000 classes, but given that a correct set of semantics is provided, a
reasoner will.
GONG relies on how syntactically conserved the GO terms are to dissect a chosen subtree of GO in different semantic axes. For example the term acetylcholine
biosynthesis GO:0008292 belongs to two tangled subtrees: a chemical subtree leading to acetylcholine and a functional subtree leading to biosynthesis (see Figure 3.2). Dissecting the term allows for new semantics to be defined in an automatic way: acetylcholine biosynthesis can be redefined adding a restriction in
OWL DL that can be read as biosynthesis that acts on acetylcholine, maintaining the original GO relationships of the term (see Figure 3.2). If the resulting
ontology is combined with a chemical ontology, the reasoner will have enough semantics to infer new relationships. For example, as acetylcholine is a subclass of
neurotransmitter in the chemical ontology, acetylcholine biosynthesis would
be inferred to be a subclass of neurotransmitter biosynthesis GO:0042136. This
process is triggered when the term is captured by the respective regular expression
((.+?) (biosynthesis) in this case). Any term that matches that regular expression will hold the new semantics in the resulting OWL DL ontology; each regular
expression has its own new semantics defined. Around 10 percent of the new relationships suggested by the reasoner in the last GONG execution were accepted by the GO
curators, showing the performance and utility of the workflow.8
7 http://gong.man.ac.uk/
Figure 3.1:
Position of the term polarisome in GO. The term polarisome is part of both cell
cortex and site of polarized growth. As a consequence of overloading PartOf, there
is a conflict regarding the location of polarisome.
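The dissection step and the kind of inference it enables can be sketched in Python. This is an illustration under stated assumptions, not the GONG implementation: plain dictionaries stand in for the OWL DL ontology, a simple lookup stands in for the Description Logic reasoner, and the function names are hypothetical. The regular expression and the term names come from the text.

```python
import re

# Regular expression from the text: group 1 is the chemical, group 2 the process.
BIOSYNTHESIS = re.compile(r"(.+?) (biosynthesis)")

# Hypothetical fragment of a chemical ontology (child -> direct superclasses).
CHEMICAL_IS_A = {"acetylcholine": {"neurotransmitter"}}

# GO term names known to the sketch (both appear in the text).
GO_TERMS = {"acetylcholine biosynthesis", "neurotransmitter biosynthesis"}


def dissect(term_name):
    """Split a GO term name into its functional and chemical axes, or return None."""
    match = BIOSYNTHESIS.fullmatch(term_name)
    if match is None:
        return None
    chemical, process = match.groups()
    # Read as: <process> that acts on <chemical>.
    return {"term": term_name, "superclass": process, "acts_on": chemical}


def suggest_superclasses(definition):
    """Suggest GO parents by lifting the chemical along the chemical hierarchy.

    A stand-in for the reasoner: it only follows direct superclasses.
    """
    suggestions = set()
    for broader in CHEMICAL_IS_A.get(definition["acts_on"], ()):
        candidate = f"{broader} {definition['superclass']}"
        if candidate in GO_TERMS:
            suggestions.add(candidate)
    return suggestions


definition = dissect("acetylcholine biosynthesis")
print(definition)
print(suggest_superclasses(definition))
```

In the real workflow the suggestions come from classifying the OWL DL ontology, so they also cover indirect superclasses and any other axioms in the combined ontologies.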
Figure 3.2:
Functional and chemical classification in metabolism for the term acetylcholine
biosynthesis. The functional classification in the case of metabolism includes three elements: catabolism, metabolism and biosynthesis. Catabolism is included in the diagram for
clarity, although it is not present in the GO subtree of the example. The chemical classification
(simplified in the diagram) is more complex, depending on the term.
Biological Ontology Next Generation9 (BONG) is a Protégé plugin that gives the
biologists a chance to use the GONG workflow with any GO subtree and any OBO
ontology. The BONG plugin can be used as an OBO to OWL DL converter, as a GO
(MySQL) to OWL DL converter, both, or as GONG workflow. If it is used as a GONG
workflow, the other two steps (OBO to OWL DL and GO -MySQL- to OWL DL) must
be executed first and a GONG ontology must be loaded into Protégé. The GONG ontology describes the GONG workflow and the plugin reads it to perform the workflow.
The GONG ontology is the core of the plugin, as it describes the regular expressions
that will dissect the GO terms, and the semantics related to those regular expressions.
It is necessary for performing the GONG workflow, but not for the previous
steps (converting OBO to OWL DL and GO -MySQL- to OWL DL). The users can define
their own ontologies and send them to [email protected], to
put them in the central repository,10 so other users can use the new GONG workflows
without having to create new GONG ontologies.
8 See the author’s MSc dissertation in http://gong.man.ac.uk/publications/
9 http://gong.man.ac.uk/downloads/
An example of a GONG ontology is provided, bundled with the plugin, called
gong_cell_diff_cell_type.owl. It dissects and improves the GO subtree
cell differentiation using the OBO Cell Type ontology. The most important
sections of the ontology are:
• gong:Group: the subclasses of this class are used as convenience classes for
filling the restrictions described in the Regexp classes (see below). Each class
represents a group in the regular expressions. There must be as many classes
as matching groups the regular expressions will have. For example, the regexp
(.+?) (development) has two groups.
• gong:Map: this class describes the mapping between GO sub-terms (portions of
terms) and OBO terms. For example, the GO term
brown adipocyte differentiation has brown adipocyte as a portion of
the term. brown adipocyte maps to the term brown_fat_cell of the OBO
Cell Type ontology (when it is transformed into OWL DL, as is the case here;
otherwise the OBO term would be brown fat cell).
• gong:Regexp: in this class the regular expressions are expressed, in the form
of gong:Regexp_n, where n is a number, starting with 1. The most specific
regular expressions have the lowest numbers; the plugin will try to match the
most specific ones first, and if there is no match, it will try the next one. Thus,
(negative regulation of) (.+?) (development) should have a smaller
number than (.+?) (development), as (.+?) (development) will catch
all the terms that were caught by the other regular expression: once a term
is caught, it is not checked for more regular expressions. The only annotation property that must be filled in the case of regular expression classes is
gong:regexp_string_value, and it is used to describe the actual regular expression (e.g. (.+?) (development)). The OWL DL conditions that the term
should have are defined using the equivalent class of the regular expression class.
Two kinds of conditions can be defined: superclasses and restrictions. The superclasses should point to an already existing class, usually on the accessory
ontologies (see below). In the restrictions, the filler is usually either an already existing class (again, probably in the accessory ontology) or a group in the
matching regular expression, thus, a portion of the matched term. For example,
if the term adipocyte differentiation is matched by the regular expression
(.+?) (differentiation) and the regular expression class restriction condition says gong:acts_on someValuesFrom Group_10, the resulting OWL DL
class will have a restriction like gong:acts_on someValuesFrom fat_cell
(the OBO Cell Type term, mapped and translated to OWL DL). In the matched
term, the matched group 1 (the first group) is adipocyte, which corresponds to
fat_cell.
• gong:Accesory_Ontology: under this class is any accessory ontology that will
be used to semantically complement the GONG workflow. There is no format
requirement, but it should match the fillers or superclasses described in the Regexp
classes.
10 http://gong.man.ac.uk
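How the sections of a GONG ontology interact can be sketched in Python. The sketch below is an assumption-laden illustration, not the plugin's code: the rule list stands in for the gong:Regexp_n classes, the dictionary for gong:Map, and the choice of which group fills gong:acts_on in each rule is hypothetical. The regular expressions and the two mappings come from the text; the term negative regulation of organ development is a hypothetical name used only to exercise the first rule.

```python
import re

# Stand-in for the gong:Regexp_n classes: lower numbers are more specific
# and are tried first. `acts_on_group` is a hypothetical field naming the
# matching group that fills the gong:acts_on restriction.
REGEXPS = [
    {"number": 2, "pattern": r"(.+?) (development)", "acts_on_group": 1},
    {"number": 1, "pattern": r"(negative regulation of) (.+?) (development)", "acts_on_group": 2},
    {"number": 3, "pattern": r"(.+?) (differentiation)", "acts_on_group": 1},
]

# Stand-in for gong:Map: GO sub-terms mapped to OBO Cell Type terms in OWL DL form.
MAP = {"adipocyte": "fat_cell", "brown adipocyte": "brown_fat_cell"}


def apply_workflow(term_name):
    """Return (rule number, restriction) for the first rule that matches, else None."""
    for rule in sorted(REGEXPS, key=lambda r: r["number"]):
        match = re.fullmatch(rule["pattern"], term_name)
        if match:
            portion = match.group(rule["acts_on_group"])
            # Unmapped portions fall back to an underscore-joined name.
            filler = MAP.get(portion, portion.replace(" ", "_"))
            # Once a term is caught, no further regular expressions are checked.
            return rule["number"], f"gong:acts_on someValuesFrom {filler}"
    return None


for name in ["adipocyte differentiation", "negative regulation of organ development"]:
    print(name, "->", apply_workflow(name))
```

If the two development rules swapped numbers, the general rule would catch negative regulation of organ development first and its filler would become the whole prefix rather than organ, which is why the most specific expressions need the lowest numbers.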
To use the plugin as a GONG workflow, the other two steps must be executed first.
The GONG ontology must be included by importing it: it can be imported either as a
URL (if the system is permanently connected: for instance the example ontology can
be imported from the project website11) or as a local file. After the ontology has
been imported and the GONG workflow has been executed, the ontology should be
classified with the reasoner. Some of the new subsumption relationships should make
sense as new GO IsA or PartOf relationships, so in this way a few hundred
new legitimate relationships can be automatically created with a minimum effort. If a
suitable GONG ontology is found by the user in the repository, the effort invested is
small and the user can maintain GO subtrees of around a thousand terms automatically,
easily taking advantage of the usefulness of an OWL DL approach.
11 http://gong.man.ac.uk/gong_cell_diff_cell_type.owl
Chapter 4
Formalising knowledge in
bio-ontologies: Ontology Design
Patterns
In this chapter the concept of Ontology Design Patterns is explained, including their
application and documentation. Section 4.1 explains the concept of Design Patterns and then explores the concept of Ontology Design Patterns. Section 4.2
explains the method that will be followed to document Ontology Design Patterns. The
Ontology Design Patterns that have been explored and that will form the basis of this
research are presented in section 4.3.
4.1 Introduction to Ontology Design Patterns
The concept of Software Design Patterns (SDPs) comes from Object Oriented Programming [GHJV95]. There are modelling problems that arise again and again when
designing different programs. Each of the problems is common to different systems
and hence the modelling solution for each problem can be described in a generic manner, suitable for different implementations; the solution is called a Design Pattern (Software Design Pattern, SDP). Thus SDPs are very general methods of solving modelling issues that have been proven to be efficient many times and therefore become
established in an abstract form. There are anti-patterns as well: potential pitfalls that
should be avoided when designing and developing a program.
Ontology Design Patterns (ODPs) are the application of the same concept to the
creation of ontologies. Thus ODPs are abstract modelling solutions to known problems
in ontology engineering. Some ODPs can be found on the Semantic Web Best Practices
and Deployment Working Group website.1
ODPs improve ontological modelling in different ways:
• ODPs are abstractions. ODPs provide biologists with an easy way of dealing
with the complexity of OWL DL, making ontology creation a faster and more
reliable process. Biologists working on bio-ontology creation prefer the complexity of the language they are using to be as hidden as possible [Har].
• ODPs can be made computationally explicit. ODPs allow for automatic building
of sectors of an ontology that are complex, making ontology building easier for
non-experts. The user can be guided step by step in the ODP application. For example the Protégé wizards plugin2 gives the user the possibility of automatically
creating some ODPs like Value Partitions, RDF lists and N-ary Relationships.
• ODPs provide a neat way of producing more modular and robust ontologies.
By using ODPs the entities and the structure of the ontology can be explicitly
separated [CTP04].
• The use of ODPs improves communication between ontology developers. The
developers can easily recognise the different features of the ontology produced
by the ODP, as the ODP represents a well known and easy to understand abstraction.
• ODPs produce more expressive ontologies. ODPs allow for a more fine-grained
modelling of the knowledge domain.
• By using ODPs the potential of reasoning can be exploited in more efficient
ways. The expressivity needed for efficient and productive reasoning is reached
more easily using ODPs.
Research on ODPs is very recent [Dev02], and therefore there is no established strict
definition for ODPs, apart from the one given here, adapted from Object Oriented programming. In this research two ODPs are classified as ODPs when another possible definition would be best practices: Normalisation (section 4.3.2) and Upper Level Ontology
(section 4.3.2.3). They are included as ODPs because it would be rather arbitrary not
to do so: they are ontological structures, as other ODPs. The only difference is the
breadth of their aim: Normalisation is a way of building better ontologies on its own
(rather than being a means of improving concrete parts of the ontology as other ODPs)
and Upper Level Ontology provides a way of integrating ontologies (rather than only
improving the modelling in a concrete ontology as other ODPs).
1 http://www.w3.org/2001/sw/BestPractices/
2 http://www.co-ode.org/downloads/wizard/co-ode-index.php
4.2 Documenting Ontology Design Patterns
There is a difference between SDPs and ODPs when modelling them: in SDPs there
is a metalanguage to describe the SDP (for example UML3) whereas in ODPs there is
not. The SDPs are described with UML in a generic manner, and then the instances
of the SDP are applied in the programming language of choice. The description of the
SDP (in UML) is different from the implemented instance (in the chosen programming
language). There is not a metalanguage for describing ODPs, and as a consequence
they are described using instances: the model, rather than being a generic structure
like in SDPs, is an instance that implicitly describes the generic structure. Another
difference is that whereas SDP models express some kind of timing (messages are
sent between objects, there are phases within the SDP, etc.) the ODP models are
completely static.
4.2.1 Description template of Software Design Patterns
There is not a community-accepted guideline for documenting ODPs and no explicit
attempts have been made to solve the problem. In Object Oriented programming there
is a commonly used format for representing SDPs that usually includes the following
information sections for each SDP [GHJV95]:
Name and classification: each SDP has a unique name and the SDP is usually classified by the problem it solves. It can be classified as: fundamental Design Pattern,
creational Design Pattern, structural Design Pattern, behavioural Design Pattern, concurrency Design Pattern.
Intent: the reason for using the SDP, the problem the SDP solves.
Also known as: another name that the SDP could have.
Motivation: a possible context of the problem where the SDP can be used.
Applicability: in which situation the SDP is usable.
3 http://www.uml.org/
Structure: graphical representation of the SDP, usually in UML.
Participants: the elements (objects, classes, packages, interfaces) that make up the
SDP.
Collaborations: the interactions between the elements of the SDP.
Consequences: the results, consequences and trade offs of applying the SDP.
Implementation: how the SDP can be built in a real situation to solve the problem.
Sample code: the source code of a program that implements the SDP.
Known uses: real implementations of the SDP.
Related patterns: SDPs with a similar or related function.
4.2.2 Description template of Ontology Design Patterns
A scheme similar to the one used for SDPs in section 4.2.1 will be used to describe
the ODPs in this research, as there is no prior guideline. Most of the sections can be
reused when describing ODPs without major problems, but some
sections need a deeper analysis and some sections are added:
Name and classification: the ODPs classification used for this research is based on
the general usage rather than on the problem they are intended to solve:
• Extensional ODPs: ODPs that extend the limits of OWL DL. OWL DL has
limitations as a Knowledge Representation (KR) language. Some ODPs can be
used to overcome those limitations and present a suitable representation of the
knowledge domain to be captured.
• Good practice ODPs: ODPs that are used to ensure good modelling practice. These ODPs are used to produce more modular, efficient and maintainable
ontologies, tackling already known pitfalls of ontology engineering.
• Modelling ODPs: ODPs that are used to model a concrete part of the knowledge domain. They can be defined as signature ODPs or idioms: each knowledge
domain has its peculiarities and these ODPs are used to model those peculiarities. For example, biological knowledge differs from other domains in that the
development of things is very important, there is symmetry, there are different
levels of complexity interacting with each other, there are emergent properties,
etc. (see section 5.2.3.1).
The first two types are common to all ontologies. The ODPs of the third type are more
specific to the knowledge domain (in this case biology) but they can also be used in
other domains.
Intent: similar to Object Oriented programming SDPs.
Also known as: similar to Object Oriented programming SDPs.
Motivation: similar to Object Oriented programming SDPs.
Applicability: similar to Object Oriented programming SDPs.
Structure: in Object Oriented programming UML is used for the task. There is no
analogue of UML for OWL DL; there is no graphical representation that captures all the
semantics of the ODP. There are different approaches to the problem:
• The OWLViz Protégé plugin.4 It is very useful for simple class-subclass hierarchies.
• GrOWL5 (Graphical OWL). It is not widely used.
• Diagrams like the ones used on the W3C Best Practices website.6 Although they are
very simple, they mix property characteristics with relationships in the graph.
• OWL-ed UML diagrams. UML can be used to express OWL DL, as pointed
out in [BVEL04]. Using UML has the advantage of a graphic paradigm which
is already widely used: there are plenty of tools and big communities that are
already familiar with the format, so for example it can be easier for biologists
who already have notions of Object Oriented programming to understand ODPs
expressed in UML. There is extensive literature regarding the relation of OWL
and UML [HEC+ 04, FSS03].
• UML diagrams. UML can be used to describe very general ODPs, without considering the semantics of the target KR language, as shown in [GCB04].
The choice for this research is to use OWLViz for the general subsumption structure
of the ODP, and to use OWL-ed UML diagrams for the most important details of the
ODP, following the OWL to UML mapping described in [BVEL04], summarised in
Figure 4.1.
Participants: the list of classes or class groups used in the ODP. This section will be
called Elements instead of Participants.
4 http://www.co-ode.org/downloads/owlviz/co-ode-index.php
5 http://www.uvm.edu/˜skrivov/growl/
6 http://www.w3.org/TR/swbp-n-aryRelations/
Figure 4.1:
Simple mapping of OWL to UML. The OWL expressions are described in the left column
and the respective UML diagrams are described in the right column. Not all the possible OWL
expressions are included.
Collaboration: the relationships linking the classes. Only the most important ones are
described. This section will be called Relationships instead of Collaboration.
Consequences: similar to Object Oriented programming SDPs.
Implementation: similar to Object Oriented programming SDPs.
Sample code: the information of this section is presented in three different manners:
• An OWL DL ontology with the whole ODP, available via URL.
• The most important parts of the ODP described using Description Logics notation.
• The most important parts of the ODP described using the Manchester abstract
OWL syntax.7
Known uses: similar to Object Oriented programming SDPs.
Related ODPs: similar to Object Oriented programming SDPs.
References: this section is added to list publications or web pages where
the ODP originated and where it can be found, for example to be imported and
included in an ontology or to be properly referenced.
Additional information: this section is added for some complementary information
that does not fit in any of the previous sections. For example, any information regarding
the origin or history of the ODP can be added here.
4.3 Ontology Design Patterns explored so far
The aim of this section is not to exhaustively explore all the ODPs that can be applied
to bio-ontologies and assess them. The aim is to give some examples of the ODPs that
will be explored during the research and how they will be described. Nonetheless some
of the ODPs already explored promise to solve important problems in the creation and
maintenance of bio-ontologies. Some of the ODPs are still very experimental; the potential implementation problems and trade-offs have not been completely
explored.
All the sections mentioned in the description template of section 4.2.2 will be maintained for consistency between ODPs; if there is no information for a given section the
word none will be used. In structure and sample code, if any of the choices is not
suitable (OWLViz, UML, DL notation or Manchester abstract syntax) it will simply be left out; for example it is redundant to use UML to model the subsumption
hierarchy in the case of Upper Level Ontology so the UML graph is left out of the
documentation.
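The description template can be pictured as a simple record. The following Python sketch is only an illustration: the class name ODPDescription and its field names are assumptions of this sketch, chosen to mirror the sections of section 4.2.2, and every optional section defaults to the word none as stated above.

```python
from dataclasses import dataclass, fields


@dataclass
class ODPDescription:
    """Hypothetical record with one field per section of the ODP template."""

    name_and_classification: str
    intent: str
    also_known_as: str = "none"
    motivation: str = "none"
    applicability: str = "none"
    elements: str = "none"        # called Participants in the SDP template
    relationships: str = "none"   # called Collaborations in the SDP template
    structure: str = "none"
    consequences: str = "none"
    implementation: str = "none"
    sample_code: str = "none"
    known_uses: str = "none"
    related_odps: str = "none"
    references: str = "none"
    additional_information: str = "none"


# Only the sections with information are filled in; the rest stay "none".
nary = ODPDescription(
    name_and_classification="N-ary Relationships, Extensional",
    intent="to model relationships that link more than two elements",
    also_known_as="Relationships of higher arity",
)
print(nary.known_uses)
```

Keeping all fields in every record, even when empty, reflects the choice made here of maintaining every section for consistency between ODPs.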
4.3.1 Extensional ODPs
4.3.1.1 N-ary Relationships
Name and classification: N-ary Relationships, Extensional.
Intent: to model complex phenomena that have relationships linking more than two
elements.
7 http://www.co-ode.org/resources/reference/manchester_syntax/
Also known as: Relationships of higher arity.
Motivation: the biomedical domain is full of situations where relationships should hold
between more than two elements, but OWL DL only allows properties that link
two individuals at a time. There can be a situation where a relationship and some properties of that relationship must be modelled; that cannot be done in a direct manner
with OWL DL. For example, a diagnosis has a result, a probability, and the person who
has been diagnosed. A catalytic reaction has a substrate, some products, catalytic
constants and it is catalysed by an enzyme.
Applicability: any ontology where the KR language cannot link more than two individuals in the same relationship. A GO example can be found in the term Golgi
to plasma membrane CFTR protein transport GO:0043000: there is a transport
phenomenon which relates to three elements at the same time: the start (Golgi), the end
(plasma membrane) and the transportee (CFTR protein). The transport relation cannot
be modelled in OWL DL pointing to the three elements, so this ODP must be applied.
Elements: the original elements of the N-ary Relationship are conserved in classes
and a new class is reified to model the N-ary Relationship, in this case a class called
CFTRGolgiToPlasmaTransport.
Relationships: the relationships of each element to the reified class are created:
transports from, transports to and transports.
Structure: details of the reified class definition:
Consequences: the N-ary Relationship of the knowledge domain is explicitly stated
in the ontology.
Implementation: the only important step is to identify the new class (the reified class)
that will hold the N-ary Relationship.
Sample code:
• The whole ODP as an OWL DL ontology is available at:
http://gong.man.ac.uk/owl/CFTR.owl
• DL notation of the reified class definition:
CFTRGolgiToPlasmaTransport ⊑ ∃ transports to Plasma membrane
CFTRGolgiToPlasmaTransport ⊑ ∃ transports CFTRProtein
CFTRGolgiToPlasmaTransport ⊑ ∃ transports from Golgi
• Manchester abstract OWL syntax notation of the reified class definition:
class CFTRGolgiToPlasmaTransport partial
transports to SOME Plasma membrane AND
transports SOME CFTRProtein AND
transports from SOME Golgi
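The reification step can also be sketched in Python, purely as an illustration. The helper reify_nary and its textual output are assumptions of this sketch; they are not part of CFTR.owl or of any OWL library. The function takes the name of the reified class and a mapping from binary property to filler class, and emits one existential restriction per participant in the DL notation used above.

```python
def reify_nary(reified_class, participants):
    """Return one DL axiom per participant of an n-ary relationship.

    reified_class: name of the new class that stands for the relationship.
    participants: mapping from binary property name to filler class name.
    Both the function and its string output are hypothetical helpers.
    """
    return [
        f"{reified_class} ⊑ ∃ {prop} {filler}"
        for prop, filler in participants.items()
    ]


axioms = reify_nary(
    "CFTRGolgiToPlasmaTransport",
    {
        "transports to": "Plasma membrane",
        "transports": "CFTRProtein",
        "transports from": "Golgi",
    },
)
for axiom in axioms:
    print(axiom)
```

Each participant of the original three-place relation becomes the filler of an ordinary binary property, which is all that OWL DL needs.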
Known uses: none.
Related ODPs: none.
References:
• http://www.w3.org/TR/swbp-n-aryRelations
• http://gong.man.ac.uk/ontologydesignpatterns/
• http://www.co-ode.org/resources/tutorials/bio/
• See [SAW+ 05].
Additional information: none.
4.3.1.2 Exception
Name and classification: Exception, Extensional.
Intent: to model exceptions, classes that break canonical classifications.
Also known as: none.
Motivation: plenty of areas of knowledge work with defaults or canonical knowledge:
biological classifications, for example, state what is the canonical norm and then the
exceptions are classified under the norm, even if the classification is inconsistent from
the logical point of view. A clear example can be found in the classification of cells
[ABL+ 89]: in canonical biology eukaryotic cells are considered to be cells with a nucleus. Mammalian red blood cells are considered by any biologist as eukaryotic cells,
but they lack a nucleus. Thus they are a subclass of eukaryotic cells, but they break the
condition for belonging to that class (having a nucleus).
Applicability: any ontology that has to deal with knowledge based on canonical norms
and exceptions and is based on a KR language that does not handle exceptions directly.
OWL DL, like other DL based languages [HdCD+ 05, RWRR01], does not allow exceptions. In a cell classification ontology the class MammalianRedBloodCell (with
the restriction hasNucleus = 0) would be a subclass of EukaryoticCell (with the
restriction hasNucleus = 1), resulting in an inconsistent ontology. There can be exceptions to the exception at the next level: avian red blood cells do possess a nucleus,
thus they are considered normal eukaryotic cells (they are an exception to the norm
that all red blood cells lack a nucleus). So the problem can arise at different levels.
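Why the naive classification fails can be shown with a toy check in Python. This is a minimal sketch and not a DL reasoner: the dictionaries and the functions inherited_values and is_unsatisfiable are assumptions of the sketch, and only exact cardinality restrictions on hasNucleus are considered.

```python
# Toy class hierarchy: child -> parent (None marks the root).
PARENT = {
    "EukaryoticCell": None,
    "RedBloodCell": "EukaryoticCell",
    "MammalianRedBloodCell": "RedBloodCell",
}

# Exact cardinality restrictions on hasNucleus asserted directly on a class.
HAS_NUCLEUS = {
    "EukaryoticCell": 1,
    "MammalianRedBloodCell": 0,
}


def inherited_values(cls):
    """Collect every hasNucleus value asserted on cls or on its ancestors."""
    values = set()
    while cls is not None:
        if cls in HAS_NUCLEUS:
            values.add(HAS_NUCLEUS[cls])
        cls = PARENT[cls]
    return values


def is_unsatisfiable(cls):
    """A class inheriting two different exact cardinalities has no instances."""
    return len(inherited_values(cls)) > 1


print(is_unsatisfiable("MammalianRedBloodCell"))
```

The subclass inherits hasNucleus = 1 and asserts hasNucleus = 0, so it cannot have instances; this is the conflict that the Typical/Atypical structure avoids.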
Elements: the most important elements are the newly created Typical
(TypicalEukaryoticCell, TypicalRedBloodCell) and Atypical
(AtypicalEukaryoticCell, AtypicalRedBloodCell) classes. The rest of the classes
are maintained.
Relationships: the most important property is the discriminating property, in this case,
hasNucleus.
Structure:
• Subsumption hierarchy before reasoning (darker ovals are defined classes):
• Subsumption hierarchy after reasoning:
• Details of the Typical/Atypical structure:
Consequences: if the ODP is used at many different levels of the ontology it can
produce overly complex and unmanageable ontologies. This type of structure can be very
counterintuitive for biologists.
Implementation:
• Starting from the example ontology described in applicability, two disjoint
classes are created for typical and atypical elements.
• The discriminating condition (hasNucleus) is only stated in the typical subclass.
• A covering axiom is added to the main class (i.e. EukaryoticCell) to state that
all instances must belong to one or the other subclass
(i.e. TypicalEukaryoticCell or AtypicalEukaryoticCell). A covering axiom is made by creating an equivalent class (a necessary and sufficient condition)
that is the union of the subclasses (in this case TypicalEukaryoticCell and
AtypicalEukaryoticCell).
• The reasoner will infer the whole structure.
Sample code:
• The whole ODP as an OWL DL ontology is available at:
http://gong.man.ac.uk/owl/eukariotic.owl
• DL notation of the Typical/Atypical structure:
AtypicalRedBloodCell ≡ (= 1 hasNucleus) ⊓ RedBloodCell
RedBloodCell ⊑ EukaryoticCell
RedBloodCell ⊑ TypicalRedBloodCell ⊔ AtypicalRedBloodCell
TypicalRedBloodCell ⊑ RedBloodCell
AvianRedBloodCell ⊑ = 1 hasNucleus
AvianRedBloodCell ⊑ RedBloodCell
MammalianRedBloodCell ⊑ = 0 hasNucleus
MammalianRedBloodCell ⊑ RedBloodCell
• Manchester abstract OWL syntax notation of the Typical/Atypical structure:
class AtypicalRedBloodCell complete
RedBloodCell AND hasNucleus EXACTLY 1
class RedBloodCell partial
EukaryoticCell AND (TypicalRedBloodCell OR AtypicalRedBloodCell)
class RedBloodCell partial
RedBloodCell
class TypicalRedBloodCell partial
RedBloodCell
class AvianRedBloodCell partial
RedBloodCell AND hasNucleus EXACTLY 1
class MammalianRedBloodCell partial
RedBloodCell AND hasNucleus EXACTLY 0
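What the reasoner is expected to infer from these axioms can be mimicked by a small Python sketch. It is not a DL reasoner: the function classify hard-codes the single defined class of the example, and the dictionary of nucleus counts is an assumption taken from the restrictions above.

```python
# hasNucleus restriction asserted on each kind of red blood cell.
RED_BLOOD_CELLS = {
    "AvianRedBloodCell": 1,
    "MammalianRedBloodCell": 0,
}


def classify(nucleus_count):
    """Place a red blood cell class under the Typical or Atypical branch.

    AtypicalRedBloodCell is the defined class (exactly one nucleus); the
    covering axiom leaves TypicalRedBloodCell as the only other option.
    """
    if nucleus_count == 1:
        return "AtypicalRedBloodCell"
    return "TypicalRedBloodCell"


inferred = {name: classify(count) for name, count in RED_BLOOD_CELLS.items()}
for name, parent in inferred.items():
    print(name, "is classified under", parent)
```

The ontology maintainer only asserts the nucleus restriction; the placement under the Typical or Atypical class is derived from it.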
Known uses: none.
Related ODPs: none.
References:
• http://gong.man.ac.uk/ontologydesignpatterns/
• http://www.co-ode.org/resources/tutorials/bio/
• See [SAW+ 05].
Additional information: in the case of GO, it could be applied to virion GO:0019012,
which is not a cellular component GO:0005575 even if it is classified as such.
4.3.2 Good practice ODPs
4.3.2.1 Normalisation
Name and classification: Normalisation, Good Practice.
Intent: to build modular and reusable ontologies where the majority of subsumption
relationships are maintained by the reasoner, rather than hard-coded by the ontology
maintainer.
Also known as: Untangling, Modularisation.
Motivation: there are ontologies where a given class can have plenty of superclasses,
building a structure that is called a polyhierarchy. If all those subsumption relationships
are directly stated by the ontology maintainer two main problems arise:
• The ontology becomes very difficult to maintain: whenever a subsumption must
be deleted (because a class has changed) or created (because a new class has
been created) it has to be done by hand; in a polyhierarchy the process becomes
very inefficient and error-prone.
• The semantics are implicitly stated, not explicitly: any other ontologist or reasoner only knows that a class is a subclass of its superclasses, without knowing
why.
The application example for this ODP is adapted from the Cell Type Ontology. In the
example the subsumption relationships that already are in the Cell Type Ontology are
inferred by the reasoner. The term neutrophil CL:0000096 is used as an example
class to show how a class can relate to different modules.
Applicability: any OWL DL ontology that consists of a polyhierarchy in which some semantic axes can be identified: each of those axes will be a module.
Elements: the original classes of the ontology are divided in different axes.
Relationships: the conditions for each subsumption relationship are encoded as properties that will relate the different modules.
Structure: the basis of the ODP is that each primitive class should only have a primitive parent, and primitive sibling classes should be disjoint, creating the modules.
• Subsumption hierarchy of the normalised ontology before reasoning:
• Subsumption hierarchy of the normalised ontology after reasoning (the polyhierarchy is built by the reasoner):
• Details of the class neutrophil:
Consequences: the ontology gets untangled and becomes a collection of neat modules.
The rest of the semantics are given by restrictions pointing to the modules.
Implementation: the implementation is done in the following steps:
• Identify the modules: group the classes.
• Create the modules, maintaining only one parent for any given primitive class
and making primitive siblings disjoint.
• Redefine the classes (or define the newly added classes) according to the conditions for belonging to each module.
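The step of making primitive siblings disjoint can be sketched as follows. The helper disjoint_axioms is a hypothetical name introduced for this illustration; it writes one pairwise axiom per unordered pair of siblings, using the same A ⊑ ¬ B form that appears in the DL notation of this ODP.

```python
from itertools import combinations


def disjoint_axioms(siblings):
    """One pairwise disjointness axiom (A ⊑ ¬ B) per unordered pair of siblings."""
    return [f"{a} ⊑ ¬ {b}" for a, b in combinations(siblings, 2)]


# Primitive siblings under eukaryotic cell in the example.
for axiom in disjoint_axioms(["animal cell", "plant cell"]):
    print(axiom)
```

The number of axioms grows quadratically with the number of siblings, which is one reason why tool support is useful for this step.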
Sample code:
• The whole ODP as an OWL DL ontology is available at:
http://gong.man.ac.uk/owl/CellType.owl
• DL notation of the class neutrophil and directly related classes:
neutrophil ⊑ ∃ has function circulation
neutrophil ⊑ ∃ has function cell motility
neutrophil ⊑ ∃ has function stuff accumulation
neutrophil ⊑ ∃ has function defense
neutrophil ⊑ animal cell
circulation ⊑ biological function
animal cell ⊑ eukaryotic cell
animal cell ⊑ ¬ plant cell
cell ⊑ biological structure
eukaryotic cell ⊑ cell
circulating cell ⊑ cell
circulating cell ≡ ∃ has function circulation
• Manchester abstract OWL syntax notation of class neutrophil and directly related classes:
class neutrophil partial
animal cell AND
has function SOME [circulation, cell motility, stuff accumulation, defense]
class circulation partial biological function
class animal cell partial eukaryotic cell AND NOT plant cell
class cell partial biological structure
class eukaryotic cell partial cell
class circulating cell partial cell
class circulating cell complete has function SOME circulation
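How the polyhierarchy is rebuilt from the normalised modules can be illustrated with a toy inference in Python. This is a sketch under strong simplifications, not a DL reasoner: only the property has function and defined classes of the form has function SOME filler are handled, and the names PRIMITIVE_PARENT, HAS_FUNCTION, DEFINED and both functions are assumptions of the sketch.

```python
# Asserted primitive tree: each primitive class has a single primitive parent.
PRIMITIVE_PARENT = {
    "neutrophil": "animal cell",
    "animal cell": "eukaryotic cell",
    "eukaryotic cell": "cell",
    "cell": "biological structure",
}

# Existential restrictions on the property has function.
HAS_FUNCTION = {
    "neutrophil": {"circulation", "cell motility", "stuff accumulation", "defense"},
}

# Defined classes: necessary and sufficient condition "has function SOME filler".
DEFINED = {
    "circulating cell": "circulation",
}


def primitive_ancestors(cls):
    """Walk up the asserted single-parent tree."""
    ancestors = []
    while cls in PRIMITIVE_PARENT:
        cls = PRIMITIVE_PARENT[cls]
        ancestors.append(cls)
    return ancestors


def inferred_superclasses(cls):
    """Primitive ancestors plus every defined class whose condition is met."""
    functions = set()
    for member in [cls] + primitive_ancestors(cls):
        functions |= HAS_FUNCTION.get(member, set())
    defined = [name for name, filler in DEFINED.items() if filler in functions]
    return primitive_ancestors(cls) + defined


print(inferred_superclasses("neutrophil"))
```

Only the single primitive parent of neutrophil is asserted; its membership of circulating cell follows from the restriction on has function.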
Known uses: openGALEN.8
Related ODPs: Value Partition, Upper Level Ontology.
References:
• See [RWRR01, Rec03, SK04b, Hor04].
• http://www.w3.org/TR/owl-guide
8 http://www.opengalen.org
• http://www.co-ode.org/resources/tutorials/bio/
• http://gong.man.ac.uk/ontologydesignpatterns/
Additional information: Protégé has two wizards9 that facilitate the creation of this
ODP:
• The Value Partition wizard allows for creation of Value Partitions: the conditions
for class membership can be restrictions that point to the Value Partition.
• Restriction matrix: it allows for quickly creating existential restrictions in several
classes at the same time.
4.3.2.2 Value Partition
Name and classification: Value Partition, Good Practice.
Intent: to model attributes of objects that can only have certain already known values.
Also known as: Enumeration, if it is built using individuals instead of classes.
Motivation: reality is full of attributes of elements. For example, a person can be
defined as being short, medium or tall, and the attribute height can only take those values. Height is said to be covered or exhausted by those values; the possible heights are
only those three. Biology is full of such situations: metabolism can only be anabolism
or catabolism, membrane transport can only be uniport, symport or antiport, regulation
is always positive or negative, and so forth. The example evaluated herein is the remodelling of the GO term regulation of cell killing GO:0031341 with its two
subclasses, positive regulation of cell killing GO:0031343 and negative
regulation of cell killing GO:0031342.
Applicability: any KR language that allows for covering axioms and any knowledge
domain with attributes that can only have certain values.
Elements: the main elements are the classes that make up the Value Partition itself: a
class for the attribute and the subclasses for the values. In this case, RegulationType,
positive and negative, respectively.
Relationships: the most important relationship is the one that links each element of the
knowledge domain with the values of the Value Partition. In this case, is regulation
of type.
Structure:
9 http://www.co-ode.org/downloads/wizard/index.php
• Subsumption hierarchy of the Value Partition and the classes that are defined using the Value Partition:
• Details of the Value Partition and the class positive regulation of cell
killing:
Consequences: the attributes and the elements that are described or modified by the
attributes get untangled: whenever a new element enters the domain (e.g. another regulation phenomenon) it is only a matter of adding a restriction pointing to the pertinent
Value Partition class. The values that can be given to a certain attribute are constrained,
which enforces better modelling.
Implementation: the implementation is done in the following steps:
• Identify the attributes every element must be described with.
• For each attribute, create a class under Modifier (or the pertinent upper level
distinction that is used in the ontology).
• In each attribute class create a subclass for every value.
• Create a covering axiom defining the attribute class.
• Create the restrictions pointing to the values of the Value Partition.
Sample code:
• The whole ODP as an OWL DL ontology is available at:
http://gong.man.ac.uk/owl/regulation.owl
• DL notation of the Value Partition and the class positive regulation of
cell killing:
positive ⊑ RegulationType
RegulationType ≡ positive ⊔ negative
positive regulation of cell killing ⊑ ∃ is regulation of type positive
• Manchester abstract OWL syntax notation of the Value Partition and the class
positive regulation of cell killing:
class RegulationType complete
positive OR negative
class positive regulation of cell killing partial
is regulation of type SOME positive
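The constraining effect of the covering axiom can be illustrated with a short Python sketch. The dictionary VALUE_PARTITIONS and the function restrict are hypothetical names used only for this illustration; the check that rejects unknown values stands in for what the covering axiom expresses in OWL DL.

```python
# The Value Partition: one attribute class covered by its value subclasses.
VALUE_PARTITIONS = {
    "RegulationType": {"positive", "negative"},
}


def restrict(element, prop, attribute, value):
    """Build the restriction linking an element to a value of the partition.

    Raises ValueError if the value is not one of the covering subclasses,
    which mimics the constraint expressed by the covering axiom.
    """
    allowed = VALUE_PARTITIONS[attribute]
    if value not in allowed:
        raise ValueError(f"{value!r} is not a value of {attribute}")
    return f"{element} ⊑ ∃ {prop} {value}"


print(
    restrict(
        "positive regulation of cell killing",
        "is regulation of type",
        "RegulationType",
        "positive",
    )
)
```

Adding a new regulation phenomenon then only requires another call with one of the existing values, which matches the consequence described for this ODP.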
Known uses: none.
Related ODPs: Value Partition is related to Normalisation and Upper Level Ontology.
In Normalisation Value Partitions can be used as fillers for the restrictions that will
be used to build the normalised modules. As Value Partitions are not elements of the
knowledge domain in their own right they are usually put under the class modifiers
(or its analogue) in an Upper Level Ontology.
References:
• http://www.w3.org/TR/swbp-specified-values
• http://www.co-ode.org/resources/tutorials/bio/
• http://gong.man.ac.uk/ontologydesignpatterns/
Additional information: the Value Partition wizard10 in Protégé allows for quick and
easy creation of several Value Partitions. The Value Partition built with classes offers
an advantage over the Enumeration (a Value Partition built with individuals): new
subpartitions can be built for each of the value classes (e.g. very tall).
10 http://www.co-ode.org/downloads/wizard/index.php
4.3.2.3 Upper Level Ontology
Name and classification: Upper Level Ontology, Good Practice.
Intent: to create an ontology that can integrate different ontologies in itself.
Also known as: foundational ontology.
Motivation: different ontologies of a given domain share very general types of concepts, like substance, modifier, etc. These types of concepts are grounded in philosophical criteria, like endurants and perdurants. The different domain ontologies
can thus be integrated in one Upper Level Ontology, each ontology having different relationships pointing to the concepts of the Upper Level Ontology. The Upper
Level Ontology used here as an example is the Ontology of Biomedical Reality (OBR)
[RKM+ 05].
Applicability: any KR language that supports subsumption relationships and disjoints.
Elements: all the classes are important (see Structure).
Relationships: only subsumption relationships are used.
Structure: subsumption hierarchy of OBR:
Consequences: by committing to a given Upper Level Ontology when building a domain ontology the ontologist makes the integration of the ontology with other ontologies a much easier process. However, the ontology is committed to a concrete view of
the domain, and therefore the use and adoption of Upper Level Ontologies is very
controversial.
Implementation: the different hierarchies of primitive classes must be asserted using
disjoints.
Sample code: the whole ODP as an OWL DL ontology is available at:
http://gong.man.ac.uk/owl/OBR.owl
Known uses: openGALEN.11
11 http://www.opengalen.org
Related ODPs: Normalisation, Value Partition.
References:
• See [RKM+ 05].
• http://www.co-ode.org/resources/tutorials/bio/
• http://gong.man.ac.uk/ontologydesignpatterns/
Additional information: there is extensive literature and different Upper Level Ontologies, with different properties [BGG+ 02, GSG04, RR04]. A related attempt to
unify different ontologies is the use of formalised foundational relationships [SCK+ 05,
SR04].
4.3.3 Modelling ODPs
4.3.3.1 List
Name and classification: List, Modelling.
Intent: to model ordered groups of elements.
Also known as: Linked List.
Motivation: an ordered group of elements is a very intuitive modelling structure, yet
the semantics of such a construct in OWL DL are complex. Biology is full of structures where the order of the elements is vital, either in time (e.g. phases of processes)
or space (e.g. parts of genes). If that order is altered (e.g. a change of the order of
introns and exons in a gene) there can be serious damage to biological systems. In
this case the ODP will be used to build a gene starting from some elements of the
Sequence Ontology [KSC+ 05]: promoter SO:0000167, terminator SO:0000141,
intron SO:0000188 and exon SO:0000147. For the sake of clarity a minimalist
gene is built, with a very simple structure.
Applicability: any KR language that allows the use of subproperties, functional properties, transitive properties, intersections and unions.
Elements: the most important elements are the different classes that can be used to
build the List (promoter, terminator, intron and exon) and the class that is modelled using the List (in this case gene).
Relationships: the needed relationships are: contents (functional), rest (transitive)
and next (functional and a subproperty of rest).
Structure: details of the gene class (a list formed in the following order: Promoter,
Exon, Intron, Exon, Terminator):
Consequences: if very long and complex lists are used there can be a decrease in
reasoning performance.
Implementation: there is a Protégé wizard for creating lists.
Sample code:
• The whole ODP as an OWL DL ontology is available at:
http://gong.man.ac.uk/owl/Genes.owl
• DL notation of the Gene and EmptyList classes:
Gene ⊑ GeneStructure
Gene ⊑ ∃ contents Promoter
Gene ⊑ ∃ next (GeneStructure ⊓ (∃ contents Exon) ⊓ (∃ next
(GeneStructure ⊓ (∃ contents Intron) ⊓ (∃ next
(GeneStructure ⊓ (∃ contents Exon) ⊓ (∃ next
(GeneStructure ⊓ (∃ contents Terminator) ⊓ (∃ next
(GeneStructure ⊓ (∃ contents EmptyList))))))))))
EmptyList ≡ (≤ 0 next) ⊓ (≤ 0 contents) ⊓ GeneStructure
• Manchester abstract OWL syntax notation of Gene and EmptyList classes:
class Gene partial
GeneStructure AND
contents SOME Promoter AND next
SOME (GeneStructure AND (contents SOME Exon) AND (next
SOME (GeneStructure AND (contents SOME Intron) AND (next
SOME (GeneStructure AND (contents SOME Exon) AND (next
SOME (GeneStructure AND (contents SOME Terminator) AND (next
SOME (GeneStructure AND (contents SOME EmptyList))))))))))
class EmptyList complete
GeneStructure AND
next MAX 0 AND
contents MAX 0
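Writing the nested restrictions by hand is error-prone, so the nesting can be generated. The following Python sketch is an illustration only: the function list_expression is a hypothetical helper, not the Protégé wizard mentioned above, and it simply reproduces the textual DL notation of the Gene class.

```python
def list_expression(elements, node_class="GeneStructure"):
    """Nest the elements of a list as a chain of 'next' restrictions.

    Returns the DL expression for a tail of the list; the last node has
    EmptyList as contents, mirroring the Gene definition above.
    """
    expression = f"({node_class} ⊓ (∃ contents EmptyList))"
    for element in reversed(elements):
        expression = (
            f"({node_class} ⊓ (∃ contents {element}) ⊓ (∃ next {expression}))"
        )
    return expression


# The head of the list (Promoter) is stated on Gene itself; the tail follows.
gene_tail = list_expression(["Exon", "Intron", "Exon", "Terminator"])
print("Gene ⊑ ∃ contents Promoter")
print("Gene ⊑ ∃ next " + gene_tail)
```

Building the expression from the innermost node outwards keeps the parentheses balanced regardless of the length of the list.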
Known uses: experimental modelling of protein Fingerprints [DMS].
Related ODPs: none.
References:
• http://www.co-ode.org/resources/tutorials/bio/
• http://gong.man.ac.uk/ontologydesignpatterns/
Additional information: the Linked List is one of the oldest and most widely used
data structures in computer science;12 plenty of programming languages offer primitives similar to it. The Circularly Linked List is a List that ends with the beginning
of itself, creating a circle. The application of the Circularly Linked List in OWL DL
has not been investigated yet.
12 http://en.wikipedia.org/wiki/Linked_list
Apart from being an efficient way of modelling ordered elements, Lists offer the
possibility of creating a powerful classifying system: Lists of plenty of kinds can be
defined (e.g. definitions of the type any List containing elements A and B,
not followed by C and then followed by two D-s) and they will be put in
the correct position of the hierarchy of already defined lists. Using that procedure, for
example, different protein fingerprints (lists of regular expressions) or different kinds
of genes can be defined. The models can be queried, for example, with a given gene
defined with a certain ordered combination of introns, exons, promoter and terminator
to see in which position of the hierarchy it is classified and to which genes it relates.13 For example, a query of the type Any gene with two successive exons would be
written in DL notation as follows:
AnyGeneSuccesiveExons ≡ GeneStructure ⊓ (((∃ contents Exon)
⊓ (∃ next (GeneStructure ⊓ (∃ contents Exon))))
⊔ (∃ rest (GeneStructure ⊓ (∃ contents Exon)
⊓ (∃ next (GeneStructure ⊓ (∃ contents Exon))))))
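The intended meaning of this query can be checked on plain Python lists. This is only a sketch of the semantics, with a hypothetical function name: the first disjunct of the DL expression covers the head of the list and the second one, through the transitive property rest, covers every later position; scanning all adjacent pairs plays both roles.

```python
def has_successive(elements, wanted="Exon"):
    """True if two adjacent positions of the list both hold `wanted`."""
    return any(
        first == wanted and second == wanted
        for first, second in zip(elements, elements[1:])
    )


minimal_gene = ["Promoter", "Exon", "Intron", "Exon", "Terminator"]
other_gene = ["Promoter", "Exon", "Exon", "Terminator"]
print(has_successive(minimal_gene), has_successive(other_gene))
```

Under this reading the minimalist gene built above would not be classified under AnyGeneSuccesiveExons, because its two exons are separated by an intron.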
4.3.3.2 Adapted SEP triples
Name and classification: Adapted SEP triples, Modelling.
Intent: propagation of properties along the partonomy relation.
Also known as: Propagator.
Motivation: in the biomedical domain the propagation of properties along the partonomy relation is very important. For example, there are cases where the fault of the part
should be assumed to be a fault of the whole (an appendix perforation is an intestine
perforation) and other cases where it should not be like that (appendicitis is not enteritis). The problem of propagating properties along partonomy relates directly to the
problem of overloading part of in GO: for example location, a property that should
propagate (or not) with part of, is always implicitly present anywhere there is a part
of relation. As explained in section 3.1.3, polarisome is part of cell cortex and
part of site of polarized growth, inheriting both locations, creating a conflict:
polarisome is not located in the whole of the cell cortex; it is only located in the cell
cortex in the site of polarised growth. This ODP gives an example of how to solve that
problem, using a technique originally described in [SR05].
13 In OWL, defined classes can be seen as queries made against the ontology; once
classified, the subclasses of the defined class are the answers to the query.
Applicability: any KR language with transitive properties and a knowledge domain
with the need for propagation along transitive properties. OWL DL does not have
an explicit idiom for that requirement, like the propagates via construct of GRAIL
[RR04]. However the same effect can be achieved using another structure.
Elements: the elements of the partonomy hierarchy are maintained and in this case
two new elements are added to represent concrete locations in the cell (cellular
location pole of growth and cellular location periphery).
Relationships: the partOf relationship is maintained (defined as transitive) and in this
case a new property is added to link locations with cellular components,
cellularLocationOf.
Structure: detailed outline of all the classes of the ODP and their relationships:
Consequences: the location property cellularLocationOf is propagated along partOf
in a selective way, allowing for a precise and unambiguous definition of the polarisome
location.
Implementation: the most important step is to define the class cellular location
pole of growth as the location of site of polarized growth or any of its parts,
so the location is propagated to the parts (but it is not propagated in the case of cell
cortex).
Sample code:
• The whole ODP as an OWL DL ontology is available at:
http://gong.man.ac.uk/owl/Polarisome.owl
• DL notation of the whole ODP:
cellular location pole of growth ⊑ ∃ cellularLocationOf
(site of polarized growth ⊔ (∃ partOf site of polarized growth))
polarisome ⊑ ∃ partOf cell cortex
polarisome ⊑ ∃ partOf site of polarized growth
cellular location periphery ⊑ ∃ cellularLocationOf cell cortex
• Manchester abstract OWL syntax notation of the whole ODP:
class cellular location pole of growth partial
cellularLocationOf SOME (site of polarized growth OR
(partOf SOME site of polarized growth))
class polarisome partial
partOf SOME [cell cortex, site of polarized growth]
class cellular location periphery partial
cellularLocationOf SOME cell cortex
Known uses: none.
Related ODPs: none.
References:
• See [SR05].
• http://www.w3.org/2001/sw/BestPractices/OEP/SimplePartWhole/
• http://gong.man.ac.uk/ontologydesignpatterns/
Additional information: The ODP can be checked by creating the following two
classes:
PolarisomeLocation ⊑ ∃ cellularLocationOf polarisome
SiteOfPolarisedGrowthLocation ≡ ∃ cellularLocationOf
(site of polarized growth ⊔ (∃ partOf site of polarized growth))
After reasoning PolarisomeLocation should be a subclass of
SiteOfPolarisedGrowthLocation.
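The check above can be mimicked outside a DL reasoner. The following Python sketch is only a toy model, not an OWL classification: it treats partOf as a transitive relation over named classes and reads the definition of cellular location pole of growth as "location of site of polarized growth or of anything that is part of it". The dictionary layout and the function names (wholes, locations_of) are assumptions made for illustration; the class and property names come from the ODP.

```python
# Toy model of the selective-propagation ODP (assumption: not an OWL reasoner).
# Asserted partOf links, copied from the DL notation of the ODP.
PART_OF = {
    "polarisome": {"cell cortex", "site of polarized growth"},
}

# Location classes: (target component, does the location also cover the target's parts?)
LOCATIONS = {
    "cellular location pole of growth": ("site of polarized growth", True),
    "cellular location periphery": ("cell cortex", False),
}


def wholes(component):
    """Return every whole reachable from component via partOf (transitive closure)."""
    found, stack = set(), [component]
    while stack:
        current = stack.pop()
        for whole in PART_OF.get(current, ()):
            if whole not in found:
                found.add(whole)
                stack.append(whole)
    return found


def locations_of(component):
    """Return the location classes that apply to component in this toy model."""
    result = set()
    for location, (target, covers_parts) in LOCATIONS.items():
        if component == target or (covers_parts and target in wholes(component)):
            result.add(location)
    return result


print(locations_of("polarisome"))
```

Under these assumptions the polarisome receives only the pole of growth location, even though it is also part of cell cortex, because the periphery location is not defined over the parts of cell cortex.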
There have been different proposals in the literature for modelling transitive propagation in the biomedical domain. The approach chosen for this ODP [SR05, Rec02]
relies on the transitive properties offered by OWL DL. Another
approach is the one described in [SRH98, SH05], which relies on simulating transitivity by creating SEP triples (Structure - Entity - Part) for each class of the partonomy
hierarchy, allowing for selective inheritance of properties.
This ODP can also be applied to the problem of sensu in GO described in section
3.1.3. The property sensu can be decoupled into two properties: described in (the official
definition of sensu) and appearing in (pointing to the taxon where the entity appears). Both
can be applied to the partonomy hierarchies of GO: appearing in should propagate along
part of (a part should appear in the same taxon as its whole, or in a subtaxon of it),
whereas described in should not propagate, as the description taxon of the part
has no relationship with the description taxon of the whole.
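As a rough illustration of this decoupling, the Python sketch below marks each property as propagating or not, and walks up part of only when the property propagates. The terms, taxa and helper names are hypothetical placeholders, not GO content, and the sketch only inherits the taxon of the whole (it does not model subtaxa).

```python
# Hypothetical toy data: term and taxon names are illustrative placeholders.
PART_OF = {"part_term": "whole_term", "bare_part": "whole_term"}  # part -> whole

ASSERTED = {
    "whole_term": {"appearing_in": "taxon_A", "described_in": "taxon_B"},
    "part_term": {"described_in": "taxon_C"},
}

# Which properties are allowed to propagate along partOf.
PROPAGATES = {"appearing_in": True, "described_in": False}


def lookup(term, prop):
    """Return the value of prop for term, climbing partOf only if prop propagates."""
    current = term
    while current is not None:
        value = ASSERTED.get(current, {}).get(prop)
        if value is not None:
            return value
        if not PROPAGATES[prop]:
            return None
        current = PART_OF.get(current)
    return None


print(lookup("part_term", "appearing_in"))
print(lookup("bare_part", "described_in"))
```

In this sketch a part inherits appearing_in from its whole, while described_in is only ever read from the term itself.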
Chapter 5
Conclusion
This chapter explains the future developments and expected contributions of the research. Section 5.1 explains the research hypothesis in more detail, relating it to future developments. Section 5.2 explores the contributions that the research is expected to yield. Section 5.3 describes the criteria that will be used to evaluate the results of the research,
and finally section 5.4 gives an overview of how the work will be organised in the
following two years.
5.1 Research hypothesis revisited and extended: research
aims, objectives and questions
The most widespread paradigm in ontology creation and maintenance in Bioinformatics
is OBO (Open Biomedical Ontologies). OBO ontologies give a low fidelity representation of biological complexity because the language they are implemented in is very
simple. OBO ontologies do not rely on any formalism and thus they are not amenable
to automated treatments such as reasoning and advanced querying. The hypothesis
of this research is that migrating the current (OBO) biological ontologies to a more
expressive and formal paradigm like OWL DL will allow for a higher fidelity biological knowledge representation. This representation will provide more sophisticated
querying, more efficient interaction and easier maintenance of the ontologies. Once
that semantic expressivity is reached, the basis for new resources is set up. The aim
of this research is to develop modelling techniques (mainly Ontology Design Patterns,
ODPs), tools and user interfaces that will help biologists in that modelling and migration, especially when confronted with more expressive and formal languages like OWL
DL. Therefore the main objectives of the research are to develop:
• A precise, formal and understandable description of ODPs.
• A user-friendly framework for creating and migrating biological ontologies into
OWL DL, including the application of ODPs.
• Examples of application of ODPs in real bio-ontologies.
From this basis some research questions can be summarised:
• What are the properties of OWL DL that make it a better Knowledge Representation
technology than the existing OBO paradigm, from the point of
view of a biologist?
• Which user interfaces can be built to help biologists migrate OBO ontologies
to OWL DL or create new OWL DL ontologies?
• What is the formal definition of an ODP? How can an ODP be documented and
easily explained? How can an ODP be implemented?
• Which particularities of biological knowledge make it suitable for application of
ODPs? How can an ODP application target be spotted in biological knowledge?
5.2 Contributions
This section describes the contributions to the field that this research is expected to
yield. Some of them are already implemented; the majority will be implemented in the
following two years.
5.2.1 GONG and BONG
It has already been shown in the Gene Ontology Next Generation¹ (GONG) workflow
that, by offering simple migration tools to biologists, new semantics can be added to
pre-existing OBO ontologies and interesting results can be obtained. This is shown by
the fact that many new relationships proposed in the last execution of the GONG
workflow were accepted by the GO curators.² The Biological Ontology Next Generation³ (BONG) Protégé plugin allows the biologist to define a GONG workflow in
a simple ontology, to be executed by the BONG plugin. This means that a biologist
can obtain a more sophisticated knowledge representation in OWL DL, with all its advantages, just by defining a simple GONG ontology (an ontology that describes
the GONG workflow and is read by the BONG plugin in order to execute that
workflow).
1 http://gong.man.ac.uk/
The BONG plugin is useful not just as an incarnation of the GONG workflow: it is
a general OBO to OWL converter, and it can also be used simply to dissect OBO ontologies
and find relations to other ontologies with regular expressions. This is demonstrated
by the work carried out by the author at the EBI, in collaboration with Chris Mungall, to
create GO relationships that include Cell Type ontology terms.
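The regular expression step can be pictured with the following Python sketch. It is not the BONG plugin's implementation: the two OBO fragments, the identifiers and the function names are invented for illustration, and only the id and name tags of [Term] stanzas are read.

```python
import re

# Hypothetical OBO fragments; the identifiers are made up for this example.
GO_OBO = """
[Term]
id: GO:9999001
name: T cell activation

[Term]
id: GO:9999002
name: membrane transport
"""

CELL_OBO = """
[Term]
id: CL:9999001
name: T cell

[Term]
id: CL:9999002
name: neuron
"""


def parse_terms(obo_text):
    """Map term id -> term name for every [Term] stanza of a minimal OBO text."""
    terms = {}
    for stanza in obo_text.split("[Term]")[1:]:
        term_id = re.search(r"^id:\s*(\S+)", stanza, re.MULTILINE)
        name = re.search(r"^name:\s*(.+)$", stanza, re.MULTILINE)
        if term_id and name:
            terms[term_id.group(1)] = name.group(1).strip()
    return terms


def candidate_links(go_terms, cell_terms):
    """List (GO id, cell type id) pairs where the cell type name occurs in the GO name."""
    links = []
    for go_id, go_name in go_terms.items():
        for cl_id, cl_name in cell_terms.items():
            # Whole-word match of the cell type name inside the GO term name.
            if re.search(r"\b" + re.escape(cl_name) + r"\b", go_name):
                links.append((go_id, cl_id))
    return links


print(candidate_links(parse_terms(GO_OBO), parse_terms(CELL_OBO)))
```

Each pair returned is only a candidate relationship; in the workflow described here such candidates would still need to be reviewed by curators.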
In future research the BONG plugin will be improved, adding, for example, easier creation of regular expressions, automatic generation of GONG ontologies and
better retrieval of results (the plugin should only point to new relationships to be added
to GO, filtering out the non-informative results from the reasoner). A repository of GONG ontologies
will be created on the GONG web site so that biologists can go to the repository,
pick up an already defined workflow and execute it, saving time.
5.2.2 Integration of ODPs in BONG
The BONG plugin offers an appropriate platform for giving biologists the possibility of applying ODPs to existing bio-ontologies from OBO. The ODPs can be directly asserted when defining the semantics of the workflow in the GONG ontology.
There is also the possibility of implementing ODPs as part of Protégé itself, as is
already the case for some ODPs that are available in wizard form.⁴
5.2.3 ODPs catalog
A catalog of ODPs will be created during the research, containing the ODPs already explored
and those that come up later, and it will be available online on the GONG project web
page.⁵
2 See author’s MSc dissertation in http://gong.man.ac.uk/publications/
3 http://gong.man.ac.uk/downloads/
4 http://www.co-ode.org/downloads/wizard/index.php
5 http://gong.man.ac.uk/ontologydesignpatterns/
5.2.3.1 Properties of the biological knowledge domain
There are certain properties of the biological domain that can be exploited in order to
discover new ODPs, or that can represent a challenge when developing ODPs:
• Time plays an important role in biology, in processes like development and evolution.
• In biology the origin of biological entities is a complex concept, for example
in the case of development, where a given structure can be transformed into many
different structures and different structures can converge into one structure.
• Biology is full of complex dynamics, like physiological or metabolic regulation,
population genetics, etc.
• Symmetry: in processes like catalytic activity there is always a forward and a
reverse reaction [Shr03], in metabolism catabolism is always accompanied by
biosynthesis, etc. There is also structural symmetry, for example radial, pentaradial or bilateral symmetry in anatomy.
• In biology, contrary to the medical domain, there is a high diversity of structural
organisations. For example, arthropod anatomy is completely different from
vertebrate anatomy, and even more different from the structure of plants or fungi.
Each group of organisms presents important differences and idioms in the way
it is structurally described by biologists.
• Information order: there are structures where the order of the information
is very important, like the parts of a gene (at sequence level or at other levels like
exon/intron), the order of amino acids in a protein, the order of events in many
processes (neuron activation, gene transcription), etc.
• Complex interactions in metabolism, at molecular level (for example the macromolecular complex DNA polymerase III) and other levels. There are interactions
between elements of different levels.
• Taxonomical classifications and nomenclature, which relate in complex ways
to evolution via cladistics (paraphyletic and polyphyletic taxa, the difficulty of
defining what a species is, etc.).
• Biology is an experimental science, where evidence tracking, methodology concepts and quantitative data are very important.
• Biological reality is highly fuzzy, non-deterministic and full of uncertainty [BB05].
• In biology different levels of organisation or granularity coexist and interact with each other [AJT05, KSN04].
5.2.3.2 Ontological constructs for ODPs
There are ontological constructs that should be explored in order to build biological
ODPs, such as QCRs (qualified cardinality restrictions) and GCIs (general concept inclusions), among others.
On the other hand, rules in the form of the Semantic Web Rule Language⁶ represent an extension in expressivity for OWL DL, giving the possibility of asserting, for
example, relationships between properties [hPSBT05]. Rules have already been used
in modelling biomedical knowledge [GBGD05]. However, the use of rules can make
reasoning over the resulting ontologies undecidable. This problem can be avoided by using
DL-safe rules [MSS05] and reasoners such as KAON2.⁷ Rules are still being explored,
but they represent a promising area, as they can be used to implement more expressive
ODPs.
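To make the idea of a rule relating properties concrete, the sketch below forward-chains one hypothetical rule, partOf(x, y) ∧ locatedIn(y, z) → locatedIn(x, z), over explicitly named individuals. It is written in plain Python rather than SWRL syntax; the property locatedIn and the individuals are assumptions made for illustration, and the sketch says nothing about how KAON2 or any other reasoner evaluates rules.

```python
def apply_rule(part_of, located_in):
    """Apply partOf(x, y) AND locatedIn(y, z) -> locatedIn(x, z) until nothing new is inferred."""
    inferred = set(located_in)
    changed = True
    while changed:
        changed = False
        for x, y in part_of:
            for holder, z in list(inferred):
                if holder == y and (x, z) not in inferred:
                    inferred.add((x, z))
                    changed = True
    return inferred


# Hypothetical named individuals (not taken from any real ontology).
PART_OF = {("subunit_1", "complex_1"), ("complex_1", "organelle_1")}
LOCATED_IN = {("organelle_1", "cell_1")}

RESULT = apply_rule(PART_OF, LOCATED_IN)
print(sorted(RESULT))
```

Restricting the rule to named individuals is, loosely, the kind of restriction that DL-safe rules impose in order to keep reasoning decidable.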
5.2.4 Documenting ODPs
The documenting scheme already described is a contribution in its own right, but it remains
to be seen how its potential audience (biologists applying ODPs to biological ontologies)
uses it. Nonetheless, there are already important points to be considered:
Other classifications of ODPs
Another possible classification of ODPs is based on whether the ODP relies on reasoning or not (in the latter ODPs reasoning is not strictly necessary, although it is advisable for maintenance and query building):
• ODPs based on reasoning: Normalisation, Exception.
• ODPs not based on reasoning: N-ary relationships, List, Adapted SEP triples,
Value Partition, Upper Level Ontology.
Other classifications should come up during the research.
6 http://www.w3.org/Submission/2004/SUBM-SWRL-20040521/
7 http://kaon2.semanticweb.org/
Compositionality of ODPs
SDPs are sometimes built by combining different SDPs. Among the reviewed ODPs,
Normalisation can be considered a composed ODP, as it is built using Value Partitions.
Other types of compositionality in ODPs will be explored during the research.
Ontology metamodels
In the current proposal ODPs are described as instances. The ODPs should instead be expressed using a metamodel: a model capable of describing the ODP in an
abstract way. In other words, a metamodel capable of expressing an ontology at an
abstract level is needed. The metamodel would play the same role as UML in
Object Oriented programming, where the different implementations in each
programming language are incarnations of the abstract model in UML. This would
have the following advantages:
• A formal way of assessing the correctness of each ODP implementation.
• Clearer descriptions of ODPs, amenable to more efficient sharing of the
ODPs among bio-ontologists. If the explanations are not based on instances, the confusion that arises from the particularities of the example is avoided. Nonetheless, concrete examples should still be used to explain
ODPs, but not as the main expression of the ODP.
5.2.5 Improved bio-ontologies
The Arabidopsis thaliana life cycle ontology in OWL DL is going to be built in the
Plant Systems Biology division of Ghent University.⁸ The author will collaborate
in the process as a Marie Curie visitor. The whole ontology has to be built from scratch,
so many ODPs can be tested on it without the constraints of an already existing ontology, as is the case with the Gene Ontology and other OBO ontologies.
The author is regularly involved in preparing materials for OWL DL tutorials given
to biologists at the University of Manchester, where the ODPs and their application can be
(and have already been) explained to the attendees. After
the tutorials they can try to apply the ODPs in their respective knowledge domains.
The tutorials are a good way of testing ODPs because the attendees reveal the strong
points and weaknesses of the explained ODPs, and it can be assessed whether they
understand them or not, and why.
8 http://www.psb.ugent.be/
5.3 Evaluation
The outcome of this research will be the ontologies mentioned in the previous section.
Evaluating the quality of an ontology is still a new research area and a matter of controversy. There are different proposed methods for ontology evaluation, differing in
the area they focus on:
• Methods based on how ontologies perform in task oriented environments [HSG+ 05].
• Methods based on structural validity [HSG+ 05].
• Methods based on sound philosophical principles like Ontoclean [GW02].
None of the mentioned methodologies completely fits the aim of this research, because biological ontologies are not usually task-oriented and because structural validity
does not mean suitability of the knowledge representation, so the following criteria will
be used to evaluate the ontologies created:
• Functionality of the ontology: which new functionalities the expressiveness of OWL DL and the application
of reasoning make available in the developed ontologies.
• Expressiveness of the ontology: how well the ontology maps to the domain of
knowledge. This can be assessed by creating queries against the ontology that reflect
the needs of the domain experts.
• Acceptance of the new ontologies by the community of the domain of knowledge. There is a prior example: the results of the GONG workflow were accepted
and incorporated into the Gene Ontology in 2004.⁹
• Logical correctness and reusability [RWRR01]: how modular the ontology is and
how it interacts with other ontologies. Ontoclean can be used as part of this
criterion.
5.4 Research plan
The research plan for the following two years is described in the two charts of
Figure 5.1.
9 See author’s MSc dissertation in http://gong.man.ac.uk/publications/
Figure 5.1: Research plan. The research plan is divided into months for each year. The tasks are listed
in the left column.
Bibliography
[ABL+ 89]
B. Alberts, D. Bray, J. Lewis, M. Raff, K. Roberts, and J.D. Watson.
Molecular Biology of the Cell. Garland, New York, 1989.
[Ait05]
Stuart Aitken. Formalizing concepts of species, sex and developmental stage in anatomical ontologies. Bioinformatics, 21(11):2773–2779,
2005.
[AJT05]
Rector A.L., Rogers J.E., and Bittner T. Granularity Scale and Collectivity: When Size Does and Doesn’t Matter. Journal of Biomedical
Informatics (in press), 2005.
[ASDUD04] Fátima Al-Shahrour, Ramón Dı́az-Uriarte, and Joaquı́n Dopazo.
FatiGO: a web tool for finding significant associations of Gene Ontology
terms with groups of genes. Bioinformatics, 20(4):578–580, 2004.
[AvH04]
Grigoris Antoniou and Frank van Harmelen. Handbook on ontologies
(International Handbooks on Information Systems), chapter 4. Springer,
2004.
[AWB04]
J.S. Aitken, B.L. Webber, and J.B.L. Bard. Part-of Relations in Anatomy
Ontologies: a Proposal for RDFS and OWL Formalisations. In Proc.
PSB, pages 166–177, 2004.
[BB05]
Richard Baldock and Albert Burger. Anatomical ontologies: names and
places in biology. Genome Biology, (6):108, 2005.
[BGG+ 02]
Stefano Borgo, Aldo Gangemi, Nicola Guarino, Claudio Masolo, and
Alessandro Oltramari. Ontology RoadMap. Wonder Web deliverable
15, 2002.
[Bla00]
J.A. Blake. The mouse genome database (MGD): expanding genetic and
genomic resources for the laboratory mouse. Nucleic Acids Research,
28:108–111, 2000.
[BLHL01]
Tim Berners-Lee, James Hendler, and Ora Lassila. The Semantic Web.
Scientific American, May 2001.
[BMM05]
O. Bodenreider, J.A. Mitchell, and A.T. McCray. Biomedical Ontologies. In PSB, 2005.
[BMWS03]
M. Bada, R. McEntire, C. Wroe, and R. Stevens. GOAT: The Gene
Ontology Annotation Tool. In Proceedings of the 2003 UK e-Science
All Hands Meeting, pages 514–519, Nottingham, UK, 2003.
[BRA05]
Jonathan Bard, Seung Y Rhee, and Michael Ashburner. An Ontology
for Cell Types. Genome Biology, 6:R21, 2005.
[BS04]
Tim Beißbarth and Terence P. Speed. GOstat: find statistically overrepresented gene ontologies within a group of genes. Bioinformatics,
20(9):1464–1465, 2004.
[BSG+ 04]
Michael Bada, Robert Stevens, Carole Goble, Yolanda Gil, Michael
Ashburner, Judith A. Blake, J. Michael Cherry, Midori Harris, and
Suzanna Lewis. A short study on the success of the Gene Ontology.
Journal of Web Semantics, 1:235–240, 2004.
[BVEL04]
Sara Brockmans, Raphael Volz, Andreas Eberhart, and Peter Löffler.
Visual Modelling of OWL DL Ontologies using UML. In Proc. ISWC,
pages 198–213, 2004.
[BWG+ 04]
Elisabeth L. Boyle, Shuai Weng, Jeremy Gollub, Heng Jin, David Botstein, J. Michael Cherry, and Gavin Sherlock. GO::TermFinder – Open
source software for accessing Gene Ontology information and finding
significantly enriched Gene Ontology terms associated with a list of
genes. Bioinformatics, 20(18):3710–3715, 2004.
[Car03]
Vincent J. Carey. Ontology concepts and tools for statistical genomics.
Journal of Multivariate Analysis, 90:213–228, 2003.
[CBB+ 03]
Evelyn Camon, Daniel Barrell, Catherine Brooksbank, Michele Magrane, and Rolf Apweiler. The gene ontology annotation (GOA) project
- application of GO in SWISS-PROT, TrEMBL, and InterPro. Comparative and Functional Genomics, 4:71–74, 2003.
[CBM+ 04]
Evelyn Camon, Daniel Barrell, Michele Magrane, Rolf Apweiler, Vivian Lee, Emily Dimmer, John Maslen, David Binns, Nicola Harte, and
Rodrigo Lopez. The gene ontology annotation (GOA) database: sharing knowledge in UniProt with gene ontology. Nucleic Acids Research,
32:D262–D266, 2004.
[CC04]
Kuo-Chen Chou and Yu-Dong Cai. Prediction of protein subcellular
locations by GO-FunD-PseAA predictor. Biochemical and Biophysical
Research Communications, 320:1236–1239, 2004.
[CGGG+ 05] Ana Conesa, Stefan Götz, Juan Miguel Garcı́a-Gómez, Javier Terol,
Manuel Talón, and Montserrat Robles. Blast2GO: a universal tool for
annotation, visualization and analysis in functional genomics research.
Bioinformatics, 21(18):3674–3676, 2005.
[CLT+ 05]
F. Chalmel, A. Lardenois, J.D. Thompson, J. Muller, J.A. Sahel, and
T. Léveillard. GOAnno: GO annotation based on multiple alignment.
Bioinformatics, 21(9):2095–2096, 2005.
[Con99]
The FlyBase Consortium. The FlyBase database of the Drosophila
genome projects and community literature. Nucleic Acids Research,
27:85–88, 1999.
[Con00]
The Gene Ontology Consortium. Gene Ontology: tool for the unification of biology. Nature Genetics, 23(May):25–29, 2000.
[Con01]
The Gene Ontology Consortium. Creating the Gene Ontology Resource:
Design and Implementation. Genome Research, 11:1425–1433, 2001.
[Con04]
The Gene Ontology Consortium. The Gene Ontology (GO) database
and informatics resource. Nucleic Acids Research, 32:D258–D261,
2004.
[CSF03]
Werner Ceusters, Barry Smith, and Jim Flanagan. Ontology and Medical Terminology: Why Description Logics Are Not Enough. In Towards
an Electronic Patient Record, 2003.
[CTP04]
Peter Clark, John Thompson, and Bruce Porter. Handbook on ontologies (International Handbooks on Information Systems), chapter 32.
Springer, 2004.
[CY03]
Jung-Hsien Chiang and Hsu-Chun Yu. MeKE: discovering the functions of gene products from biomedical literature via sentence alignment. Bioinformatics, 19:1417–1422, 2003.
[DBD+ 04]
E. Demir, O. Babur, U. Dogrusoz, A. Gursoy, A. Ayaz, G. Gulesir,
G. Nisanci, and R. Cetin-Atalay. An Ontology for Collaborative Construction and Analysis of Cellular Pathways. Bioinformatics, 20:349–
356, 2004.
[Dev02]
Vladan Devedzic. Understanding Ontological Engineering. Communications of the Association for Computing Machinery, 45(4):136–144,
2002.
[DMS]
Nick Drummond, Georgina Moulton, and Robert Stevens. Personal
communication.
[DSD+ 03]
Scott W Doniger, Nathan Salomonis, Kam D Dahlquist, Karen
Vranizan, Steven C Lawlor, and Bruce R Conklin. MAPPFinder: using Gene Ontology and GenMAPP to create a global gene-expression
profile from microarray data. Genome Biology, 4, 2003.
[DTSC04]
Minghua Deng, Zhidong Tu, Fengzhu Sun, and Ting Chen. Mapping
Gene Ontology to proteins based on protein-protein interaction data.
Bioinformatics, 20:895–902, 2004.
[FSP+ 04]
Keith Flanagan, Robert Stevens, Matthew Pocock, Pete Lee, and Anil
Wipat. Ontology for genome comparison and genomic rearrangements.
Comparative and Functional Genomics, 5:537–544, 2004.
[FSS03]
K. Falkovych, M. Sabou, and H. Stuckenschmidt. Knowledge Transformation for the Semantic Web. IOS Press, 2003.
[Gal05]
Michael Y. Galperin. The Molecular Biology Database Collection: 2005
update. Nucleic Acids Research, 33(Database issue):D5–D24, 2005.
[GBGD05]
C. Golbreich, O. Bierlaire, B. Gibaud, and O. Dameron. What Reasoning Support for Ontology and Rules? the Brain Anatomy Case Study.
In 8th International Protégé Conference, July 2005.
[GCB04]
Aldo Gangemi, Carola Catenacci, and Massimo Battaglia. Inflammation
Ontology Design Pattern: an exercise in building a core Biomedical Ontology with Descriptions and Situations. Stud. Health Technol. Inform.,
102:64–80, 2004.
[GHJV95]
Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Professional Computing Series. Addison-Wesley, 1995.
[GHS04]
Nicholas Gibbins, Stephen Harris, and Nigel Shadbolt. Agent-based
semantic web services. Journal of Web Semantics, 1(1):141–154, 2004.
[GLH04]
Detlef Groth, Hans Lehrach, and Steffen Hennig. GOblet: a platform
for Gene Ontology annotation of anonymous sequence data. Nucleic
Acids Research, 32:W313–W317, 2004.
[Gru93]
T.R. Gruber. A translation approach to portable ontologies. Knowledge
Acquisition, 5:199–220, 1993.
[GSG04]
Pierre Grenon, Barry Smith, and Louis Goldberg. Ontologies in
Medicine, chapter Biodynamic Ontology: Applying BFO in the Biomedical Domain. IOS Press, 2004.
[Gua98]
N. Guarino. Formal Ontology and Information Systems. In Formal
Ontology and Information Systems. IOS Press, 1998.
[GW02]
Nicola Guarino and Christopher Welty. Evaluating Ontological Decisions with Ontoclean. Communications of the ACM, 45(2):61–65, 2002.
[GW04]
CA Goble and CJ Wroe. The Montagues and the Capulets. Comparative
and Functional Genomics, 2:623–632, 2004.
[GZB05]
Christine Golbreich, Songmao Zhang, and Olivier Bodenreider. The
Foundational Model of Anatomy in OWL: experience and perspectives.
In Proc. AMIA symp, 2005.
[Har]
Midori Harris. Personal communication.
[HdCD+ 05] Frank W. Hartel, Sherri de Coronado, Robert Dionne, Gilberto Fragoso,
and Jennifer Golbeck. Modeling a Description Logic Vocabulary for
Cancer Research. Journal of Biomedical Informatics, (38):114–129,
2005.
[HEC+ 04]
L. Hart, P. Emery, B. Colomb, K. Raymond, S. Taraporewalla,
D. Chang, Y. Ye, E. Kendall, and M. Dutra. OWL Full and UML 2.0
Compared. http://www.omg.org/docs/ontology/04-03-01.pdf, 2004.
[Hen01]
James Hendler. Agents and the semantic web. IEEE Intelligent Systems
Journal, 16:30–37, 2001.
[HGL03]
Steffen Hennig, Detlef Groth, and Hans Lehrach. Automated Gene Ontology annotation for anonymous sequence data. Nucleic Acids Research,
31:3712–3715, 2003.
[Hor04]
Matthew Horridge. A practical guide to building OWL ontologies with the Protégé-OWL plugin.
http://www.co-ode.org/resources/tutorials/ProtegeOWLTutorial.pdf, 2004.
[hPSBT05]
Ian Horrocks, Peter F. Patel-Schneider, Sean Bechhofer, and Dmitry
Tsarkov. OWL Rules: a Proposal and Prototype Implementation. Journal of Web Semantics, (3):23–40, 2005.
[HPSvH03]
Ian Horrocks, Peter F. Patel-Schneider, and Frank van Harmelen. From
SHIQ and RDF to OWL: the making of a web ontology language. Web
Semantics: Science, Services and Agents on the World Wide Web, 1:7–
26, 2003.
[HSG+ 05]
Jens Hartmann, Peter Spyns, Alain Gibon, Diana Maynard, Roberta Cuel,
Mari Carmen Suárez-Figueroa, and York Sure. Methods for Ontology
Evaluation. Knowledge Web deliverable 1.2.3/v1.3, 2005.
[IR98]
Horrocks IR. The FACT system. In Proceedings of the international
conference TABLEAUX, pages 307–312. Springer, 1998.
[Ire]
Amelia Ireland. Personal communication.
[Jac04]
Jacob Köhler. Integration of Life Science Databases. Biosilico, 2(2):61–
69, 2004.
[JM04]
Cliff Joslyn and Susan Mniszewski. Combinatorial Approaches to Bio-Ontology Management with Large Partially Ordered Sets. In SIAM
Workshop on Combinatorial Scientific Computing (CSC 04), February
2004.
[JMFH04]
Cliff A. Joslyn, Susan M. Mniszewski, Andy Fulmer, and Gary Heaton.
The Gene Ontology Categorizer. Bioinformatics, 20:i169–i177, 2004.
[JSA+ 04]
Cheng J, Sun S, Tracy A, Hubbell E, Morris J, Valmeekam V, Kimbrough A, Cline MS, Liu G, Shigeta R, Kulp D, and Siani-Rose MA.
NetAffx Gene Ontology Mining Tool: a visual approach for microarray
data analysis. Bioinformatics, 20:979–981, 2004.
[JSH+ 03]
Glynn Dennis Jr, Brad T Sherman, Douglas A Hosack, Jun Yang, Wei
Gao, H Clifford Lane, and Richard A Lempicki. DAVID: database for
annotations, visualisation, and integrated discovery. Genome Biology,
4, 2003.
[KBBD04]
Purvesh Khatri, Pratik Bhavsar, Gagandeep Bawa, and Sorin Draghici.
Onto-tools: an ensemble of web accessible, ontology-based tools for the
functional design and interpretation of high-throughput gene expression
experiments. Nucleic Acids Research, 32:W449–W456, 2004.
[KD05]
Purvesh Khatri and Sorin Draghici. Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics,
21(18):3587–3595, 2005.
[KFD+ 03]
Oliver D. King, Rebecca E. Foulger, Selina S. Dwight, James V. White,
and Frederick P. Roth. Predicting gene function from patterns of annotation. Genome Research, 13:896–904, 2003.
[KOTT03]
J.-D. Kim, T. Ohta, Y. Tateisi, and J. Tsujii. GENIA corpus - a semantically annotated corpus for bio-textmining. Bioinformatics, 19:i180–
i182, 2003.
[KPL03]
Jacob Köhler, Stephan Philippi, and Matthias Lange. SEMEDA: ontology based semantic integration of biological databases. Bioinformatics,
19:2420–2427, 2003.
[KSC+ 05]
Eilbeck K., Lewis S.E., Mungall C.J., Yandell M., Stein L., Durbin R.,
and Ashburner M. The Sequence Ontology: A tool for the unification
of genome annotations. Genome Biology, (6):R44, 2005.
[KSDS03]
Salim Khan, Gang Situ, Keith Decker, and Carl J. Schmidt. GeneFigure:
Automated Gene Ontology annotation. Bioinformatics, 19:2484–2485,
2003.
[KSK02]
Satoshi Kamegai, Kenji Satou, and Akihiko Konagaya. Toward ontology-based knowledge extraction from biomedical literature.
Genome Informatics, 13:576–577, 2002.
[KSN04]
Anand Kumar, Barry Smith, and Daniel D. Novotny. Biomedical Informatics and Granularity. Comparative and Functional Genomics, 5:501–
508, 2004.
[KSR+ 04]
Christie KR, Weng S, Balakrishnan R, Costanzo MC, Dolinski K,
Dwight SS, Engel SR, Feierbach B, Fisk DG, Hirschman JE, Hong EL,
Issel-Tarver L, Nash R, Sethuraman A, Starr B, Thusfeld CL, Andrada
R, Binkley G, Dong Q, Lane C, Schroeder M, Botstein D, and Cherry
JM. Saccharomyces Genome Database (SGD) provides tools to identify
and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms. Nucleic Acids Research, 32:D311–D314,
2004.
[Kwo03]
Oh Byung Kwon. I know what you need to buy: context-aware
multimedia-based recommendation system. Expert Systems with Applications, 25:387–400, 2003.
[Lew05]
Suzanna Lewis. Gene Ontology: looking backwards and forwards.
Genome Biology, 6:103, 2005.
[LHK04]
Sung Geun Lee, Jung Uk Hur, and Yang Seok Kim. A graph-theoretic
modeling on GO space for biological interpretation of gene clusters.
Bioinformatics, 20:381–388, 2004.
[LHMK03]
Astrid Lægreid, Torgeir R. Hvidsten, Herman Midelfart, and Jan Komorowski. Predicting Gene Ontology Biological Process from temporal gene expression patterns. Genome Research, 13:965–979, 2003.
[LHP03]
Patrick Lambrix, Manal Habbouche, and Marta Pérez. Evaluation
of Ontology Development tools for bioinformatics. Bioinformatics,
19(12):1564–1571, 2003.
[LM04]
Jane Lomax and Alexa T. McCray. Mapping the Gene Ontology into
the Unified Medical Language System. Comparative and Functional
Genomics, 5:354–361, 2004.
[LSBG03]
P.W. Lord, R.D. Stevens, A. Brass, and C.A. Goble. Investigating semantic similarity measures across the Gene Ontology: the relationship
between sequence and annotation. Bioinformatics, 19:1275–1283, 2003.
[LT04]
Shun-Chieh Lin and Shian-Shyong Tseng. Constructing detection
knowledge for DDoS intrusion tolerance. Expert Systems with Applications, 27:379–390, 2004.
[LZ04]
Yuefeng Li and Ning Zhong. Web mining model and its applications for
information gathering. Knowledge-Based Systems, 2004.
[MBH+ 05]
David Milward, Marcus Bjäreland, William Hayes, Michelle Maxwell,
Lisa Örbeg, Nick Tilford, James Thomas, Roger Hale, Sylvia Knight,
and Julie Barnes. Ontology-based interactive information extraction
from scientific abstracts. Comparative and Functional Genomics, 6:67–
71, 2005.
[MBR+ 04]
David Martin, Christine Brun, Elisabeth Remy, Pierre Mouren, Denis
Thieffry, and Bernard Jacq. GOToolBox: functional analysis of gene
datasets based on Gene Ontology. Genome Biology, 5:R101, 2004.
[McG01]
Deborah L. McGuinness. The Semantic Web: Why, What and How,
chapter Ontologies come of age. MIT press, 2001.
[MSS05]
Boris Motik, Ulrike Sattler, and Rudi Studer. Query Answering for
OWL-DL with Rules. Journal of Web Semantics, (3):41–60, 2005.
[MTES05]
J. P. Massar, Michael Travers, Jeff Elhar, and Jeff Shrager. BioLingua:
a programmable knowledge environment for biologists. Bioinformatics,
21(2):199–207, 2005.
[Mun05]
Chris J. Mungall. OBOL: Integrating Language and Meaning in Bio-Ontologies. Comparative and Functional Genomics, (5):509–520,
2005.
[Mus05]
Mark Musen. From Cottage Industry to the Industrial Age: New Infrastructure for Ontology Authoring and Dissemination. In Protégé international conference, 2005.
[NMW04]
Eric K. Neumann, Eric Miller, and John Wilbanks. What the Semantic
Web could do for Life Sciences. Biosilico, 2(6):228–236, 2004.
[OCAM+ 04] P.V. Ogren, K.B. Cohen, G.K. Acquaah-Mensah, J.Eberlein, and
L. Hunter. The Compositional Structure of Gene Ontology terms. In
Pac Symp Biocomput., pages 214–25, 2004.
[Ode94]
James J. Odell. Six Different Kinds of Composition. Journal Of Object-Oriented Programming, 5(8), 1994.
[OGA+ 05]
Tom Oinn, Mark Greenwood, Matthew Addis, M. Nedim Alpdemir,
Justin Ferris, Kevin Glover, Carole Goble, Antoon Goderis, Duncan
Hull, Darren Marvin, Peter Li, Phillip Lord, Matthew R. Pocock, Martin
Senger, Robert Stevens, Anil Wipat, and Chris Wroe. Taverna: Lessons
in creating a workflow environment for the life sciences. Concurrency
and Computation: Practice and Experience Grid Workflow, 2005. Accepted for Publication.
[RCSA02]
Soumya Raychaudhuri, Jeffrey T. Chang, Patrick D. Sutphin, and
Russ B. Altman. Associating genes with Gene Ontology codes using a
maximum entropy analysis of biomedical literature. Genome Research,
12:203–214, 2002.
[Rec02]
Alan Rector. Analysis of propagation along transitive roles: Formalisation of the GALEN experience with medical ontologies. In DL, 2002.
[Rec03]
Alan L. Rector. Modularisation of Domain Ontologies Implemented in
Description Logics and related formalisms including OWL. In K-CAP,
pages 121–128, 2003.
[RKM+ 05]
Cornelius Rosse, Anand Kumar, Jose LV Mejino, Daniel L Cook, Landon T Detwiler, and Barry Smith. A strategy for improving and integrating biomedical ontologies. In Annual symposium of American Medical Informatics Association (AMIA), 2005.
[RR00]
Jeremy Rogers and Alan Rector. GALEN’s Model of Parts and Wholes:
Experience and Comparisons. In Proc. AMIA symp, pages 714–718,
2000.
[RR04]
Alan L Rector and Jeremy Rogers. Patterns, Properties and Minimizing
Commitment: Reconstruction of the GALEN Upper Ontology in OWL.
In EKAW, 2004.
[RWBB04]
Peter N. Robinson, Andreas Wollstein, Ulrike Böhme, and Brad Beattie.
Ontologizing gene-expression microarray data: characterizing clusters
with Gene Ontology. Bioinformatics, 20:979–981, 2004.
[RWRR01]
Alan L. Rector, Chris Wroe, Jeremy Rogers, and Angus Roberts. Untangling Taxonomies and Relationships: Personal and Practical Problems
in Loosely Coupled Development of Large Ontologies. In K-CAP, pages
139–146, 2001.
[SAW+ 05]
Robert Stevens, Mikel Egaña Aranguren, Katty Wolstencroft, Ulrike
Sattler, Nick Drummond, and Matthew Horridge. Managing OWL’s
Limitations in Modelling Biomedical Knowledge. Submitted to International Journal of Human Computer Studies – special issue on the limits
of ontologies, 2005.
[SCK+ 05]
Barry Smith, Werner Ceusters, Bert Klagges, Jacob Köhler, Anand Kumar, Jane Lomax, Chris Mungall, Fabian Neuhaus, Alan L Rector, and
Cornelius Rosse. Relations in biomedical ontologies. Genome Biology,
(6):R46, 2005.
[SDSH05]
Stefan Schulz, Philipp Daumke, Barry Smith, and Udo Hahn. How to
distinguish parthood from location in bioontologies. In Annual symposium of American Medical Informatics Association (AMIA), 2005.
[SGB00]
R. Stevens, C.A. Goble, and S. Bechhofer. Ontology-based Knowledge Representation for Bioinformatics. Briefings in Bioinformatics,
1(4):398–416, 2000.
[SGP+ 03]
Robert Stevens, Carole Goble, Norman W. Paton, Sean Bechhofer, Gary
Ng, Patricia Baker, and Andy Brass. Complex Query Formulation Over
Diverse Information Sources in TAMBIS. In Zoe Lacroix and Terence
Critchlow, editors, Bioinformatics: Managing Scientific Data. Morgan
Kaufmann, May 2003.
[SH04]
Stefan Schulz and Udo Hahn. Towards a Computational Paradigm for
Biomedical Structure. In Proc. KR-MED, pages 63–71, 2004.
[SH05]
Stefan Schulz and Udo Hahn. Part-whole representation and reasoning in formal biomedical ontologies. Artificial Intelligence in Medicine,
34:179–200, 2005.
[Shr03]
Jeff Shrager. The fiction of function. Bioinformatics, 19(15):1934–
1936, 2003.
[SK02]
Steffen Schulze-Kremer. Ontologies for molecular biology and bioinformatics. In Silico Biology, 2(17), 2002.
[SK04a]
Barry Smith and Anand Kumar. Controlled vocabularies in bioinformatics: a case study in the gene ontology. Biosilico, 2(6):246–252, 2004.
[SK04b]
Heiner Stuckenschmidt and Michel Klein. Ontologies Refinement - Towards Structure-Based Partitioning of Large Ontologies. Wonder Web
deliverable 22, 2004.
[SK05]
Larisa N. Soldatova and Ross D. King. Are the Current Ontologies in
Biology Good Ontologies? Nature Biotechnology, 23(9):1095–1098,
2005.
[SKK04]
Barry Smith, Jacob Köhler, and Anand Kumar. On the application of
Formal Principles to Life Science Data: a Case Study in the Gene Ontology. In DILS, pages 74–94, 2004.
[SR04]
Barry Smith and Cornelius Rosse. The Role of Foundational Relations
in the Alignment of Biomedical Ontologies. In MEDINFO. IOS press,
2004.
[SR05]
Julian Seidenberg and Alan Rector. Transitive propagation in OWL.
Work not published, 2005.
[SRG03]
Robert D. Stevens, Alan J. Robinson, and Carole A. Goble. MyGrid:
personalised bioinformatics on the information grid. Bioinformatics,
19:i302–i304, 2003.
[SRH98]
Stefan Schulz, Martin Romacker, and Udo Hahn. Part-Whole Reasoning in Medical Ontologies Revisited - Introducing SEP triplets into
Classification-based Description Logics. In Proceedings of the 1998
AMIA Annual Fall Symposium. A Paradigm Shift in Health Care Information Systems: Clinical Infrastructures for the 21st Century, pages
830–834. Hanley and Belfus, 1998.
[SWLG04]
Robert Stevens, Chris Wroe, Phillip Lord, and Carole Goble. Handbook on ontologies (International Handbooks on Information Systems),
chapter 10. Springer, 2004.
[SWSK03]
Barry Smith, Jennifer Williams, and Steffen Schulze-Kremer. The Ontology of the Gene Ontology. In Annual symposium of American Medical Informatics Association (AMIA), 2003.
[Tho03]
Jeffrey Thomas. Finding an Oasis in the Desert of Bioinformatics.
Biosilico, 1(2):56–58, 2003.
[VEF+ 04]
Stefano Volinia, Rita Evangelisti, Francesca Francioso, Diego Arcelli,
Massimo Carella, and Paolo Gasparini. GOAL: automated Gene Ontology analysis of expression profiles. Nucleic Acids Research, 32:W492–
W499, 2004.
[VR03]
V. Haarslev and R. Möller. RACER: a core inference engine for the
semantic web. In ISWC, pages 27–36, 2003.
[WA03]
Jennifer Williams and William Andersen. Bringing Ontology to the
Gene Ontology. Comparative and Functional Genomics, 4:90–93, 2003.
[WGA05]
Xiaoshu Wang, Robert Gorlitsky, and Jonas S Almeida. From XML to
RDF: how semantic web technologies will change the design of ’omic’
standards. Nature Biotechnology, 23(9):1099–1103, 2005.
[WMS+ 05]
K. Wolstencroft, R. McEntire, R. Stevens, L. Tabernero, and A. Brass.
Constructing Ontology-Driven Protein Family Databases. Bioinformatics, 21(8):1685–92, 2005.
[WSGA03]
C.J. Wroe, R.D. Stevens, C.A. Goble, and M. Ashburner. A Methodology to Migrate the Gene Ontology to a Description Logic Environment
Using DAML+OIL. In 8th Pacific Symposium on biocomputing (PSB),
pages 624–636, 2003.
[WSM+ 05]
Hongwei Wu, Zhengchang Su, Fenglou Mao, Victor Olman, and Ying
Xu. Prediction of functional modules based on comparative genome
analysis and Gene Ontology application. Nucleic Acids Research,
33(9):2822–2837, 2005.
[XWL+ 02]
Hanqing Xie, Alon Wasserman, Zurit Levine, Amit Novik, Vladimir
Grebinskiy, Avi Shoshan, and Liat Mintz. Large-scale protein annotation through Gene Ontology. Genome Research, 12:785–794, 2002.
[YKNA03]
Iwei Yeh, Peter D. Karp, Natasha F. Noy, and Russ B. Altman. Knowledge acquisition, consistency checking and concurrency control for
Gene Ontology (GO). Bioinformatics, 19(2):241–248, 2003.
[YWCS05]
A. Young, N. Whitehouse, J. Cho, and C. Shaw. OntologyTraverser: an
R package for GO analysis. Bioinformatics, 21(2):275–276, 2005.
[Zeh03]
Günther Zehetner. OntoBlast function: from sequence similarities directly to potential functional annotations by ontology terms. Nucleic
Acids Research, 31:3799–3803, 2003.
[ZFW+ 03]
Barry R Zeeberg, Weimin Feng, Geoffrey Wang, May D Wang, Anthony T Fojo, Margot Sunshine, Sudarshan Narasimhan, David W Kane,
William C Reinhold, Samir Lababidi, Kimberly J Bussey, Joseph Riss,
J Carl Barrett, and John N Weinstein. GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biology, 4,
2003.
[ZM03]
Zuo Zhihong and Zhou Mingtian. Web Ontology Language OWL and
its Description Logic Foundation. In Proceedings of the Fourth International Conference on Parallel and Distributed Computing, Applications
and Technologies, pages 157–160, 2003.
[ZSKS04]
Bing Zhang, Denise Schmoyer, Stefan Kirov, and Jay Snoddy. GOTree
Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies. BMC bioinformatics,
5(16), 2004.