From thesauri to rich
ontologies:
The AGROVOC case
Boris Lauser
Food and Agriculture Organization (FAO)
Rome, Italy
[email protected], www.fao.org
DELOS Workshop
Lund, Sweden June 23, 2004
1
The problem
• AI and Semantic Web applications need full-fledged
ontologies that support reasoning
• Constructing such ontologies is expensive
• Existing KOS (Knowledge Organization Systems), both large
and small, do not provide the full set of precise concept
relationships needed for reasoning, but they represent much
intellectual capital
• How can this intellectual capital be put to use
in constructing full-fledged ontologies?
• Specifically: From AGROVOC to a full-fledged
Food and Agriculture Ontology
2
Some applications of a
Food and Agriculture Ontology
• Advice on crops and crop management
(fertilization, irrigation)
• Advice on pest management
• Tracking contaminants through the food chain
• Advice on safe food processing
• Computing nutrition labels
• Advice on healthy eating
• Improved searching
3
AGROVOC relationships compared with
more differentiated relationships
of a Food and Agriculture Ontology
4
AGROVOC (undifferentiated hierarchical relationships) | Food and Agriculture Ontology (differentiated relationships)
milk NT cow milk | milk <includesSpecific> cow milk
milk NT milk fat | milk <containsSubstance> milk fat
cows NT cow milk | cows <hasComponent> cow milk*
Cheddar cheese BT cow milk | Cheddar cheese <madeFrom> cow milk
Rule 1
Part X <mayContainSubstance> Substance Y
IF Animal W <hasComponent> Part X
AND Animal W <ingests> Substance Y
Rule 2
Food Z <containsSubstance> Substance Y
IF Food Z <madeFrom> Part X
AND Part X <containsSubstance> Substance Y
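The two rules above can be sketched as a small forward-chaining loop over (subject, relationship, object) triples. This is only an illustration under stated assumptions: the function name `apply_rules`, the triple representation, and the placeholder "substance Y" are not from the slides, the "cow milk contains milk fat" fact is added for illustration, and entity-type checks (Animal, Part, Food) are omitted for brevity.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relationship, object)


def apply_rules(facts: Set[Triple]) -> Set[Triple]:
    """Return the facts plus everything derivable by Rule 1 and Rule 2."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        new: Set[Triple] = set()
        for (s, rel, o) in derived:
            if rel == "hasComponent":
                # Rule 1: Animal W <hasComponent> Part X
                #     AND Animal W <ingests> Substance Y
                #      => Part X <mayContainSubstance> Substance Y
                for (s2, rel2, y) in derived:
                    if s2 == s and rel2 == "ingests":
                        new.add((o, "mayContainSubstance", y))
            if rel == "madeFrom":
                # Rule 2: Food Z <madeFrom> Part X
                #     AND Part X <containsSubstance> Substance Y
                #      => Food Z <containsSubstance> Substance Y
                for (s2, rel2, y) in derived:
                    if s2 == o and rel2 == "containsSubstance":
                        new.add((s, "containsSubstance", y))
        if not new <= derived:
            derived |= new
            changed = True
    return derived


# Illustrative facts; "substance Y" is a hypothetical placeholder.
facts = {
    ("cows", "hasComponent", "cow milk"),
    ("cows", "ingests", "substance Y"),
    ("Cheddar cheese", "madeFrom", "cow milk"),
    ("cow milk", "containsSubstance", "milk fat"),
}
derived = apply_rules(facts)
```

With these facts, the loop adds "cow milk <mayContainSubstance> substance Y" (Rule 1) and "Cheddar cheese <containsSubstance> milk fat" (Rule 2). Neither conclusion could be drawn from undifferentiated NT/BT links.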
5
From AGROVOC to FA Ontology
1) Define the FA Ontology structure
2) Fill in values from AGROVOC to the extent possible
3) Edit manually with computer assistance,
using the rules-as-you-go approach and
an ontology editor:
• make existing information more precise
• add new information
6
Define ontology structure
Overall model
7
[Diagram: overall model. A Concept is designated by a
Lexicalization/Term, which is manifested as a String.
Relationships are shown between concepts, between terms,
between strings, and between relationships. Further labels:
Relationship, annotation relationship, Note.
Other information: language/culture, subvocabulary/scope,
audience, type, etc.]
8
Define ontology structure
Relationship types
9
Isa
Relationship | Inverse relationship
X <includesSpecific> Y | Y <isa> X
X <inheritsTo> Y | Y <inheritsFrom> X
10
Holonymy / meronymy
(the generic whole-part relationship)
Relationship | Inverse relationship
X <containsSubstance> Y | Y <substanceContainedIn> X
X <hasIngredient> Y | Y <ingredientOf> X
X <madeFrom> Y | Y <usedToMake> X
X <yieldsPortion> Y | Y <portionOf> X
X <spatiallyIncludes> Y | Y <spatiallyIncludedIn> X
X <hasComponent> Y | Y <componentOf> X
X <includesSubprocess> Y | Y <subprocessOf> X
X <hasMember> Y | Y <memberOf> X
11
Further relationship examples
Relationship | Inverse relationship
X <causes> Y | Y <causedBy> X
X <instrumentFor> Y | Y <performedByInstrument> X
X <processFor> Y | Y <usesProcess> X
X <beneficialFor> Y | Y <benefitsFrom> X
X <treatmentFor> Y | Y <treatedWith> X
X <harmfulFor> Y | Y <harmedBy> X
X <hasPest> Y | Y <afflicts> X
X <growsIn> Y | Y <growthEnvironmentFor> X
X <hasProperty> Y | Y <propertyOf> X
X <hasSymptom> Y | Y <indicates> X
X <similarTo> Y | Y <similarTo> X
X <oppositeTo> Y | Y <oppositeTo> X
X <hasPhase> Y | Y <phaseOf> X
X <ingests> Y | Y <ingestedBy> X
X <madeFrom> Y | Y <usedToMake> X
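Because every relationship type is paired with an inverse, only one direction needs to be stored; the other can be derived. A minimal sketch, assuming triples are (subject, relationship, object) tuples and using only a few of the pairs listed above (the names `INVERSE` and `invert` are illustrative):

```python
# A few relationship/inverse pairs taken from the lists above.
INVERSE = {
    "includesSpecific": "isa",
    "containsSubstance": "substanceContainedIn",
    "madeFrom": "usedToMake",
    "hasComponent": "componentOf",
    "hasPest": "afflicts",
    "similarTo": "similarTo",  # symmetric: its own inverse
}
# Make the lookup work in both directions.
INVERSE.update({inv: rel for rel, inv in list(INVERSE.items())})


def invert(triple):
    """Return the same statement expressed from the other concept's side."""
    subject, relationship, obj = triple
    return (obj, INVERSE[relationship], subject)


statement = ("Cheddar cheese", "madeFrom", "cow milk")
inverse_statement = invert(statement)
```

Here `inverse_statement` is `("cow milk", "usedToMake", "Cheddar cheese")`, and inverting it again gives back the original statement.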
12
Fill in values from AGROVOC
• Fill in values from AGROVOC to the extent possible
• Arrange in structured sequence (to the extent
possible based on the information in AGROVOC) to
facilitate editing
(The editor can deal with similar problems at the
same time.)
13
Undifferentiated relationships from AGROVOC | Edited relationships
milk NT cow milk |
milk NT goat milk |
milk NT buffalo milk |
milk NT milk fat |
milk RT milk protein |
milk RT lactose |
cows RT cow milk |
goats RT goat milk |
ewes RT ewe milk |
goat milk RT goat cheese |
ewe milk RT ewe cheese |
acid soils BT chemical soil types |
acrisols BT genetic soil types |
alkaline soils BT chemical soil types |
alluvial soils BT lithological soil types |
chemical soil types BT soil types |
Cichorium BT Asteraceae |
Cichorium endivia BT Cichorium |
Cichorium intybus BT Cichorium |
Cichorium intybus RT coffee substitutes |
Cichorium intybus RT root vegetables |
blood NT blood protein |
blood NT blood lipids |
14
Edit manually
with computer assistance
• Use the rules-as-you-go approach and
good ontology editing software
that handles large ontologies efficiently
• make existing information more precise
• add new information
Assumption:
Entity types of concepts are known from AGROVOC
or other sources (Langual, UMLS, WordNet); for example
milk fat is a Substance
Asteraceae is a taxon
The editor may need to determine the entity type
15
The rules-as-you-go approach
Exploit patterns to automate the conversion process
Example
1. An editor has determined that
milk NT cow milk should become milk <includesSpecific> cow milk
2. She recognizes that this is an example of the general pattern
milk NT * milk → milk <includesSpecific> * milk
(where * is the wildcard character)
3. Given this pattern, the system can derive automatically
milk NT goat milk should become milk <includesSpecific> goat milk
Result:
16
Undifferentiated relationships from AGROVOC | Edited relationships
milk NT cow milk | milk <includesSpecific> cow milk
milk NT goat milk | milk <includesSpecific> goat milk
milk NT buffalo milk | milk <includesSpecific> buffalo milk
milk NT milk fat |
milk RT milk protein |
milk RT lactose |
cows RT cow milk |
goats RT goat milk |
ewes RT ewe milk |
goat milk RT goat cheese |
ewe milk RT ewe cheese |
acid soils BT chemical soil types |
acrisols BT genetic soil types |
alkaline soils BT chemical soil types |
alluvial soils BT lithological soil types |
chemical soil types BT soil types |
Cichorium BT Asteraceae |
Cichorium endivia BT Cichorium |
Cichorium intybus BT Cichorium |
Cichorium intybus RT coffee substitutes |
Cichorium intybus RT root vegetables |
blood NT blood protein |
blood NT blood lipids |
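The wildcard pattern "milk NT * milk" from this example can be sketched with shell-style matching from Python's standard `fnmatch` module. The function `apply_wildcard_rule` and its parameters are assumptions made for illustration, not part of any AGROVOC tooling.

```python
from fnmatch import fnmatchcase


def apply_wildcard_rule(instances, subject_pattern, thesaurus_rel,
                        object_pattern, ontology_rel):
    """Convert every thesaurus instance that matches the pattern."""
    converted = []
    for subject, rel, obj in instances:
        if (rel == thesaurus_rel
                and fnmatchcase(subject, subject_pattern)
                and fnmatchcase(obj, object_pattern)):
            converted.append((subject, ontology_rel, obj))
    return converted


# A few of the AGROVOC relationship instances for "milk".
agrovoc = [
    ("milk", "NT", "cow milk"),
    ("milk", "NT", "goat milk"),
    ("milk", "NT", "buffalo milk"),
    ("milk", "NT", "milk fat"),
    ("milk", "RT", "lactose"),
]

# Pattern: milk NT * milk  ->  milk <includesSpecific> * milk
edited = apply_wildcard_rule(agrovoc, "milk", "NT", "* milk", "includesSpecific")
```

Only the three "* milk" narrower terms are converted. "milk NT milk fat" does not match the object pattern and stays in the list for a later rule.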
17
The rules-as-you-go approach
Exploit patterns to automate the conversion process
1. Editor:
milk NT milk fat → milk <containsSubstance> milk fat
2. Pattern:
Substance NT/RT Substance →
Substance <containsSubstance> Substance
3. Therefore:
milk RT milk protein → milk <containsSubstance> milk protein
Result:
18
Undifferentiated relationships from AGROVOC | Edited relationships
milk NT cow milk | milk <includesSpecific> cow milk
milk NT goat milk | milk <includesSpecific> goat milk
milk NT buffalo milk | milk <includesSpecific> buffalo milk
milk NT milk fat | milk <containsSubstance> milk fat
milk RT milk protein | milk <containsSubstance> milk protein
milk RT lactose | milk <containsSubstance> lactose
cows RT cow milk |
goats RT goat milk |
ewes RT ewe milk |
goat milk RT goat cheese | goat milk <containsSubstance> goat cheese
ewe milk RT ewe cheese | ewe milk <containsSubstance> ewe cheese
acid soils BT chemical soil types |
acrisols BT genetic soil types |
alkaline soils BT chemical soil types |
alluvial soils BT lithological soil types |
chemical soil types BT soil types |
Cichorium BT Asteraceae |
Cichorium endivia BT Cichorium |
Cichorium intybus BT Cichorium |
Cichorium intybus RT coffee substitutes |
Cichorium intybus RT root vegetables |
blood NT blood protein | blood <containsSubstance> blood protein
blood NT blood lipids | blood <containsSubstance> blood lipids
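Patterns over entity types, such as "Substance NT/RT Substance", match on the type of each concept, not on its label. A minimal sketch follows, assuming entity types are available in a lookup table. The `ENTITY_TYPE` assignments are illustrative assumptions (the slides state only that milk fat is a Substance), and `apply_type_rule` is a hypothetical helper.

```python
# Assumed entity types; in practice these would come from AGROVOC
# or other sources, or be determined by the editor.
ENTITY_TYPE = {
    "milk": "Substance",
    "milk fat": "Substance",
    "milk protein": "Substance",
    "lactose": "Substance",
    "cows": "Animal",
    "cow milk": "BodyPart",
}


def apply_type_rule(instances, subject_type, thesaurus_rels,
                    object_type, ontology_rel):
    """Convert instances whose concepts have the required entity types."""
    converted = []
    for subject, rel, obj in instances:
        if (rel in thesaurus_rels
                and ENTITY_TYPE.get(subject) == subject_type
                and ENTITY_TYPE.get(obj) == object_type):
            converted.append((subject, ontology_rel, obj))
    return converted


# Instances still unconverted after the wildcard rule.
remaining = [
    ("milk", "NT", "milk fat"),
    ("milk", "RT", "milk protein"),
    ("milk", "RT", "lactose"),
    ("cows", "RT", "cow milk"),
]

# Pattern: Substance NT/RT Substance -> Substance <containsSubstance> Substance
substances = apply_type_rule(remaining, "Substance", {"NT", "RT"},
                             "Substance", "containsSubstance")
# Pattern: Animal RT BodyPart -> Animal <hasComponent> BodyPart
components = apply_type_rule(remaining, "Animal", {"RT"},
                             "BodyPart", "hasComponent")
```

The first call converts the three milk constituents; the second converts "cows RT cow milk". Results produced this way are only as good as the type assignments and the pattern, so they still need review by the editor.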
19
The rules-as-you-go approach
Exploit patterns to automate the conversion process
1. Editor:
cows RT cow milk → cows <hasComponent> cow milk
2. Pattern:
Animal RT BodyPart → Animal <hasComponent> BodyPart
3. Therefore:
goats RT goat milk → goats <hasComponent> goat milk
Result:
20
Undifferentiated relationships from AGROVOC | Edited relationships
milk NT cow milk | milk <includesSpecific> cow milk
milk NT goat milk | milk <includesSpecific> goat milk
milk NT buffalo milk | milk <includesSpecific> buffalo milk
milk NT milk fat | milk <containsSubstance> milk fat
milk RT milk protein | milk <containsSubstance> milk protein
milk RT lactose | milk <containsSubstance> lactose
cows RT cow milk | cows <hasComponent> cow milk
goats RT goat milk | goats <hasComponent> goat milk
ewes RT ewe milk | ewes <hasComponent> ewe milk
goat milk RT goat cheese | goat milk <containsSubstance> goat cheese
ewe milk RT ewe cheese | ewe milk <containsSubstance> ewe cheese
acid soils BT chemical soil types |
acrisols BT genetic soil types |
alkaline soils BT chemical soil types |
alluvial soils BT lithological soil types |
chemical soil types BT soil types |
Cichorium BT Asteraceae |
Cichorium endivia BT Cichorium |
Cichorium intybus BT Cichorium |
Cichorium intybus RT coffee substitutes |
Cichorium intybus RT root vegetables |
blood NT blood protein | blood <containsSubstance> blood protein
blood NT blood lipids | blood <containsSubstance> blood lipids
21
The rules-as-you-go approach
Exploit patterns to automate the conversion process
1. Editor:
acid soils BT chemical soil types → acid soils <isa> chemical soil types
2. Pattern:
X BT * type* → X <isa> * type*
3. Therefore:
acrisols BT genetic soil types → acrisols <isa> genetic soil types
Result:
22
Undifferentiated relationships from AGROVOC | Edited relationships
milk NT cow milk | milk <includesSpecific> cow milk
milk NT goat milk | milk <includesSpecific> goat milk
milk NT buffalo milk | milk <includesSpecific> buffalo milk
milk NT milk fat | milk <containsSubstance> milk fat
milk RT milk protein | milk <containsSubstance> milk protein
milk RT lactose | milk <containsSubstance> lactose
cows RT cow milk | cows <hasComponent> cow milk
goats RT goat milk | goats <hasComponent> goat milk
ewes RT ewe milk | ewes <hasComponent> ewe milk
goat milk RT goat cheese | goat milk <containsSubstance> goat cheese
ewe milk RT ewe cheese | ewe milk <containsSubstance> ewe cheese
acid soils BT chemical soil types | acid soils <isa> chemical soil types
acrisols BT genetic soil types | acrisols <isa> genetic soil types
alkaline soils BT chemical soil types | alkaline soils <isa> chemical soil types
alluvial soils BT lithological soil types | alluvial soils <isa> lithological soil types
chemical soil types BT soil types | chemical soil types <isa> soil types
Cichorium BT Asteraceae |
Cichorium endivia BT Cichorium |
Cichorium intybus BT Cichorium |
Cichorium intybus RT coffee substitutes |
Cichorium intybus RT root vegetables |
blood NT blood protein | blood <containsSubstance> blood protein
blood NT blood lipids | blood <containsSubstance> blood lipids
23
The rules-as-you-go approach
Exploit patterns to automate the conversion process
1. Editor:
Cichorium BT Asteraceae
→ Cichorium <isa> Asteraceae
2. Pattern:
Taxon BT Taxon → Taxon <isa> Taxon
3. Therefore:
Cichorium endivia BT Cichorium → Cichorium endivia <isa> Cichorium
Result:
24
Undifferentiated relationships from AGROVOC | Edited relationships
milk NT cow milk | milk <includesSpecific> cow milk
milk NT goat milk | milk <includesSpecific> goat milk
milk NT buffalo milk | milk <includesSpecific> buffalo milk
milk NT milk fat | milk <containsSubstance> milk fat
milk RT milk protein | milk <containsSubstance> milk protein
milk RT lactose | milk <containsSubstance> lactose
cows RT cow milk | cows <hasComponent> cow milk
goats RT goat milk | goats <hasComponent> goat milk
ewes RT ewe milk | ewes <hasComponent> ewe milk
goat milk RT goat cheese | goat milk <containsSubstance> goat cheese
ewe milk RT ewe cheese | ewe milk <containsSubstance> ewe cheese
acid soils BT chemical soil types | acid soils <isa> chemical soil types
acrisols BT genetic soil types | acrisols <isa> genetic soil types
alkaline soils BT chemical soil types | alkaline soils <isa> chemical soil types
alluvial soils BT lithological soil types | alluvial soils <isa> lithological soil types
chemical soil types BT soil types | chemical soil types <isa> soil types
Cichorium BT Asteraceae | Cichorium <isa> Asteraceae
Cichorium endivia BT Cichorium | Cichorium endivia <isa> Cichorium
Cichorium intybus BT Cichorium | Cichorium intybus <isa> Cichorium
Cichorium intybus RT coffee substitutes |
Cichorium intybus RT root vegetables |
blood NT blood protein | blood <containsSubstance> blood protein
blood NT blood lipids | blood <containsSubstance> blood lipids
25
The rules-as-you-go approach
Exploit patterns to automate the conversion process
1. Editor:
Cichorium intybus RT coffee substitutes
→ Cichorium intybus <usedToMake> coffee substitutes
2. Pattern:
Taxon RT FoodProduct → Taxon <usedToMake> FoodProduct
3. Therefore:
Cichorium intybus RT root vegetables
→ Cichorium intybus <usedToMake> root vegetables
Result:
26
Undifferentiated relationships from AGROVOC | Edited relationships
milk NT cow milk | milk <includesSpecific> cow milk
milk NT goat milk | milk <includesSpecific> goat milk
milk NT buffalo milk | milk <includesSpecific> buffalo milk
milk NT milk fat | milk <containsSubstance> milk fat
milk RT milk protein | milk <containsSubstance> milk protein
milk RT lactose | milk <containsSubstance> lactose
cows RT cow milk | cows <hasComponent> cow milk
goats RT goat milk | goats <hasComponent> goat milk
ewes RT ewe milk | ewes <hasComponent> ewe milk
goat milk RT goat cheese | goat milk <containsSubstance> goat cheese
ewe milk RT ewe cheese | ewe milk <containsSubstance> ewe cheese
acid soils BT chemical soil types | acid soils <isa> chemical soil types
acrisols BT genetic soil types | acrisols <isa> genetic soil types
alkaline soils BT chemical soil types | alkaline soils <isa> chemical soil types
alluvial soils BT lithological soil types | alluvial soils <isa> lithological soil types
chemical soil types BT soil types | chemical soil types <isa> soil types
Cichorium BT Asteraceae | Cichorium <isa> Asteraceae
Cichorium endivia BT Cichorium | Cichorium endivia <isa> Cichorium
Cichorium intybus BT Cichorium | Cichorium intybus <isa> Cichorium
Cichorium intybus RT coffee substitutes | Cichorium intybus <usedToMake> coffee substitutes
Cichorium intybus RT root vegetables | Cichorium intybus <usedToMake> root vegetables
blood NT blood protein | blood <containsSubstance> blood protein
blood NT blood lipids | blood <containsSubstance> blood lipids
27
The rules-as-you-go approach
Discussion
Main idea: Formulate constraints to assist the editor
• The ontology may have many relationship types,
perhaps > 100
• Constraints limit the relationship types that are
possible in a specific case; show the editor only these
• If the constraints limit the possible relationship types to 1,
conversion is automatic
• Constraints may depend on the thesaurus to be
converted
28
Constraints
Thesaurus relationship | Possible ontology relationship | Inverse
NT / BT | <hasMember> | <memberOf>
NT / BT | <includesSpecific> | <isa>
NT / BT | <hasComponent> | <componentOf>
NT / BT | <spatiallyIncludes> | <spatiallyIncludedIn>
NT / BT | etc. |
RT | <similarTo> | <similarTo>
RT | <growsIn> | <growthEnvironmentFor>
RT | <treatmentFor> | <treatedWith>
RT | <hasMember> | <memberOf>
RT | <hasComponent> | <componentOf>
RT | <madeFrom> | <usedToMake>
RT | etc. |
29
Constraints
Thesaurus relationships + entity types or values | Possible ontology relationships
milk NT * milk | milk <includesSpecific> * milk
Substance NT Substance | Substance <containsSubstance> Substance
X BT * type* | X <isa> * type*
Taxon BT Taxon | Taxon <isa> Taxon
GeogrEntity BT GeogrEntity | GeogrEntity <spatiallyIncludedIn> GeogrEntity
BodyPart BT BodyPart | BodyPart <componentOf> BodyPart
ChemSubstance BT ChemSubstance | ChemSubstance <isa> ChemSubstance
30
Constraints
Thesaurus relationships + entity types or values | Possible ontology relationships
Substance RT Substance | Substance <containsSubstance> Substance
 | Substance <substanceContainedIn> Substance
 | Substance <usedToMake> Substance
 | Substance <madeFrom> Substance
LivingOrganism RT BodyPart | LivingOrganism <hasComponent> BodyPart
Taxon RT FoodProduct | Taxon <usedToMake> FoodProduct
GeogrEntity RT GeogrGrouping | GeogrEntity <memberOf> GeogrGrouping
Process RT Object | Process <performedByInstrument> Object
 | Process <affects> Object
ChemSubstance RT Function | ChemSubstance <usedFor> Function
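The constraint tables can be read as a lookup from (entity type, thesaurus relationship, entity type) to the ontology relationships that remain possible. The sketch below is one hypothetical way to encode that idea; `CONSTRAINTS`, `TYPES`, and `convert` are illustrative names, the entity-type assignments are assumptions, and the inverse of <containsSubstance> is written <substanceContainedIn> as in the relationship-type list.

```python
# (subject type, thesaurus relationship, object type) -> possible relationships
CONSTRAINTS = {
    ("Taxon", "BT", "Taxon"): ["isa"],
    ("Substance", "NT", "Substance"): ["containsSubstance"],
    ("Substance", "RT", "Substance"): [
        "containsSubstance", "substanceContainedIn", "usedToMake", "madeFrom",
    ],
    ("LivingOrganism", "RT", "BodyPart"): ["hasComponent"],
    ("Taxon", "RT", "FoodProduct"): ["usedToMake"],
    ("Process", "RT", "Object"): ["performedByInstrument", "affects"],
}

# Assumed entity types for the concepts used in this example.
TYPES = {
    "Cichorium endivia": "Taxon",
    "Cichorium": "Taxon",
    "goat milk": "Substance",
    "goat cheese": "Substance",
}


def convert(instance):
    """Convert automatically if one option remains, else return a menu."""
    subject, rel, obj = instance
    options = CONSTRAINTS.get((TYPES.get(subject), rel, TYPES.get(obj)), [])
    if len(options) == 1:
        return ("automatic", (subject, options[0], obj))
    # Several options: show the editor only these.
    # No options: no constraint applies, so the editor sees the full list.
    return ("menu", options)


auto = convert(("Cichorium endivia", "BT", "Cichorium"))
menu = convert(("goat milk", "RT", "goat cheese"))
```

Here `auto` is converted without editor input, while `menu` offers the editor four candidates. With a menu, the editor could pick <usedToMake> for "goat milk RT goat cheese" where a single automatic pattern would choose for her.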
31
Checking by editor
• Relationship instances created by editor
by selecting from a constraint-generated menu
are final
• Relationship instances created automatically
must be presented to the editor
• If the editor determines that the relationship instances
are almost always correct, she checks a box labeled
"accept without checking"
32
Overall conversion process
• One master editor must go through the file
from start to finish,
processing the relationship instances
and creating patterns,
creating new relationship types as needed
• Assistant editors can apply the patterns.
• In the first pass, the master editor should deal with
the easy cases.
• Deal with the remaining cases later.
Groups of similar relationship instances can be seen
more easily in a smaller set
33
Adding new relationship types
and new relationship instances
• AGROVOC does not contain all relationship types or
relationship instances needed for AI applications
• Data must be added, for example:
Organism X <hasPest> Organism Y
ChemSubstance X <actsAgainst> Organism Y
Organism X <actsAgainst> Organism Y
Plant X <growsIn> Environment Y
FoodProduct X <suitableFor> Diet Y
34
Conclusion
The rules-as-you-go approach is a realistic method for
developing a rich ontology from an existing thesaurus
Full paper:
Reengineering Thesauri for New Applications: the
AGROVOC Example
Journal of Digital Information, Volume 4 Issue 4
http://jodi.ecs.soton.ac.uk/Articles/v04/i04/Soergel/
35
References
• For questions and discussion contact
Boris Lauser
[email protected]
Dagobert Soergel
[email protected]
• AOS: Agricultural Ontology Service Project
http://www.fao.org/agris/aos
• AGMES: http://www.fao.org/agris/agmes
36