From thesauri to rich ontologies: The AGROVOC case

Boris Lauser
Food and Agriculture Organization (FAO), Rome, Italy
[email protected], www.fao.org

DELOS Workshop, Lund, Sweden, June 23, 2004

The problem

• AI and Semantic Web applications need full-fledged ontologies that support reasoning
• Constructing such ontologies is expensive
• Existing KOS do not provide the full set of precise concept relationships needed for reasoning, but existing KOS, both large and small, represent much intellectual capital
  (KOS = Knowledge Organization System)
• How can this intellectual capital be put to use in constructing full-fledged ontologies?
• Specifically: From AGROVOC to a full-fledged Food and Agriculture Ontology

Some applications of a Food and Agriculture Ontology

• Advice on crops and crop management (fertilization, irrigation)
• Advice on pest management
• Tracking contaminants through the food chain
• Advice on safe food processing
• Computing nutrition labels
• Advice on healthy eating
• Improved searching

AGROVOC relationships compared with more differentiated relationships of a Food and Agriculture Ontology

AGROVOC: undifferentiated hierarchical relationships | Food and Agriculture Ontology: differentiated relationships
milk NT cow milk | milk <includesSpecific> cow milk
milk NT milk fat | milk <containsSubstance> milk fat
cows NT cow milk | cows <hasComponent> cow milk*
Cheddar cheese BT cow milk | Cheddar cheese <madeFrom> cow milk

Rule 1
Part X <mayContainSubstance> Substance Y
IF Animal W <hasComponent> Part X
AND Animal W <ingests> Substance Y

Rule 2
Food Z <containsSubstance> Substance Y
IF Food Z <madeFrom> Part X
AND Part X <containsSubstance> Substance Y

From AGROVOC to FA Ontology

1) Define the FA Ontology structure
2) Fill in values from AGROVOC to the extent possible
3) Edit manually with computer assistance, using the rules-as-you-go approach and an ontology editor:
  • make existing information more precise
  • add new information

Define ontology structure
Overall model
[Figure: overall model. Recoverable labels: Relationship (relationships between relationships); Concept (relationships between concepts); Concept "designated by" Lexicalization/Term (relationships between terms); Term "manifested as" String (relationships between strings); Note (annotation relationship). Other information: language/culture, subvocabulary/scope, audience, type, etc.]

Define ontology structure
Relationship types

Isa

Relationship | Inverse relationship
X <includesSpecific> Y | Y <isa> X
X <inheritsTo> Y | Y <inheritsFrom> X

Holonymy / meronymy (the generic whole-part relationship)

Relationship | Inverse relationship
X <containsSubstance> Y | Y <substanceContainedIn> X
X <hasIngredient> Y | Y <ingredientOf> X
X <madeFrom> Y | Y <usedToMake> X
X <yieldsPortion> Y | Y <portionOf> X
X <spatiallyIncludes> Y | Y <spatiallyIncludedIn> X
X <hasComponent> Y | Y <componentOf> X
X <includesSubprocess> Y | Y <subprocessOf> X
X <hasMember> Y | Y <memberOf> X

Further relationship examples

Relationship | Inverse relationship
X <causes> Y | Y <causedBy> X
X <instrumentFor> Y | Y <performedByInstrument> X
X <processFor> Y | Y <usesProcess> X
X <beneficialFor> Y | Y <benefitsFrom> X
X <treatmentFor> Y | Y <treatedWith> X
X <harmfulFor> Y | Y <harmedBy> X
X <hasPest> Y | Y <afflicts> X
X <growsIn> Y | Y <growthEnvironmentFor> X
X <hasProperty> Y | Y <propertyOf> X
X <hasSymptom> Y | Y <indicates> X
X <similarTo> Y | Y <similarTo> X
X <oppositeTo> Y | Y <oppositeTo> X
X <hasPhase> Y | Y <phaseOf> X
X <ingests> Y | Y <ingestedBy> X
X <madeFrom> Y | Y <usedToMake> X

Fill in values from AGROVOC

• Fill in values from AGROVOC to the extent possible
• Arrange in structured sequence (to the extent possible based on the information in AGROVOC) to facilitate editing (the editor can deal with similar problems at the same time)
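The relationship types and inverses listed above can be held as simple (subject, relationship, object) triples, which is also enough to try out Rule 1 and Rule 2 from the earlier comparison slide. The following is only a minimal sketch under stated assumptions: the triple representation, the function names `with_inverses` and `apply_rules`, and the sample facts (in particular the ingested "pesticide residue") are illustrative and not part of AGROVOC or the FA Ontology, and the entity-type conditions of the rules (Animal, Part, Substance, Food) are not checked.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relationship, object)

# A few relationship/inverse pairs copied from the tables above.
INVERSE = {
    "includesSpecific": "isa",
    "containsSubstance": "substanceContainedIn",
    "madeFrom": "usedToMake",
    "hasComponent": "componentOf",
    "ingests": "ingestedBy",
}
# Make the lookup work in both directions.
INVERSE.update({inv: rel for rel, inv in list(INVERSE.items())})


def with_inverses(triples: Set[Triple]) -> Set[Triple]:
    """Return the triples plus the inverse statement of each one."""
    out = set(triples)
    for s, rel, o in triples:
        if rel in INVERSE:
            out.add((o, INVERSE[rel], s))
    return out


def apply_rules(triples: Set[Triple]) -> Set[Triple]:
    """Apply Rule 1 and Rule 2 repeatedly until no new triple appears."""
    facts = set(triples)
    while True:
        new: Set[Triple] = set()
        for a, rel1, x in facts:
            if rel1 == "hasComponent":
                # Rule 1: W hasComponent X and W ingests Y
                #         gives X mayContainSubstance Y
                for w, rel2, y in facts:
                    if w == a and rel2 == "ingests":
                        new.add((x, "mayContainSubstance", y))
            if rel1 == "madeFrom":
                # Rule 2: Z madeFrom X and X containsSubstance Y
                #         gives Z containsSubstance Y
                for p, rel2, y in facts:
                    if p == x and rel2 == "containsSubstance":
                        new.add((a, "containsSubstance", y))
        if new <= facts:
            return facts
        facts |= new


# Illustrative facts only; "pesticide residue" is a made-up example value.
facts = {
    ("cows", "hasComponent", "cow milk"),
    ("cows", "ingests", "pesticide residue"),
    ("Cheddar cheese", "madeFrom", "cow milk"),
    ("cow milk", "containsSubstance", "milk fat"),
}
derived = apply_rules(facts)
closed = with_inverses(derived)
```

Note that, with the rules exactly as stated on the slide, the contaminant reaches cow milk only as <mayContainSubstance>, so Rule 2 does not carry it on to Cheddar cheese; tracking contaminants further along the food chain would need an additional rule.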
Undifferentiated relationships from AGROVOC | Edited relationships
milk NT cow milk |
milk NT goat milk |
milk NT buffalo milk |
milk NT milk fat |
milk RT milk protein |
milk RT lactose |
cows RT cow milk |
goats RT goat milk |
ewes RT ewe milk |
goat milk RT goat cheese |
ewe milk RT ewe cheese |
acid soils BT chemical soil types |
acrisols BT genetic soil types |
alkaline soils BT chemical soil types |
alluvial soils BT lithological soil types |
chemical soil types BT soil types |
Cichorium BT Asteraceae |
Cichorium endivia BT Cichorium |
Cichorium intybus BT Cichorium |
Cichorium intybus RT coffee substitutes |
Cichorium intybus RT root vegetables |
blood NT blood protein |
blood NT blood lipids |

Edit manually with computer assistance

• Use the rules-as-you-go approach and good ontology editing software that handles large ontologies efficiently
  • make existing information more precise
  • add new information

Assumption: Entity types of concepts are known from AGROVOC or other sources (Langual, UMLS, WordNet); for example
  milk fat is a Substance
  Asteraceae is a Taxon
The editor may need to determine the entity type.

The rules-as-you-go approach

Exploit patterns to automate the conversion process

Example
1. An editor has determined that
   milk NT cow milk should become milk <includesSpecific> cow milk
2. She recognizes that this is an example of the general pattern
   milk NT * milk should become milk <includesSpecific> * milk
   (where * is the wildcard character)
3.
Given this pattern, the system can derive automatically that
   milk NT goat milk should become milk <includesSpecific> goat milk

Result:

Undifferentiated relationships from AGROVOC | Edited relationships
milk NT cow milk | milk <includesSpecific> cow milk
milk NT goat milk | milk <includesSpecific> goat milk
milk NT buffalo milk | milk <includesSpecific> buffalo milk
milk NT milk fat |
milk RT milk protein |
milk RT lactose |
cows RT cow milk |
goats RT goat milk |
ewes RT ewe milk |
goat milk RT goat cheese |
ewe milk RT ewe cheese |
acid soils BT chemical soil types |
acrisols BT genetic soil types |
alkaline soils BT chemical soil types |
alluvial soils BT lithological soil types |
chemical soil types BT soil types |
Cichorium BT Asteraceae |
Cichorium endivia BT Cichorium |
Cichorium intybus BT Cichorium |
Cichorium intybus RT coffee substitutes |
Cichorium intybus RT root vegetables |
blood NT blood protein |
blood NT blood lipids |

The rules-as-you-go approach

Exploit patterns to automate the conversion process

1. Editor: milk NT milk fat should become milk <containsSubstance> milk fat
2. Pattern: Substance NT/RT Substance should become Substance <containsSubstance> Substance
3.
Therefore: milk RT milk protein should become milk <containsSubstance> milk protein

Result:

Undifferentiated relationships from AGROVOC | Edited relationships
milk NT cow milk | milk <includesSpecific> cow milk
milk NT goat milk | milk <includesSpecific> goat milk
milk NT buffalo milk | milk <includesSpecific> buffalo milk
milk NT milk fat | milk <containsSubstance> milk fat
milk RT milk protein | milk <containsSubstance> milk protein
milk RT lactose | milk <containsSubstance> lactose
cows RT cow milk |
goats RT goat milk |
ewes RT ewe milk |
goat milk RT goat cheese | goat milk <containsSubstance> goat cheese
ewe milk RT ewe cheese | ewe milk <containsSubstance> ewe cheese
acid soils BT chemical soil types |
acrisols BT genetic soil types |
alkaline soils BT chemical soil types |
alluvial soils BT lithological soil types |
chemical soil types BT soil types |
Cichorium BT Asteraceae |
Cichorium endivia BT Cichorium |
Cichorium intybus BT Cichorium |
Cichorium intybus RT coffee substitutes |
Cichorium intybus RT root vegetables |
blood NT blood protein | blood <containsSubstance> blood protein
blood NT blood lipids | blood <containsSubstance> blood lipids

The rules-as-you-go approach

Exploit patterns to automate the conversion process

1. Editor: cows RT cow milk should become cows <hasComponent> cow milk
2. Pattern: Animal RT BodyPart should become Animal <hasComponent> BodyPart
3.
Therefore: goats RT goat milk should become goats <hasComponent> goat milk

Result:

Undifferentiated relationships from AGROVOC | Edited relationships
milk NT cow milk | milk <includesSpecific> cow milk
milk NT goat milk | milk <includesSpecific> goat milk
milk NT buffalo milk | milk <includesSpecific> buffalo milk
milk NT milk fat | milk <containsSubstance> milk fat
milk RT milk protein | milk <containsSubstance> milk protein
milk RT lactose | milk <containsSubstance> lactose
cows RT cow milk | cows <hasComponent> cow milk
goats RT goat milk | goats <hasComponent> goat milk
ewes RT ewe milk | ewes <hasComponent> ewe milk
goat milk RT goat cheese | goat milk <containsSubstance> goat cheese
ewe milk RT ewe cheese | ewe milk <containsSubstance> ewe cheese
acid soils BT chemical soil types |
acrisols BT genetic soil types |
alkaline soils BT chemical soil types |
alluvial soils BT lithological soil types |
chemical soil types BT soil types |
Cichorium BT Asteraceae |
Cichorium endivia BT Cichorium |
Cichorium intybus BT Cichorium |
Cichorium intybus RT coffee substitutes |
Cichorium intybus RT root vegetables |
blood NT blood protein | blood <containsSubstance> blood protein
blood NT blood lipids | blood <containsSubstance> blood lipids

The rules-as-you-go approach

Exploit patterns to automate the conversion process

1. Editor: acid soils BT chemical soil types should become acid soils <isa> chemical soil types
2. Pattern: X BT * type* should become X <isa> * type*
3.
Therefore: acrisols BT genetic soil types should become acrisols <isa> genetic soil types

Result:

Undifferentiated relationships from AGROVOC | Edited relationships
milk NT cow milk | milk <includesSpecific> cow milk
milk NT goat milk | milk <includesSpecific> goat milk
milk NT buffalo milk | milk <includesSpecific> buffalo milk
milk NT milk fat | milk <containsSubstance> milk fat
milk RT milk protein | milk <containsSubstance> milk protein
milk RT lactose | milk <containsSubstance> lactose
cows RT cow milk | cows <hasComponent> cow milk
goats RT goat milk | goats <hasComponent> goat milk
ewes RT ewe milk | ewes <hasComponent> ewe milk
goat milk RT goat cheese | goat milk <containsSubstance> goat cheese
ewe milk RT ewe cheese | ewe milk <containsSubstance> ewe cheese
acid soils BT chemical soil types | acid soils <isa> chemical soil types
acrisols BT genetic soil types | acrisols <isa> genetic soil types
alkaline soils BT chemical soil types | alkaline soils <isa> chemical soil types
alluvial soils BT lithological soil types | alluvial soils <isa> lithological soil types
chemical soil types BT soil types | chemical soil types <isa> soil types
Cichorium BT Asteraceae |
Cichorium endivia BT Cichorium |
Cichorium intybus BT Cichorium |
Cichorium intybus RT coffee substitutes |
Cichorium intybus RT root vegetables |
blood NT blood protein | blood <containsSubstance> blood protein
blood NT blood lipids | blood <containsSubstance> blood lipids

The rules-as-you-go approach

Exploit patterns to automate the conversion process

1. Editor: Cichorium BT Asteraceae should become Cichorium <isa> Asteraceae
2. Pattern: Taxon BT Taxon should become Taxon <isa> Taxon
3.
Therefore: Cichorium endivia BT Cichorium should become Cichorium endivia <isa> Cichorium

Result:

Undifferentiated relationships from AGROVOC | Edited relationships
milk NT cow milk | milk <includesSpecific> cow milk
milk NT goat milk | milk <includesSpecific> goat milk
milk NT buffalo milk | milk <includesSpecific> buffalo milk
milk NT milk fat | milk <containsSubstance> milk fat
milk RT milk protein | milk <containsSubstance> milk protein
milk RT lactose | milk <containsSubstance> lactose
cows RT cow milk | cows <hasComponent> cow milk
goats RT goat milk | goats <hasComponent> goat milk
ewes RT ewe milk | ewes <hasComponent> ewe milk
goat milk RT goat cheese | goat milk <containsSubstance> goat cheese
ewe milk RT ewe cheese | ewe milk <containsSubstance> ewe cheese
acid soils BT chemical soil types | acid soils <isa> chemical soil types
acrisols BT genetic soil types | acrisols <isa> genetic soil types
alkaline soils BT chemical soil types | alkaline soils <isa> chemical soil types
alluvial soils BT lithological soil types | alluvial soils <isa> lithological soil types
chemical soil types BT soil types | chemical soil types <isa> soil types
Cichorium BT Asteraceae | Cichorium <isa> Asteraceae
Cichorium endivia BT Cichorium | Cichorium endivia <isa> Cichorium
Cichorium intybus BT Cichorium | Cichorium intybus <isa> Cichorium
Cichorium intybus RT coffee substitutes |
Cichorium intybus RT root vegetables |
blood NT blood protein | blood <containsSubstance> blood protein
blood NT blood lipids | blood <containsSubstance> blood lipids

The rules-as-you-go approach

Exploit patterns to automate the conversion process

1. Editor: Cichorium intybus RT coffee substitutes should become Cichorium intybus <usedToMake> coffee substitutes
2. Pattern: Taxon RT FoodProduct should become Taxon <usedToMake> FoodProduct
3.
Therefore: Cichorium intybus RT root vegetables should become Cichorium intybus <usedToMake> root vegetables

Result:

Undifferentiated relationships from AGROVOC | Edited relationships
milk NT cow milk | milk <includesSpecific> cow milk
milk NT goat milk | milk <includesSpecific> goat milk
milk NT buffalo milk | milk <includesSpecific> buffalo milk
milk NT milk fat | milk <containsSubstance> milk fat
milk RT milk protein | milk <containsSubstance> milk protein
milk RT lactose | milk <containsSubstance> lactose
cows RT cow milk | cows <hasComponent> cow milk
goats RT goat milk | goats <hasComponent> goat milk
ewes RT ewe milk | ewes <hasComponent> ewe milk
goat milk RT goat cheese | goat milk <containsSubstance> goat cheese
ewe milk RT ewe cheese | ewe milk <containsSubstance> ewe cheese
acid soils BT chemical soil types | acid soils <isa> chemical soil types
acrisols BT genetic soil types | acrisols <isa> genetic soil types
alkaline soils BT chemical soil types | alkaline soils <isa> chemical soil types
alluvial soils BT lithological soil types | alluvial soils <isa> lithological soil types
chemical soil types BT soil types | chemical soil types <isa> soil types
Cichorium BT Asteraceae | Cichorium <isa> Asteraceae
Cichorium endivia BT Cichorium | Cichorium endivia <isa> Cichorium
Cichorium intybus BT Cichorium | Cichorium intybus <isa> Cichorium
Cichorium intybus RT coffee substitutes | Cichorium intybus <usedToMake> coffee substitutes
Cichorium intybus RT root vegetables | Cichorium intybus <usedToMake> root vegetables
blood NT blood protein | blood <containsSubstance> blood protein
blood NT blood lipids | blood <containsSubstance> blood lipids

The rules-as-you-go approach
Discussion

Main idea: Formulate constraints to assist the editor
• The ontology may have many relationship types, perhaps > 100
• Constraints limit the relationship types that are possible in a specific case; show the editor only these
• If the constraints limit the possible relationship types to 1, conversion is automatic
• Constraints may depend on the thesaurus to
be converted

Constraints

Thesaurus relationships | Possible ontology relationships
NT / BT | <hasMember> / <memberOf>
NT / BT | <includesSpecific> / <isa>
NT / BT | <hasComponent> / <componentOf>
NT / BT | <spatiallyIncludes> / <spatiallyIncludedIn>
NT / BT | etc.
RT | <similarTo> / <similarTo>
RT | <growsIn> / <EnvironmentForGrowing>
RT | <treatmentFor> / <treatedWith>
RT | <hasMember> / <memberOf>
RT | <hasComponent> / <componentOf>
RT | <madeFrom> / <usedToMake>
RT | etc.

Constraints

Thesaurus relationships + entity types or values | Possible ontology relationships
milk NT * milk | milk <includesSpecific> * milk
Substance NT Substance | Substance <containsSubstance> Substance
X BT * type* | X <isa> * type*
Taxon BT Taxon | Taxon <isa> Taxon
GeogrEntity BT GeogrEntity | GeogrEntity <spatiallyIncludedIn> GeogrEntity
BodyPart BT BodyPart | BodyPart <isComponentOf> BodyPart
ChemSubstance BT ChemSubstance | ChemSubstance <isa> ChemSubstance

Constraints

Thesaurus relationships + entity types or values | Possible ontology relationships
Substance RT Substance | Substance <containsSubstance> Substance
Substance RT Substance | Substance <containedInSubstance> Substance
Substance RT Substance | Substance <usedToMake> Substance
Substance RT Substance | Substance <madeFrom> Substance
LivingOrganism RT BodyPart | LivingOrganism <hasComponent> BodyPart
Taxon RT FoodProduct | Taxon <usedToMake> FoodProduct
GeogrEntity RT GeogrGrouping | GeogrEntity <isMemberOf> GeogrGrouping
Process RT Object | Process <performedByInstrument> Object
Process RT Object | Process <affects> Object
ChemSubstance RT Function | ChemSubstance <usedFor> Function

Checking by editor

• Relationship instances created by the editor by selecting from a constraint-generated menu are final
• Relationship instances created automatically must be presented to the editor
• If the editor determines that the relationship instances are almost always correct, she checks a box "accept without checking"

Overall conversion process

• One master editor must go through the file from start to finish, processing the relationship instances and creating patterns, and creating new relationship types as needed
• Assistant
editors can apply the patterns.
• In the first pass, the master editor should deal with the easy cases.
• Deal with the remaining cases later. Groups of similar relationship instances can be seen more easily in a smaller set.

Adding new relationship types and new relationship instances

• AGROVOC does not contain all relationship types or relationship instances needed for AI applications
• Need to add data. For example:
  Organism X <hasPest> Organism Y
  ChemSubstance X <actsAgainst> Organism Y
  Organism X <actsAgainst> Organism Y
  Plant X <growsIn> Environment Y
  FoodProduct X <suitableFor> Diet Y

Conclusion

The rules-as-you-go approach is a realistic method for developing a rich ontology from an existing thesaurus.

Full paper:
Reengineering Thesauri for New Applications: the AGROVOC Example
Journal of Digital Information, Volume 4, Issue 4
http://jodi.ecs.soton.ac.uk/Articles/v04/i04/Soergel/

References

• For questions and discussion contact
  Boris Lauser [email protected]
  Dagobert Soergel [email protected]
• AOS: Agricultural Ontology Service Project http://www.fao.org/agris/aos
• AGMES: http://www.fao.org/agris/agmes
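As a supplementary illustration of the rules-as-you-go approach summarized in the conclusion, the pattern step can be sketched in a few lines of Python. This is a hypothetical sketch, not the software used in the project: the `Pattern` class, the "@Type" notation for entity-type tests, the `convert` function, and the entity-type assignments in `ENTITY_TYPES` are all assumptions made for the example. Patterns are tried in the order the editor created them and the first match wins, which is one possible reading of how earlier, more specific patterns take precedence.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Dict, FrozenSet, List, Set, Tuple

Row = Tuple[str, str, str]  # (term, thesaurus relationship, term)

# Hypothetical entity-type assignments; the slides assume these are known
# from AGROVOC or other sources. A term may carry more than one type.
ENTITY_TYPES: Dict[str, Set[str]] = {
    "milk": {"Substance"},
    "milk protein": {"Substance"},
    "goats": {"Animal"},
    "goat milk": {"Substance", "BodyPart"},
    "Cichorium": {"Taxon"},
    "Cichorium endivia": {"Taxon"},
}


@dataclass(frozen=True)
class Pattern:
    subject: str               # "@Type" tests the entity type, else a * wildcard on the term
    relations: FrozenSet[str]  # thesaurus relationships the pattern applies to
    obj: str                   # same notation as subject
    target: str                # ontology relationship to assign


def _matches(spec: str, term: str) -> bool:
    if spec.startswith("@"):
        return spec[1:] in ENTITY_TYPES.get(term, set())
    return fnmatchcase(term, spec)


def convert(rows: List[Row], patterns: List[Pattern]):
    """Convert each row with the first matching pattern; collect the rest."""
    converted: Dict[Row, Row] = {}
    remaining: List[Row] = []
    for row in rows:
        s, rel, o = row
        hit = next(
            (p for p in patterns
             if rel in p.relations and _matches(p.subject, s) and _matches(p.obj, o)),
            None,
        )
        if hit is None:
            remaining.append(row)  # left for the editor to handle manually
        else:
            converted[row] = (s, hit.target, o)
    return converted, remaining


# Patterns from the slides, in the order the editor created them.
PATTERNS = [
    Pattern("milk", frozenset({"NT"}), "* milk", "includesSpecific"),
    Pattern("@Substance", frozenset({"NT", "RT"}), "@Substance", "containsSubstance"),
    Pattern("@Animal", frozenset({"RT"}), "@BodyPart", "hasComponent"),
    Pattern("*", frozenset({"BT"}), "* type*", "isa"),
    Pattern("@Taxon", frozenset({"BT"}), "@Taxon", "isa"),
]

ROWS = [
    ("milk", "NT", "goat milk"),
    ("milk", "RT", "milk protein"),
    ("goats", "RT", "goat milk"),
    ("acrisols", "BT", "genetic soil types"),
    ("Cichorium endivia", "BT", "Cichorium"),
    ("Cichorium intybus", "RT", "root vegetables"),  # no entity types known here
]

converted, remaining = convert(ROWS, PATTERNS)
```

Rows that end up in `remaining` correspond to the cases the editor still has to resolve by hand, for example by assigning entity types or by creating a new pattern. Automatically converted rows would still need to be presented to the editor, as the "Checking by editor" slide requires.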