Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Formal Modelling Alan Rector O p e n G A LE N BioHealth Informatics Group 1 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 When to use a classifier 1. At author time: As a compiler – Pre-coordination ► Ontologies will be delivered as “pre-coordinated” ontologies to be used without a reasoner ► To make extensions and additions quick, easy, and responsive, distri but developments, empower users to make changes ► Part of an ontology life cycle 2. At delivery time: As a service: - Post-coordination ► Many fixed ontologies are too big and too small ► Too big to find things; too small to contain what you need ► Create them on the fly ► Part of an ontology service 3. At application time: as a reasoner - Postcoordination & inference ► Decision support, query optimisation, schema integration, …, …, … O p e n G A LE N ► Part of a reasoning service BioHealth Informatics Group 2 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 When to use a classifier 1: Pre-coordinated delivery classifier as compiler ► The life cycle ► Gather requirements, sketch, experiment ► Establish patterns – design a “language” ► Criteria for success: What a subject domain expert can learn in a few days ► Bulk authoring ► Classification Development & evolution ► Quality assurance ► Commit classifier results to a pre-coordinated ontology & deliver ► Polyhierarchies (Protégé, DAG-Edit, OWL-Lite, RDF(S), Topic Maps, … ► Query and use with you favourite tool O p e n G A LE N BioHealth Informatics Group 3 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 Commit Results to a PreCoordinated Ontology Assert (“Commit”) changes inferred by classifier O p e n G A LE N BioHealth Informatics Group 4 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 When to use a classifier 2: Post Coordination ► When the ontology too big – “Lazy classification” on demand API kernel model Externally available resource O p e n G A LE N BioHealth Informatics Group Big on the outside: small kernel on the inside 5 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 Often combined with other services: Example - the GALEN Server Client Ontology Services Multilingual Module Multilingual Lexicons Service API Client Applications Users Ontology Module Ontologies & Classifier External Resources & “Coding” Module Resource References Run time classifier O p e n G A LE N BioHealth Informatics Group 6 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 The problem Knowledge is Big, Fractal & Changeable! But we want to ► Build ontologies cooperatively with different groups ► Extend ontologies smoothly ► Re-use pieces of ontologies ► Build new ontologies on top of old ► Quit starting from scratch O p e n G A LE N BioHealth Informatics Group 7 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 Assertion: The arrival of logic-based ontologies/OWL gives new opportunities to make ontologies more manable and modular ► Let the ontology authors ► create discrete modules ► describe the links between modules ► Let the logic reasoner ► Organise the result O p e n G A LE N BioHealth Informatics Group 8 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 Logic-based Ontologies: Conceptual Lego gene hand protein extremity cell body expression chronic Lung inflammation infection acute abnormal normal bacterial deletion polymorphism ischaemic O p e n G A LE N BioHealth Informatics Group 9 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 Logic-based Ontologies: Conceptual Lego “SNPolymorphism of CFTRGene causing Defect in MembraneTransport of Chloride Ion causing Increase in Viscosity of Mucus in CysticFibrosis…” “Hand which is anatomically normal” O p e n G A LE N BioHealth Informatics Group 10 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 Species Genes Protein Function Gene in humans Disease Protein coded by gene in humans O p e n G A LE N BioHealth Informatics Group Logical Constructs build complex concepts from modularised primitives 11 Function of Protein coded by gene in humans Disease caused by abnormality in Function of Protein coded by gene in humans Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 Normalising (untangling) Ontologies Structure Structure Part-whole Function Function Part-whole O p e n G A LE N BioHealth Informatics Group 12 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 Rationale for Normalisation ► Explicit distinctions between modules ► Primitives are opaque to the reasoner ►Information implicit in primitive names cannot contribute to modularisation ► Primitives are indivisible to both human and reasoner ►Each primitive should represent a single notion ► Therefore, each primitive must belong to exactly one module ►If a primitive belongs to two modules, they are not modular. ►If a primitive belongs to two modules, it probably conflates two notions ► Therefore concentrate on the “primitive skeleton” of the domain ontology O p e n G A LE N BioHealth Informatics Group 13 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 Normalisation and Untangling Let the reasoner do multiple classification ► Tree ► Everything has just one parent ►A ‘strict hierarchy’ ► Directed Acyclic Graph (DAG) ► Things can have multiple parents ►A ‘Polyhierarchy’ ► Normalisation ► Separate primitives into disjoint trees ► Link the trees with restrictions ►Fill in the values O p e n G A LE N BioHealth Informatics Group 14 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 Tables are easier to manage than DAGs / Polyhierarchies …and get the benefit of inference: Grass and Leafy_plants are both kinds of Plant O p e n G A LE N BioHealth Informatics Group 15 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 Remember to add any closure axioms Closure Axiom Then let the reasoner do the work O p e n G A LE N BioHealth Informatics Group 16 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 Normalisation: From Trees to DAGs Before classification A tree After classification A DAG Directed Acyclic Graph O p e n G A LE N BioHealth Informatics Group 17 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 Normalisation: Criterion 1 The skeleton should consist of disjoint trees ► Every primitive concept should have exactly one primitive parent ► All multiple hierarchies the result of inference by reasoner O p e n G A LE N BioHealth Informatics Group 18 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 Normalisation Criterion 2: No hidden changes of meaning ►Each branch should be homogeneous and logical (“Aristotelian”) ► Hierarchical principle should be subsumption ►Otherwise we are “lying to the logic” ► The criteria for differentiation should follow consistent principles in each branch eg. structure XOR function XOR cause O p e n G A LE N BioHealth Informatics Group 19 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 A Non-homogeneous taxonomy “On those remote pages it is written that animals are divided into: O p e n G A LE N a. those that belong to the Emperor b. embalmed ones c. those that are trained d. suckling pigs e. mermaids f. fabulous ones g. stray dogs h. those that are included in this classification i. those that tremble as if they were mad j. innumerable ones k. those drawn with a very fine camel's hair brush l. others m. those that have just broken a flower vase n. those that resemble flies from a distance" From The Celestial Emporium of Benevolent Knowledge, Borges BioHealth Informatics Group 20 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 Normalisation Criterion 3 Distinguish “Self-standing” and “Refining” Concepts ►Self-standing concepts ► Roughly Welty & Guarino’s “sortals” ►person, idea, plant, committee, belief,… ► Refining concepts – depend on self-standing concepts ►mild|moderate|severe, hot|cold, left|right,… ► Roughly Welty & Guarino’s non-sortals ► Closely related to Smith’s “fiat partitions” ► Usefully thought of as Value Types by engineers ► For us an engineering distinction… O p e n G A LE N BioHealth Informatics Group 21 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 Normalisation Criterion 3a Self-standing primitives should be globally disjoint & open ► Primitives are atomic ► If primitives overlap, the overlap conceals implicit information ► A list of self-standing primitives can never be guaranteed complete ► How many kinds of person? of plant? of committee? of belief? ► Can’t infer: Parent & ¬sub1 &…& ¬subn-1 subn ► Heuristic: ► Diagnosis by exclusion about self-standing concepts should NOT be part of ‘standard’ ontological reasoning O p e n G A LE N BioHealth Informatics Group 22 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 Normalisation Criterion 3b Refining primitives should be locally disjoint & closed ► Individual values must be disjoint ► but can be hierarchical ►e.g. “very hot”, “moderately severe” ► Each list can be guaranteed to be complete ► Can infer Parent & ¬sub1 &…& ¬subn-1 subn ► Value types themselves need not be disjoint ► “being hot” is not disjoint from “being severe” ►Allowing Valuetypes to overlap is a useful trick, e.g. ►restriction has_state someValuesFrom (severe and hot) O p e n G A LE N BioHealth Informatics Group 23 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 Normalisation Criterion 4 Axioms ► No axiom should denormalise the ontology No axiom should imply that a primitive is part of more than one branch of primitive skeleton ► If all primitives are disjoint, any such axioms will make that primitive unsatisfiable ►A partial test for normalisation: ►Create random conjunctions of primitives which do not subsume each other. ►If any are satisfiable, the ontology is not normalised O p e n G A LE N BioHealth Informatics Group 24 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 Consequences 1 ► All self-standing primitives are disjoint ► All multiple classification is inferred ► For any two primitive self-standing classes, either one subsumes the other or they are disjoint ► Every self standing concept is part of exactly one primitive branch of the skeleton ► Every self-standing concept has exactly one most specific primitive ancestor O p e n G A LE N BioHealth Informatics Group 25 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 Consequences 2 ► Primitives introduced by a conjunction of one class and a boolean combination of zero or more restrictions ► Tree subclass-of Plant and restriction isMadeOf someValuesFrom Wood ► Resort subclass-of Accommodation restriction isIntendedFor someValueFrom Holidays O p e n G A LE N BioHealth Informatics Group 26 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 Consequences 3 ►Use of axioms limited (outside of skeleton construction) The following are a safe but not exhaustive guide: ►The right side of subclass axioms limited to restrictions ►Both sides of disjointness axioms limited to restrictions ►No equivalence axioms with primitives on either side O p e n G A LE N BioHealth Informatics Group 27 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 A real example: Build a simple treee easy to maintain O p e n G A LE N BioHealth Informatics Group 28 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 Let the classifier organise it O p e n G A LE N BioHealth Informatics Group 29 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 If you want more abstractions, just add new definitions (re-use existing data) O p e n G A LE N BioHealth Informatics Group “Diseases linked to abnormal proteins” 30 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 And let the classifier work again O p e n G A LE N BioHealth Informatics Group 31 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 And again – even for a quite different category “Diseases linked genes described in the mouse” O p e n G A LE N BioHealth Informatics Group 32 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 And let classifier check consistency (My first try wasn’t) O p e n G A LE N BioHealth Informatics Group 33 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005 Summary Why use a Classifier? ► To compose concepts ► Allow conceptual lego ► To manage polyhierarchies ► Adding abstractions (“axes”) as needed ► Normalisation ► Untangling ► labelling of “kinds of is-a” ► To avoid combinatorial explosions ► Keep bicycles from exploding ► To manage context ► Cross species, Cross disciplines, Cross studies ► To check consistency and help users find errors O p e n G A LE N BioHealth Informatics Group 34 Ontology tutorial, © 2005 Univ. of Manchester Feb 2005