Copyright © 1997 Pangea Systems, Inc. All rights reserved.
Copyright © 1998 Pangea Systems, Inc. All rights reserved.

Building Ontologies
- No field of Ontological Engineering equivalent to Knowledge or Software Engineering;
- No standard methodologies for building ontologies;
- Such a methodology would include:
  - a set of stages that occur when building ontologies;
  - guidelines and principles to assist in the different stages;
  - an ontology life-cycle which indicates the relationships among stages.
- Gruber's guidelines for constructing ontologies are well known.

The Development Lifecycle
- Two kinds of complementary methodologies emerged:
  - Stage-based, e.g. TOVE [Uschold96]
  - Iterative evolving prototypes, e.g. MethOntology [Gomez Perez94].
- Most have TWO stages:
  1. Informal stage: ontology is sketched out using either natural language descriptions or some diagram technique
  2. Formal stage: ontology is encoded in a formal knowledge representation language, that is machine computable
- An ontology should ideally be communicated to people and unambiguously interpreted by software
  - the informal representation helps the former
  - the formal representation helps the latter.

A Provisional Methodology
- A skeletal methodology and life-cycle for building ontologies;
- Inspired by the software engineering V-process model;
  - The left side charts the processes in building an ontology
  - The right side charts the guidelines, principles and evaluation used to ‘quality assure’ the ontology
- The overall process moves through a life-cycle.

The V-model Methodology
[Figure: V-model diagram. Labels, grouped by the model they surround:]
- User Model
  - Processes: Identify purpose and scope; Knowledge acquisition
  - Quality assurance: Ontology in Use; Evaluation: coverage, verification, granularity
- Conceptualisation Model
  - Processes: Conceptualisation; Integrating existing ontologies
  - Quality assurance: Conceptualisation Principles: commitment, conciseness, clarity, extensibility, coherency
- Implementation Model
  - Processes: Encoding; Representation
  - Quality assurance: Encoding/Representation principles: encoding bias, consistency, house styles and standards, reasoning system exploitation

The ontology building life-cycle
[Figure: life-cycle diagram. Labels: Identify purpose and scope; Knowledge acquisition; Building (Conceptualisation, Encoding, Integrating existing ontologies); Evaluation; Language and representation; Available development tools]

User Model: Identify purpose and scope
- Decide what applications the ontology will support
  - EcoCyc: Pathway engineering, qualitative simulation of metabolism, computer-aided instruction, reference source
  - TAMBIS: retrieval across a broad range of bioinformatics resources
- The use to which an ontology is put affects its content and style
- Impacts re-usability of the ontology

User Model: Knowledge Acquisition
- Specialist biologists; standard text books; research papers and other ontologies and database schema.
- Motivating scenarios and informal competency questions: informal questions the ontology must be able to answer
- Evaluation:
  - Fitness for purpose
  - Coverage and competency

Conceptualisation Model: Conceptualisation
- Identify the key concepts, their properties and the relationships that hold between them;
  - Which ones are essential?
  - What information will be required by the applications?
- Structure domain knowledge into explicit conceptual models.
- Identify natural language terms to refer to such concepts, relations and attributes;
- Determine naming conventions
  - Consistent naming for classes and slots
  - EcoCyc:
    - Classes are capitalized, hyphenated, plural
    - Slot names are uppercase
- A quality ontology captures relevant biological distinctions with high fidelity

Conceptualisation Model: Pitfalls
- Pitfall: Missing ontological elements
  - Missing classes: Swiss-Prot Protein complexes
  - Missing attributes: Genetic code identifier
  - Confuse 1:1 with 1:Many, or 1:Many with Many:Many
    - Cofactor as an attribute of reaction
  - Important data is stored within text/comment fields
- Pitfall: Extra ontological elements
- Pitfall: Stop over-elaborating: when do I stop?
- Pitfall: Relevance: do I really need all this detail?

Integrating Existing Ontologies
- Reuse or adapt existing ontologies when possible
  - Save time
  - Correctness
  - Facilitate interoperation
- Integration of ontologies
  - Ontologies have to be aligned
  - Hindered by poor documentation and argumentation
  - Hindered by implicit assumptions
  - Shared generic upper level ontologies should make integration easier

Encoding: Implementation Toolkit
- Construct ontology using an ontology-development system
- Does the data model have the right expressivity?
  - Is it just a taxonomy or are relationships needed?
  - Is multiple parentage needed?
  - Inverse relationships?
  - What types of constraints are needed?
  - Are reasoning services needed?
- What are authoring features of the development tool?
  - Can ontology be exported to a DBMS schema?
  - Can ontology be exported to an ontology exchange language?
  - Is simultaneous updating by multiple authors needed?
  - Size limitations of development tool?
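
To make two of the expressivity questions above concrete (multiple parentage and inverse relationships), here is a minimal Python sketch of what a data model would need to support. It is an illustration only, not the data model of any particular ontology-development system. The names `OntClass`, `RelationStore`, the `Catalyst` class, and the `hasComponent` inverse are assumptions made up for the example; `componentOf`, `Enzyme`, `Protein`, `Motif` and `MacroMolecule` are borrowed from the slides, and the fact `Motif componentOf Protein` is illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class OntClass:
    """A class in a toy taxonomy; `parents` may hold more than one entry."""
    name: str
    parents: list["OntClass"] = field(default_factory=list)

    def ancestors(self) -> set:
        """Collect the names of all direct and indirect parents."""
        seen = set()
        stack = list(self.parents)
        while stack:
            cls = stack.pop()
            if cls.name not in seen:
                seen.add(cls.name)
                stack.extend(cls.parents)
        return seen


class RelationStore:
    """Stores (subject, relation, object) facts and keeps inverses in step."""

    def __init__(self, inverses):
        # Register each declared pair in both directions.
        self.inverses = dict(inverses)
        for rel, inv in inverses.items():
            self.inverses[inv] = rel
        self.facts = set()

    def add(self, subj, rel, obj):
        """Assert a fact and, if the relation has an inverse, its mirror."""
        self.facts.add((subj, rel, obj))
        inv = self.inverses.get(rel)
        if inv is not None:
            self.facts.add((obj, inv, subj))

    def objects(self, subj, rel):
        """Return the sorted objects related to `subj` by `rel`."""
        return sorted(o for s, r, o in self.facts if s == subj and r == rel)


# Multiple parentage: Enzyme has two parents (Catalyst is hypothetical).
macromolecule = OntClass("MacroMolecule")
protein = OntClass("Protein", [macromolecule])
catalyst = OntClass("Catalyst")
enzyme = OntClass("Enzyme", [protein, catalyst])

# Inverse relationships: hasComponent is an assumed inverse of componentOf.
store = RelationStore({"componentOf": "hasComponent"})
store.add("Motif", "componentOf", "Protein")

print(sorted(enzyme.ancestors()))
print(store.objects("Protein", "hasComponent"))
```

A pure taxonomy tool would only need something like `OntClass` with a single parent; the moment a second parent or an automatically maintained inverse is required, the tool's data model has to offer it, which is the point of the checklist.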

Encoding: Ontology Implementation Pitfalls
- Pitfall: Semantic ambiguity
  - Multiple ways to encode the same information
  - Meaning of class definitions unclear
- Pitfall: Encoding Bias
  - Encoding the ontology changes the ontology
- Pitfall: Redundancy (lack of normalization)
  - Exact same information repeated
  - Presence of computationally derivable information
    - Date of birth and age
    - DNA sequence and reverse complement
  - More effort required for entry and update
  - Partial updates lead to inconsistency
  - OK if redundant information is maintained automatically

Encoding: The Interaction Problem
- Task influences what knowledge is represented and how it's represented
  - Molecular biology: chemical and physical properties of proteins
  - Bioinformatics: accession number, function, gene
- Underlying perspectives mean they may not be reconcilable
- If an ontology has too many conflicting tasks it can end up compromised (TaO experience)

Evaluate it - A guide for reusability
- Conciseness
  - No redundancy
  - Appropriateness: protein molecules at the atomic resolution when amino acid level would do
- Clarity
- Consistency
  - Satisfiability: it doesn't contradict itself
  - Counter-example: Enzyme is both a protein which catalyses a reaction and does not catalyse a reaction
- Commitment
  - Do I have to buy into a load of stuff I don't really need or want just to get the bit I do?

Documentation: Make Ontology Understandable!
- Produce clear informal and formal documentation
- An ontology that cannot be understood will not be reused
  - Genbank feature table
  - NCBI ASN.1 definitions
- There exists a space of alternative ontology design decisions
  - Semantics / Granularity
  - Terminology
- Pitfall: Neglecting to record design rationale
- Publish the Ontology
  - Formal and informal specifications
  - Intended domain of application
  - Design rationale
  - Limitations
- See EcoCyc paper in ISMB-93/Bioinformatics 00
- See TAMBIS paper in Bioinformatics 99

Macromolecule Reference Ontology
[Figure: taxonomy diagram. Node labels: MacroMolecule, SequenceComponent, Gene, Lipid, Nucleic Acid, RNA, Protein, Motif, Phosphorylation site, Peptide, Enzyme, Restriction site, mRNA, cDNA, DNA, gDNA, mDNA. Relation label: componentOf]

Discussion
- What is a macromolecule?
- Where does macromolecule fit into an upper level ontology? Substance? Structure?
- Is lipid a macromolecule?
- If we replace macromolecule with biopolymer is the placement of lipid legit?
- Is a peptide a protein and therefore a macromolecule? If not, where does it go?

Taxonomy and Roles
- Do we want to assert everything in a taxonomy?
- Or do we want to define things in terms of their properties?
  - Enzyme = Protein catalyses Reaction
  - gDNA = DNA hasLocation Chromosomal
- Sufficient as well as necessary conditions
- What's the relationship between
  - cDNA and EST
  - cDNA and some child of RNA?

Axioms and constraints
- Not all RNA is translated to protein
- Do we want to say that DNA is translated to protein?
- Do we want to model catalytic RNAs?
- Relationships: what other ones do we need?
  - Genes express proteins
  - Genes express rRNA, tRNA
  - Genes are found on gDNA
  - Genes are found on mDNA
  - Genes have their own components: recursive relationships with partitive semantics
- Reasoning? Instances? Reusable? Clear? Concise?

Ontological Pitfalls
- When do I stop over-elaborating?
  - Proteins, amino acid residues, side chains, physical chemical properties, ….
- Relevance
  - Do we need to mention all the types of nucleic acid?

EcoCyc
[Figure: EcoCyc taxonomy diagram. Node labels: Chemicals, MacroMolecule, Compounds-And-Elements, Nucleic-Acids, Compounds, Proteins, Lipids, RNA, Misc-RNA, DNA, PolyPeptides, Protein-Complexes, DNA-Segments, Genes]

Macromolecule in other Ontologies
- Gene Ontology
  - Used to add attributes to gene instances in databases
  - Doesn't need to talk about molecules or components of molecules
- TAMBIS Ontology
  - Models it in a similar way to our reference macromolecule ontology
  - Because it asks questions of bioinformatics sources
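
Returning to the Taxonomy and Roles question of defining things by their properties rather than asserting everything in a taxonomy: the two definitions given there (Enzyme = Protein catalyses Reaction, gDNA = DNA hasLocation Chromosomal) can be read as necessary and sufficient conditions, so membership is inferred instead of asserted. The Python sketch below is one hedged way to picture that. `Definition`, `Individual` and `classify` are hypothetical names, the matching is a plain set lookup with no subclass reasoning, and the instances `hexokinase` and `collagen` are illustrative and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Definition:
    """A defined class: base class plus one required (relation, filler) pair."""
    name: str
    base: str
    relation: str
    filler: str


@dataclass
class Individual:
    """An instance with one asserted class and a set of (relation, filler) pairs."""
    name: str
    asserted: str
    properties: set = field(default_factory=set)


# The two definitions from the Taxonomy and Roles slide.
DEFINITIONS = [
    Definition("Enzyme", "Protein", "catalyses", "Reaction"),
    Definition("gDNA", "DNA", "hasLocation", "Chromosomal"),
]


def classify(ind, definitions=DEFINITIONS):
    """Return the defined classes whose conditions `ind` satisfies.

    Simplification: the asserted class must equal the base class exactly;
    a real reasoner would also follow the subclass hierarchy.
    """
    return sorted(
        d.name
        for d in definitions
        if ind.asserted == d.base and (d.relation, d.filler) in ind.properties
    )


# Illustrative instances (assumptions, not taken from the slides).
hexokinase = Individual("hexokinase", "Protein", {("catalyses", "Reaction")})
collagen = Individual("collagen", "Protein")

print(classify(hexokinase))
print(classify(collagen))
```

Under this reading nobody has to place `hexokinase` under Enzyme by hand; it lands there because it meets the definition, while `collagen` stays a plain Protein. Whether that inference is wanted is exactly the "Reasoning?" question raised under Axioms and constraints.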