GO terms implicitly refer to other terms
• cysteine biosynthesis
• myoblast fusion
• hydrogen ion transporter activity
• snoRNA catabolism
• wing disc pattern formation
• epidermal cell differentiation
• regulation of flower development
• interleukin-18 receptor complex
• B-cell differentiation
• dorsal ectoderm

[Figure: is_a chains behind the implied terms: biosynthesis is_a metabolism; cysteine is_a serine family amino acid is_a amino acid is_a amine; the figure also shows a serine node]

Composed terms currently cause problems
  – No link to external ontology term
  – Redundancy
  – Inconsistency
  – Extra work
  – Annotation bottleneck
  – Tangled DAGs and confusing displays
    • we have no way to disentangle
• Solution so far:
  – fix errors based on results of term name parsing (Obol)
    • reactive, not proactive

Solution: actively manage composed terms
• Explicit pre-coordination
  – Composed terms should now/soon be coordinated using oboedit plugin
    • building block terms are recorded in ontology along with composite term
• Benefits:
  – Correct DAG structure can be inferred from external ontologies
    • e.g. make sure GO + CHEBI “align”
  – placement & consistency checking automated
  – additional work can be automated
    • synonyms, text definitions

How will terms be precoordinated by oboedit?
• How do we record a definition for a composite term?
  – using a logical definition (computational essence)
• A logical definition consists of:
  – a generic term (aka genus)
  – relationships to other terms which serve to discriminate this specific term from other is_a children of the generic term (aka differentiae)
• Can be written in natural language as:
  – A <generic term> which <discriminating characteristics>

Example of pre-coordination
• cysteine biosynthesis
• generic term:
  – biosynthesis
• discriminating characteristics:
  – outputs cysteine
  – natural language (Aristotelian style):
    • a biosynthesis process which outputs cysteine

Example in Obo format
    [Term]
    id: GO:0019344
    name: cysteine biosynthesis
    intersection_of: GO:0009058 ! biosynthesis
    intersection_of: outputs CHEBI:15356 ! cysteine
    is_a: GO:0009070 ! serine family amino acid biosynthesis
    is_a: GO:0006534 ! cysteine metabolism

Alternate syntax
    GO:cysteine_biosynthesis == GO:biosynthesis ∏ outputs(CHEBI:cysteine)
• used in pheno-syntax
• more compact
• similar to OWL abstract syntax
• I use Obo1.2 format or natural language in the rest of this presentation

This allows us to dynamically untangle
• Process axis view (primary is_as, via generic term):
  – biological_process
    • metabolism
      – biosynthesis
        » cysteine biosynthesis
• Process participant axis view:
  – amine
    • amino acid
      – serine family amino acid
        » cysteine
• Combined view
  – (same as current tangled diamond lattice)

Obol demo
• http://yuri.lbl.gov/amigo/obol

Recording the relationship is important
• Why not just a simple cross-product?
  – e.g. biosynthesis x cysteine
• Relationships are important for reasoning and querying
  – Consider:
    • cysteine biosynthesis from serine
    • mRNA export from nucleus during heat stress
• Without the relations, the logical definition is not specific enough
  – the essence is not captured
• Relations should come from RO
  – more required

Multiple discriminating characteristics are allowed
• Cysteine biosynthesis from serine
  – Generic term:
    • biosynthesis
  – Discriminating characteristics:
    • output cysteine
    • input serine

    [Term]
    name: cysteine biosynthesis from serine
    intersection_of: GO:0009058 ! biosynthesis
    intersection_of: outputs CHEBI:15356 ! cysteine
    intersection_of: input CHEBI:17822 ! serine

Composite terms can be nested
    [Term]
    id: GO:xxxxxxx
    name: regulation of cysteine biosynthesis
    intersection_of: GO:0050789 ! regulation of biological process
    intersection_of: regulates GO:0019344 ! cysteine biosynthesis

    [Term]
    id: GO:0019344
    name: cysteine biosynthesis
    intersection_of: GO:0009058 ! biosynthesis
    intersection_of: outputs CHEBI:15356 ! cysteine

    YES: regulation^regulates(biosynthesis^outputs(cysteine))
    NO:  regulation^regulates(biosynthesis)^outputs(cysteine)

Composite terms can optionally be manufactured in bulk
• Generic term: {metabolism, biosynthesis}
• Differentia: has_output {serine, cysteine, …}
• With caution…
  – Sparse vs dense matrices
  – not all combinations are types

On the importance of necessary and sufficient conditions
• Why intersection_of?
• Why not just make normal links in the GO DAG?
  – normal relationships are for necessary conditions only
  – we want both necessary and sufficient conditions
    • captures the essence of the term

Normal DAG links only capture necessary conditions, not essence
• macrophage activation
  – is_a immune cell activation
  – part_of inflammatory response
  – text def: A change in morphology and behavior of a macrophage resulting from exposure to a cytokine, chemokine, cellular ligand, pathogen, or soluble factor
• monocyte activation
  – is_a immune cell activation
  – part_of inflammatory response
  – text def: A change in morphology and behavior of a monocyte resulting from exposure to a cytokine, chemokine, cellular ligand, pathogen, or soluble factor
• Indistinguishable by DAG

Essence captured by genus-differentia
• macrophage activation
  – is_a immune cell activation
    • immune cell activation is_a cell activation (the genus)
  – part_of inflammatory response
  – activates CL:macrophage

    id: GO:macrophage_activation
    intersection_of: GO:cell_activation
    intersection_of: activates CL:macrophage

Current status of precoordinated terms
• SO already contains composite terms
  – 46 pre-coordinated terms
  – A silenced gene is a gene which has the quality of being silenced
• GO-BP/CL integration underway
  – retrospectively pre-coordinated terms
• Obol page has pre-coordinated terms from automatic parsing
  – http://www.fruitfly.org/~cjm/obol

Pre- vs post- coordinated
• Pre-coordination
  – terms are in ontology with IDs and computable definitions
  – increases complexity of ontology
  – complexity can be managed by tools
    • e.g. new oboedit features
• Post-coordination
  – terms are combined in the database
  – forces more complexity in database schema and database applications

Pre-coordination is useful in moderation
• Commonly used terms should be precoordinated
  • e.g. cysteine biosynthesis; oocyte differentiation; pectoral fin
• Avoid taking to extremes
  • cf. ICD-9
• Where do we draw the line?
  – ontologies should be built around one or a few axes of classification
    • term ‘explosion’ typically gets large when multiple axes are combined
  – we can change our minds later
    • pre- and post- coordination are commensurable

Commensurability
• Annotator annotates to
  – nucleus^part_of(astrocyte)
• Anatomy editor creates new term
  – uses oboedit cross-product plugin
  – astrocyte_nucleus = nucleus^part_of(astrocyte)
• Annotation can be dynamically ‘promoted’ to new term in answer to queries
  – various software techniques for achieving this

Post-coordination in GO annotations
• Pre- and post- coordination are compatible and commensurable
• We should extend the annotation format to allow denoting more specific classes
  – e.g.
    • cholesterol transport in liver
  – advanced applications can query this
  – standard applications suffer no loss
  – extended annotations can be used to help seed new terms in the ontology
• This is already being done (MGI, Dicty)
  – we just want to capture this in an interoperable way

Post-composition in gene association files
• New column in GA file format

    …  Gene Product  Term ID                                Properties
       AABC1         GO:0030301 (cholesterol transport)     located_in(MA:liver)
       AABC2         GO:0048663 (neuron fate development)   has_participant(FBbt:Y_neuron)

Database issues
• Chado and GO DB can handle pre- and post- coordination
  – in theory anyway
    • not yet fully tested
• How does it work?
  – ‘anonymous term’ created for coordinated term
  – documentation in chado cvs
    • chado/modules/cv/doc/
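
Sketch: promoting a post-coordinated annotation

The ‘promotion’ step from the Commensurability slide can be illustrated with a small Python sketch. It is not how oboedit, Chado or the GO DB implement it; it only assumes that a logical definition is a genus plus a set of (relation, term) pairs, written in the genus^relation(term) syntax used in this presentation. The names LogicalDef, parse_flat, resolve and index are hypothetical, and nested expressions such as regulation^regulates(biosynthesis^outputs(cysteine)) are deliberately rejected to keep the example short.

```python
import re
from typing import Dict, FrozenSet, NamedTuple, Tuple


class LogicalDef(NamedTuple):
    """Genus plus differentiae, mirroring the intersection_of tags."""
    genus: str
    differentiae: FrozenSet[Tuple[str, str]]  # (relation, target term)


# Matches one flat differentia such as part_of(astrocyte).
_DIFFERENTIA = re.compile(r"^(\w+)\(([^()^]+)\)$")


def parse_flat(expr: str) -> LogicalDef:
    """Parse genus^rel(target)^rel(target)...; nested targets are rejected."""
    genus, *parts = expr.split("^")
    pairs = []
    for part in parts:
        match = _DIFFERENTIA.match(part)
        if match is None:
            raise ValueError(f"unsupported differentia: {part!r}")
        pairs.append((match.group(1), match.group(2)))
    return LogicalDef(genus, frozenset(pairs))


def resolve(expr: str, index: Dict[LogicalDef, str]) -> str:
    """Return the pre-coordinated term with this definition, else expr itself."""
    return index.get(parse_flat(expr), expr)


# Hypothetical index built from the logical definitions in the ontology.
index = {parse_flat("nucleus^part_of(astrocyte)"): "astrocyte_nucleus"}

print(resolve("nucleus^part_of(astrocyte)", index))  # promoted to the new term
print(resolve("nucleus^part_of(neuron)", index))     # stays post-coordinated
```

Because the differentiae are stored as a set of (relation, term) pairs, the order in which they are written does not matter, while changing the relation (outputs vs input) gives a different definition. This is the reason a plain cross-product such as biosynthesis x cysteine is not specific enough.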