MGED ontology for consistent annotation of microarray experiments
Manchester Bioinformatics Week, Ontologies Workshop 1, March 23-24th 2002
Philippe Rocca-Serra, Microarray Informatics Team, EBI-EMBL, Hinxton, Cambridge
The European Bioinformatics Institute

ArrayExpress: a database for Gene Expression Studies
[Figure: gene expression data matrix, with axes labelled Samples and Genes]

ArrayExpress goals
• To create a public repository for gene expression data:
  – apply a standard format
  – apply curation to the data (high quality control)
  – easy access to information
  – search and retrieve information
• To compare experiments
• To perform analysis and data mining using complex querying

What kind of data should be stored?
• Samples annotations
• Experiment (platform, conditions…)
• Genes & transcription units
• Gene expression data matrix

Important issues about data annotation
• Sufficient annotation of the experiment, genes and samples
• Efficient annotation:
  – Machine processable: effective mining agents
  – Homogeneous: consistent annotation
  – Unambiguous: accurate description, sample discrimination
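
As a rough illustration of what "machine processable, homogeneous, unambiguous" annotation could mean in practice, the sketch below maps submitted terms onto a single preferred term and rejects anything outside the vocabulary. The vocabulary contents and the names `CONTROLLED_VOCABULARY`, `build_lookup` and `normalise` are hypothetical, chosen only for this example; they are not part of ArrayExpress or of any MGED tool.

```python
# Hypothetical controlled vocabulary: preferred term -> accepted synonyms.
# The entries are illustrative only, not taken from any real ontology file.
CONTROLLED_VOCABULARY = {
    "epinephrine": {"adrenaline"},
    "liver": {"hepatic tissue"},
}


def build_lookup(vocabulary):
    """Map every preferred term and synonym (lower-cased) to its preferred term."""
    lookup = {}
    for preferred, synonyms in vocabulary.items():
        lookup[preferred.lower()] = preferred
        for synonym in synonyms:
            lookup[synonym.lower()] = preferred
    return lookup


def normalise(term, lookup):
    """Return the preferred term, or raise ValueError for uncontrolled text."""
    key = term.strip().lower()
    if key not in lookup:
        raise ValueError(f"'{term}' is not in the controlled vocabulary")
    return lookup[key]


LOOKUP = build_lookup(CONTROLLED_VOCABULARY)

# Two submitters using different synonyms end up with the same annotation.
annotations = [normalise(t, LOOKUP) for t in ("Adrenaline", "epinephrine ")]
```

With a lookup of this kind, synonyms collapse to one term (homogeneous), free text is rejected rather than stored (machine processable), and each stored value has exactly one meaning (unambiguous).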

MIAME Requirements: addressing the issue of sufficient annotation
Recorded info should be sufficient to interpret and replicate the experiment:
• Experimental design: the set of hybridisation experiments as a whole
• Array design: each array used and each element (spot) on the array
• Samples: samples used, extract preparation and labelling
• Hybridisations: procedures and parameters
• Measurements: images, quantitation, specifications
• Normalisation controls: types, values, specifications
(Brazma et al., Nature Genetics, 2001)

Second Challenge: addressing the issue of annotation efficiency
One of the main MGED goals, to facilitate the adoption of standards for DNA-array experiment annotation and data representation, requires machine-understandable annotations:
– Avoid free text and natural language
– Avoid synonyms: adrenaline / epinephrine
– General use of CV and ontologies: gene annotation using e.g. GO and pathway analysis
Create a new ontology where necessary:
– Task assigned to MGED for Biomaterial (sample) description

Ontology integration in the object model describing ArrayExpress database
• ArrayExpress DB is an implementation of the MAGE-OM model (a UML model)
• The MAGE model by construction includes the use of ontology entries:
  – 37 locations for an "Ontology Entry"
  – 36 cases of simple Controlled Vocabularies, e.g. Image Format (TIFF, JPEG)
  – 1 has required development of specific modelling: Biomaterial (sample) description

MAGE BioMaterial Model

Facts about MGED biomaterial ontology
• Authors: developed by Chris Stoeckert, U. Penn and Helen Parkinson, EBI
• Coordinated with the ArrayExpress database model (mapping available)
• Technical choices: use of the OIL language
  – A new standard for building ontologies that provides support for formal semantics and reasoning
  – Class/property modelling primitives based on frame-based systems
  – Semantics capture based on Description Logics
  – Syntax for encoding primitives and semantics based on existing Web languages: XML
• Availability: http://mged.sourceforge.net/Ontologies.shtml

MGED ontology: features & complexity
Facts about the ontology:
– 75 classes
– 70 slots
– 98 individuals
– more individuals to be added

Using MGED Ontology: a Browseable Form

MGED defined concepts: internal terms

Linking to external ontologies: an application
MGED Ontology class: instance (external reference, where one is used)
BioMaterialDescription
  Biosource Property
    Organism: Mus musculus musculus, id: 39442 (NCBI Taxonomy)
    Age: 7 weeks after birth
    DevelopmentStage: Stage 28 (Mouse Anatomical Dictionary)
    Sex: Female
    StrainOrLine: C57BL/6 (International Committee on Standardized Genetic Nomenclature for Mice)
    BiosourceProvider: Charles River, Japan
    OrganismPart: Liver (Mouse Anatomical Dictionary)
  BioMaterialManipulation
    EnvironmentalHistory
      CultureCondition
        Temperature: 22 ± 2 °C
        Humidity: 55 ± 5%
        Light: 12 hours light/dark cycle
        PathogenTests: Specified pathogen free conditions
        Water: ad libitum
        Nutrients: MF, Oriental Yeast, Tokyo, Japan
    Treatment
      CompoundBasedTreatment
        (Compound): Fenofibrate, CAS 49562-28-9 (ChemIDplus)
        (Treatment_application): in vivo, oral gavage
        (Measurement): 100 mg/kg body weight

Referencing to external ontologies
• NCBI taxonomy database
• Jackson Lab mouse strains and genes
• Edinburgh mouse atlas anatomy
• GO Gene Ontology
• HUGO nomenclature for Human genes
• Chemical and compound ontologies: Merck index
• TAIR
• Flybase
• … and many more: www.mged.org/ontology/

Planning MGED ontology's future
[Figure: diagram with the labels External Ontologies; Other submitters; Submission via MIAMExpress; Large centres; LIMS; MGED/ArrayExpress ontology; Direct Submission in MAGE-ML; Curation DB; ArrayExpress DB. Caption: Ontology availability made simple?]

Planning MGED ontology's future
• Making the ontology available where it's needed:
  – Develop a browser or other interface for the ontology and link it to LIMS
  – Incorporate the ontology into submission/annotation and curation tools (MIAMExpress)
• Further ontology development: new instances, class refinement
• Better integration of available ontologies
• Writing guidelines on how to use ontologies for annotating data:
  – Developing use cases (a non-trivial task)

Resources
• List of ontology resources from MGED pages
• MAGE-MIAME-ontology mappings, MIAME glossary
• Schemas for both ArrayExpress and MIAMExpress
• Annotation examples in MAGE-ML
URL: www.mged.org ¦ www.ebi.ac.uk/microarray
Mailing lists: [email protected] [email protected]

Acknowledgements
EBI-EMBL: H. Parkinson, S. Sansone, E. Holloway, A. Brazma
University of Pennsylvania: C. Stoeckert
And the Microarray Informatics Team.