RDA Wheat Data Interoperability Cookbook and last developments
9th March 2015, San Diego

The WDI working group in brief
• Endorsed by RDA in March 2014
• Members: about 30 members and 15 active members (Wheat scientists, data and metadata technologists)
• The goal: contribute to the improvement of Wheat related data interoperability by
    • Building a common interoperability framework (metadata, data formats and vocabularies)
    • Providing guidelines for describing, representing and linking Wheat related data

Initial plans

Deliverables
• A report of the survey of existing standards.
• A cookbook intended for the Wheat data managers community, which provides them with guidelines on what data formats, metadata, vocabularies and ontologies they should use to describe, represent and link different types of Wheat data.
• A library of linked vocabularies and ontologies in machine readable formats that comply with the Linked Data standards.
• A prototype which showcases the gain in interoperability.

Where we are
Surveys
• Landscape of Wheat related standards and their use by the community
• Comprehensive overview of Wheat related ontologies and vocabularies
Workshops
• Recommendations
• Mappings between different data formats
• Actions to conduct in order to improve the current level of Wheat related data interoperability
• Interoperability use cases
Implementation
• Interactive cookbook: recommendations + guidelines
• A repository of Wheat related linked vocabularies (BioPortal)

Wheat related standards survey and workshop

Data formats currently used and recommendations, by data type

SNPs
    Currently used (standardized): VCF
    Currently used (tool specific or non standardized): BAM/SAM, BED, VARSCAN, VEP
    Recommendation: VCF files generated by using the survey sequences of IWGSC + metadata about VCF files to enrich the information about the SNPs
Genome annotations
    Currently used (standardized): GenBank Flat File, General Feature Format (GFF), EMBL
    Recommendation: GFF3 + specifications with regard to the description of specific columns
Germplasms
    Currently used (standardized): MCPD, ABCD, Darwin Core, Darwin Core Germplasm
    Currently used (tool specific or non standardized): GRIN-Global tabulated
    Recommendation: MCPD
Gene expression
    Currently used (standardized): many format standards laid out by repositories such as NCBI (GEO) and EBI ArrayExpress
    Recommendation: existing format standards laid out by repositories such as NCBI (GEO) and EBI ArrayExpress + ENA
Physical maps
    Currently used (standardized): GFF
    Currently used (tool specific or non standardized): Cmap, fpc
    Recommendation: GFF3
Genetic maps
    Currently used (tool specific or non standardized): Cmap, gnpmap
    Recommendation: GFF3 (to be confirmed)
Phenotypes
    Currently used: Drops, ped, isatab, ephesis tabulated
    Recommendation: ISA-Tab

Examples of use cases

Title: Searching for germplasm with specific traits
Description: Example of searching for germplasm with specific traits, tagged with ontology terms?
Data types: Germplasm, Phenotype
Challenges:
● Metadata very important; standardized format
● Association of genes to traits, linked to germplasm, marker information
● Need for quality controls: how confident are you of the data source?
● Provenance of the germplasm: pedigree, ownership
● Standard system for tracking germplasm, names

Title: Identification of wheat genes that control root growth
Description: Requires annotated genes (Gene Ontology, Pfam, and other functional annotation)
Data types: Genomic annotations? Gene location? (IWGS-SS ID or MIPS HCS link)
Challenges:
● Mapping between wheat genes and orthologs from other species (deduce function by sequence similarity)
● Access to RNASeq data (genes that are not expressed in roots may be irrelevant)
● Mapping of wheat genes and information on their function based on literature

Title: Query on trial data associated with varieties
Description: To search wheat varieties with distribution maps, production figures, performances in wheat mega environments, associated projects worldwide, plus layers of climatic data on specific wheat production areas and disease prevention information.
Data types: Phenotypic data, GIS data, (wheat economy/production data)
Challenges:
● Phenotypic data should be linked to GIS data.
● Using keywords or ontology terms, a system or a tool should be able to pull out such information from the different websites/systems developed by the wheat community.

Wheat related ontologies and vocabularies survey

The objectives of the survey
• Assess the level of visibility and interoperability of Wheat related vocabularies and ontologies
    • Is the vocabulary/ontology updated regularly?
    • What license and/or copyright is used?
    • Is the vocabulary/ontology part of any ontology communities or listing services?
    • Is the vocabulary/ontology used or implemented in any database/repository?
    • Does the vocabulary/ontology interlink and/or map to other vocabularies and ontologies?
    • Does the vocabulary/ontology ...
• Identify the domain covered by the ontologies and vocabularies
• Refine the cookbook
• Collect more interoperability use cases
• Collect some technical details

The objectives of the survey
• What level of visibility/operability?
• What content?
• What formats and technologies?

Guidelines and Repository
The Wheat related BioPortal allows one to:
• search for terms across multiple ontologies
• browse mappings between terms in different ontologies
• receive recommendations on which ontologies are most relevant for a corpus
• annotate text with terms from ontologies

Next steps
• Metadata (harmonization, minimal metadata sets)
• Mappings
• Next workshop (summer 2015)
• Review and complete the recommendations
• Refine and complete the guidelines and the best practices
• Finalize the repository of Wheat related vocabularies
• Implement the prototype

Thanks!