Ontologies, Databases, Knowledgebases: How should they interoperate?

Judith Blake, Ph.D.
The Jackson Laboratory

Thesis

• The Mouse Genome Informatics (MGI) system provides a model for interoperability that
  – incorporates the use of ontologies,
  – depends upon the interconnection among databases, and
  – supports integration of data from multiple data sources
• This may provide a model for PRO objectives to support connections between PRO and disease representations

Mouse Genome Informatics (MGI)

MGI's primary mission is to facilitate the use of mouse as a model for human biology by providing integrated access to data on the genetics, genomics, and biology of the laboratory mouse.

[Figure: MGI data types. Labels: variants & polymorphisms; expression; sequence; genome location; gene function; strain genealogy; tumors; mouse/human orthologs & maps; mouse model & human phenotype (Hermansky-Pudlak syndrome)]

Information content spans from sequence to phenotype/disease

Automated (mostly) Data Integration (Loads)

[Figure: data sources loaded into MGI. Labels in slide order: EG mouse; UniProt; Associations; DFCI; DoTS; NIA; Unigene; TreeFam; Gene traps; Clones; RPCI; MGC; MGI; GenBank; EG chimp; EG dog; EG rat; EG human; HCOP; Homologene; Non-mouse; SNP db; GO; MP; Vocabularies; Anatomy; Interpro; OMIM; PIRSF; Annotation; RefSeq; Sequences; UniProt; DFCIseq; DoTSseq; NIAseq; dbSNP; UniSTS; microRNAs; NCBI; VEGA; Ensembl; Gene models and coordinates]

Mouse Genome Informatics: Controlled vocabularies and ontologies

• GO - Gene Ontology
• PRO - Protein Ontology
• MP - Mouse Phenotype Ontology
• MA - Mouse Anatomy (GXD)
• CL - Cell Type Ontology
• Mouse gene and strain nomenclature
• SO - Sequence Ontology
• RO - Relations Ontology
• ECO - Evidence Code Ontology

Integration: Controlled Vocabularies and Ontologies

MGI Operating Principles

• Data integration is key: comprehensive access to mouse genome, functional, mouse model, and comparative data allows the data to be evaluated in new contexts
  – Supports robust access to comprehensive information
  – Permits efficient access to related resources
• Standards are key to data integration
  – Nomenclature
    • Standardized gene nomenclature, keywords, etc.
  – Knowledge representation
    • Gene Ontology (GO)
    • Mammalian Phenotype Ontology
• Integration of Multi-Source Data
  – Depends on consistent entity tagging
  – Requires improvement of data storage structures
  – Necessitates ontology updates for data categories and context

Mouse Phenotypes and Disease Models

Connects mouse and human phenotypes in studies of human disease processes

Mouse: Crebbptm1Sis/Crebbp+ mutants showing skeletal formation defects.
Human: Rubinstein-Taybi Syndrome 1 (OMIM:180849), caused by CREBBP mutation.
° mental retardation
° postnatal growth deficiency
° microcephaly
° broad thumbs & halluces
° dysmorphic facial features (beaked nose, high arched palate, characteristic grimacing)
° increased tumor risk

Diseases and Phenotypes

• Diseases are described by signs and symptoms
  – Signs: things you can measure
  – Symptoms: things the patient notices
• Signs are phenotypes
• Diseases are characterized by phenotypes, including the order, severity and duration with which they occur. A full model of disease takes into account dimensions of anatomy, time, severity, therapeutic responsiveness, outcomes etc. There is also a probabilistic element to an instance of the disease and a probabilistic association between phenotypic elements in one instance.
• Diseases are not phenotypes (although predisposition may be considered as such), but single-phenotype diseases may be viewed as phenotypes, e.g. osteoarthritis.

Paul Schofield, 2013

Status of Phenotype & Disease Data

                                          May 2012   May 2013   May 2014   change this yr.
Phenotype terms in MP ontology               8,775      9,034     10,190        +1,156
Mutant alleles cataloged: total            743,813    748,960    754,256        +5,296
Mutant alleles cataloged: in mice           32,299     33,659     39,241        +5,582
  number of genes represented               20,937     21,442     21,786          +344
  targeted alleles                          46,822     51,119     55,640       +14,521
  number of genes targeted                  15,488     16,221     16,358          +137
Alleles w/ phenotype (MP) annotation        29,064     32,095     34,625        +2,530
Genotypes with MP annotation                43,579     47,790     51,720        +3,930
Total MP annotations                       223,125    249,460    268,577       +19,117
Mouse genotypes modeling human disease       3,687      4,084      4,365          +281
Human diseases w/ ≥1 mouse model(s)          1,153      1,239      1,310           +71
QTLs                                         4,696      4,715      4,835          +120

Objective

…make phenotype and disease model data robust and accessible to researchers and computational biologists

• semantic consistency to enable complete data retrieval
• integrated access to all phenotypic variation sources (single-gene and genomic mutations, engineered mutations, QTLs, strains)
• data on human disease correlation
• access to mouse models from various approaches
  - Genetic
  - Phenotypic
  - Genomic localization
  - Computational

Annotating Disease to Genotype

• Different alleles of a gene on the same background may/may not be disease models
• The same alleles of a gene on different genetic backgrounds may/may not be disease models
• Disease models are attached to genotype "objects"
• Disease annotation consists of OMIM term, the data reference/source, and association type

Example annotation:
  genotype:          Fgfr2tm1Schl / Fgfr2+ (129S1/Sv)
  OMIM term:         Crouzon Syndrome
  association type:  phenotypic similarity to human disease associated with ortholog
  source:            Eswarakumar VP et al., PNAS USA 2006;103:18603-8

MGI: 4,084 MGI mouse models (associated with) 1,239 OMIM diseases

Each associated human disease links to a Human Disease and Mouse Model Detail Page.
Note chicken and zebrafish

Biological knowledge and attributes in MGI

Mouse Genome Informatics: Integrate Sequence with Biology

[Figure labels: Nucleotide Sequences; Genome variation]

• Nomenclature
• Genome location
• Strains
• Polymorphisms
• Orthology
• Expression
• Alleles
• Mutant phenotypes
• Function of gene products
• Literature

[Figure labels: Genome Features; Protein Sequences; Gene predictions]

[Figure: Disease, Cell, Anatomy. Adapted from Schriml and Kibbe: ICBO submission 2013]

Annotation Extensions

Now with annotation extensions

GO terms shown:
  GO:0034504  protein localization to nucleus
  GO:0034599  cellular response to oxidative stress
  GO:0036091  positive regulation of transcription from pol II promoter in response to oxidative stress

DB       Object                 Term        Ev   Ref            Extension
PomBase  sty1 (SPAC24B11.06c)   GO:0034504  IMP  PMID:9585505   happens_during(GO:0034599), has_input(SPAC1783.07c)
PomBase  pap1 (SPAC1783.07c)    GO:0036091  IMP  PMID:9585505   has_regulation_target(…)

[Diagram labels: sty1; pap1; happens during; has input; has regulation target; <anonymous description>]

MGI Modular Annotation Example – http://amigo2.berkeleybop.org

Xirp1 is involved in the organization of the sarcomere in a cardiac muscle cell (CL:0000746) of the myocardium (MA:0000080)

Total number of MGI modular annotation units to proteins: 22,866
This does not include annotations to permanent cell lines

Summary of MGI Modular Annotations

Source  Relation                                            Count
MGI     part_of                                              9013
MGI     occurs_in                                            6298
MGI     regulates_o_occurs_in                                2884
MGI     regulates_o_acts_on_population_of                    1017
MGI     regulates_o_results_in_acquisition_of_features_of     967
MGI     results_in_acquisition_of_features_of                 723
MGI     regulates_o_has_agent                                 537
MGI     regulates_o_has_participant                           275
MGI     acts_on_population_of                                 259
MGI     results_in_movement_of                                232
MGI     results_in_development_of                             173
MGI     regulates_o_results_in_movement_of                    156
MGI     results_in_specification_of                            96
MGI     results_in_maturation_of                               61
MGI     results_in_morphogenesis_of                            48
MGI     has_agent                                              34
MGI     results_in_commitment_to                               32
MGI     results_in_division_of                                 21
MGI     regulates_o_results_in_commitment_to                   14
MGI     results_in_determination_of                            13
MGI     regulates_o_results_in_specification_of                 7
MGI     has_output_o_axis_of                                    5
MGI     regulates_o_results_in_development_of                   1

Interaction Data in MGI

…from catalog to context

• Relationships among markers project
  – Explicit representation of relationships among genome features
  – Interaction explorer
• Project initially focused on microRNAs
  – microRNA cluster membership
  – Predicted and validated microRNA targets
• Curation of interaction data from the literature (Gene Ontology) and from specialized external informatics resources

Mouse_CCO is an application ontology built on experimental evidence-based annotations. The data drives the structure, allowing a user to 'discover' connections. This diagram illustrates the generic template for the ontology.

[Diagram (Mary Dolan): generic Mouse_CCO template.
  Nodes (source): Gene_human (NCBI); Gene_mouse (MGI); Cell_type (CL); CCO_human (BioPortal); Function (GO MF); Component (GO CC); Process (GO BP); Allele_mouse (MGI); Genotype_mouse (MGI); Protein_mouse (PRO, UniProtKB); Pathway_mouse (MouseCyc); Phenotype_mouse (MGI); Anatomy_mouse (GXD, EMAP); Disease_human (OMIM, DO)
  Edge labels: orthologous_to; described_by; participates_in; part_of; encodes; has_variant; located_in; associated_with; expressed_in; effects]

Mouse_CCO is populated using 1017 mouse genes annotated to GO 'cell cycle' along with all their annotations from MGI and several additional data resources. Here we show how the generic template is populated for Brca1.
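One way to read the template is as a list of typed edges between nodes. The sketch below is illustrative only and is not how Mouse_CCO is actually stored: the `Edge` type, the `neighbors` helper, the node naming scheme, and the pairing and direction of each edge label are assumptions, since the slides do not fix every edge. The Brca1 values are the ones shown in the Brca1 diagram of this deck.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    """One typed link in the template (hypothetical representation)."""
    subject: str
    relation: str
    target: str


GENE = "Gene_mouse:Brca1"

# Assumed edges: each relation label is paired with the node it sits next to
# on the slide; the true direction of some edges may differ.
BRCA1_EDGES = [
    Edge(GENE, "orthologous_to", "Gene_human:BRCA1"),
    Edge(GENE, "has_variant", "Allele_mouse:Brca1tm1Thl"),
    Edge(GENE, "participates_in", "Process:DNA repair"),
    Edge(GENE, "participates_in", "Function:damaged DNA binding"),
    Edge(GENE, "located_in", "Component:BRCA1-BARD1 complex"),
    Edge(GENE, "expressed_in", "Anatomy_mouse:TS28: mammary gland"),
]


def neighbors(edges, subject):
    """Group the targets reachable from `subject` by relation label."""
    grouped = defaultdict(list)
    for edge in edges:
        if edge.subject == subject:
            grouped[edge.relation].append(edge.target)
    return dict(grouped)


if __name__ == "__main__":
    for relation, targets in neighbors(BRCA1_EDGES, GENE).items():
        print(relation, "->", ", ".join(targets))
```

Keeping the relation label on every edge is what lets a user follow one kind of connection at a time, for example listing only the `participates_in` targets of a gene.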
[Diagram (Mary Dolan): Mouse_CCO template populated for Brca1.
  Mouse gene: Brca1 (breast cancer 1)
  Allele: Brca1tm1Thl
  Genotype: Brca1tm1Thl/Brca1tm1Thl Waptm1(cre)Arge/0, 129S1/Sv * C57BL/6J
  Protein_mouse: VEGA model OTTMUSP00000002773
  Items shown next to an edge label:
    orthologous_to   Gene_human: BRCA1
    described_by     id: CCO:B0001598, name: BRCA1_HUMAN
    participates_in  Process: DNA repair
    participates_in  Function: damaged DNA binding
    located_in       Component: BRCA1-BARD1 complex
    associated_with  mammary adenocarcinoma
    expressed_in     TS28: mammary gland
    effects          OMIM:114480 Breast Cancer
  Other edge labels: has_variant; part_of; encodes; associated_with]

Keys to Interoperability (self-help mantras)

• Start where you are: from silos to networks
• Identify shared interests: educated self promotion
• Develop shared processes/applications
• Discuss the ideal, implement the practical

Acknowledgements

• Gene Ontology: Mike Cherry, Suzi Lewis, Paul Sternberg, Paul Thomas
• Mouse Genome Informatics: Carol Bult, Janan Eppig, Jim Kadin, Joel Richardson, Martin Ringwald
• MGI-GO-PRO team: Karen Christie, Mary Dolan, Harold Drabkin, David Hill, Li Ni, Dmitry Sitnikov

Funding: NIH NHGRI