* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download disease model - Buffalo Ontology Site
Survey
Document related concepts
Genetic engineering wikipedia , lookup
Nutriepigenomics wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Therapeutic gene modulation wikipedia , lookup
Gene therapy wikipedia , lookup
Dominance (genetics) wikipedia , lookup
Genome (book) wikipedia , lookup
Gene expression programming wikipedia , lookup
Quantitative trait locus wikipedia , lookup
Public health genomics wikipedia , lookup
Microevolution wikipedia , lookup
Gene nomenclature wikipedia , lookup
Mir-92 microRNA precursor family wikipedia , lookup
Epigenetics of neurodegenerative diseases wikipedia , lookup
Designer baby wikipedia , lookup
Transcript
PATO & Phenotypes: From model organisms to clinical medicine Suzanna Lewis September 4th, 2008 Signs, Symptoms and Findings Workshop First Steps Toward an Ontology of Clinical Phenotypes Describing phenotype using ontologies will aid in the identification of models of disease & candidate causative genes GWAS: Genome Wide Association Studies Any study of genetic variation across the entire human genome that is designed to identify genetic associations with observable traits (such as blood pressure or weight), or the presence or absence of a disease or condition. Given an identified gene, then what? Animal disease models Animal models Mutant Gene Mutant or missing Protein Mutant Phenotype (disease model) Animal disease models Humans Animal models Mutant Gene Mutant Gene Mutant or missing Protein Mutant or missing Protein Mutant Phenotype (disease) Mutant Phenotype (disease model) Animal disease models Humans Animal models Mutant Gene Mutant Gene Mutant or missing Protein Mutant or missing Protein Mutant Phenotype (disease) Mutant Phenotype (disease model) Animal disease models Humans Animal models Mutant Gene Mutant Gene Mutant or missing Protein Mutant or missing Protein Mutant Phenotype (disease) Mutant Phenotype (disease model) Phenotype data mining = text searching? Text-based phenotype resources: OMIM (NCBI) DECIPHER (Sanger) HGMD (Cardiff) Disease-specific databases MODs PubMed Information retrieval from text-based resources (OMIM) is not straightforward: Query “large bone” "enlarged bone" "big bones" "huge bones" "massive bones" "hyperplastic bones" "hyperplastic bone" "bone hyperplasia" "increased bone growth" # of records 713 136 16 4 28 8 34 122 543 Thanks to: M Ashburner Even if we can find what we are looking for in one organism, how can we associate that with phenotypes observed in different organisms? Methods to link phenotypic descriptions of human diseases to animal models currently don’t exist. Goal: Turn text-based phenotypes into ontology-based computable annotations Define a model for representing phenotypes SHH-/+ SHH-/- shh-/+ shh-/- Phenotype (clinical sign) = entity + attribute Phenotype (clinical sign) = P1 = entity eye + + attribute hypoteloric Phenotype (clinical sign) = P1 P2 = = entity + eye + midface + attribute hypoteloric hypoplastic Phenotype (clinical sign) = P1 P2 P3 = = = entity eye midface kidney + + + + attribute hypoteloric hypoplastic hypertrophied Phenotype (clinical sign) = P1 P2 P3 = = = entity eye midface kidney + + + + ZFIN: eye midface kidney attribute hypoteloric hypoplastic hypertrophied PATO: hypoteloric + hypoplastic hypertrophied Phenotype (clinical sign) = entity + attribute Anatomical ontology Cell & tissue ontology Developmental ontology Gene ontology biological process molecular function cellular component + PATO (phenotype and trait ontology) Phenotype (clinical sign) = P1 P2 P3 = = = entity eye midface kidney + + + + attribute hypoteloric hypoplastic hypertrophied Syndrome = P1 + P2 + P3 (disease) (package) = holoprosencephaly Genetic Phenotype annotation model Environment Evidence Qualifier Assertion Source Entity relationship Attribution Properties Who makes the assertion When, what organization Quality Units represents subj relation OBD and annotations obj annotation Absence of aorta investigator read observation bio-entity X Dev Biol 2005 Jul 15;283(2):357-72 publish/ create Experiment/ investigation “Sonic hedgehog is required for cardiac outflow tract and neural crest cell development” Information entity communicate Direct query/ annotation meta-analysis annotation Shh- Shh bio-entity Shh+ influences Absence Of aorta Heart Participates development in Computational representation submit/ consume Agent (human/computer) Community/expert local db local db local db Multiple schemas Goal: Turn text-based phenotypes into ontology-based computable annotations Define a model for representing phenotypes Develop and extend requisite ontologies For the entities being described: anatomies, processes, … Building a suite of orthogonal interoperable reference (evidence based) ontologies in the biomedical domain. It is critical that ontologies are developed cooperatively so that their classification strategies augment one another. Truth springs from arguments amongst friends. (David Hume) RELATION TIME CONTINUANT TO INDEPENDENT OCCURRENT DEPENDENT GRANULARITY ORGAN AND ORGANISM CELL AND CELLULAR COMPONENT MOLECULE Organism Anatomical Organ (NCBI Entity Function Taxonomy?) (FMA, CARO) (FMP, CPRO) Cell (CL) Cellular Component (FMA, GO) Molecule (ChEBI, SO, RnaO, PrO) Phenotypic Quality (PaTO) Biological Process (GO) Cellular Function (GO) Molecular Function (GO) Molecular Process (GO) Requisite ontologies An ontology of qualities (PATO) Organism specific anatomies A controlled vocabulary of homologous and analogous anatomical structures (Uberon) Gene Ontology Cell Types Goal: Turn text-based phenotypes into ontology-based computable annotations Define a model for representing phenotypes Develop and extend requisite ontologies For the entities being described: anatomies, processes, … Develop an intuitive annotation environment for rigorously capturing phenotypes (“semantic authoring”) Phenote: Simple software for annotating using ontologies Provide tool for ontology-based annotation Standardized model to record annotations for increased compatibility of data between disparate communities. Simple & intuitive user interface (especially for users that don’t know/care about what an ontology is) Easy-to-configure for different user-communities Pluggable architecture for external applications to interface/embed in application Provide interfaces with external SOAP and REST services for streamlined workflow (OBD, NCBI, EBI, etc). www.phenote.org Ontologies can be utilized from various resources in OWL and OBO format BioPortal Local file External site CVS Phenote tour Editor Refining terms onthe-spot Post-composition: Join together 2 (or more) terms for specificity: Apoptosis of neuron in skin (GO,CL,FMA) S-phase of colon cancer cell (GO,CL) Aster of human spermatocyte (GO,FMA) Combine terms from different ontologies Increase “information content” of an annotation Pre-composed: Have decomposed definitions of ~2/3rds of MP terms available to incorporate mouse data Term Info Browser Annotation Table Retrieve data from NCBI: OMIM, PUBMED, … (SOAP plug-in) Graphical Viewer Goal: Turn text-based phenotypes into ontology-based computable annotations Define a model for representing phenotypes Develop and extend requisite ontologies For the entities being described: anatomies, processes, … Develop an intuitive annotation environment for rigorously capturing phenotypes (“semantic authoring”) Develop a set of guidelines for biocurators Annotate mutant phenotypes (OMIM and models) General Annotation Standards Remarkable normality Absence Relative qualities (what does “small” mean?) Rates/frequencies does it inhere in the heart or a process? Homeotic transformation Phenotypes specific to a stage or temporal duration Testing the methodology Annotated 11 gene-linked human diseases described in OMIM, and their homologs in zebrafish and fruitfly. ATP2A1, BRODY MYOPATHY EPB41, ELLIPTOCYTOSIS EXT2, MULTIPLE EXOSTOSES EYA1, EYES ABSENT FECH, PROTOPORPHYRIA PAX2, RENAL-COLOBOMA SYNDROME SHH, HOLOPROSENCEPHALY SOX9, CAMPOMELIC DYSPLASIA SOX10, PERIPHERAL DEMYELINATING NEUROPATHY TNNT2, FAMILIAL HYPERTROPHIC CARDIOMYOPATHY TTN, MUSCULAR DYSTROPHY An OMIM Record An OMIM Record An OMIM Record An OMIM Record An OMIM Record Goal: Turn text-based phenotypes into ontology-based computable annotations Define a model for representing phenotypes Develop and extend requisite ontologies For the entities being described: anatomies, processes, … Develop an intuitive annotation environment for rigorously capturing phenotypes (“semantic authoring”) Develop a set of guidelines for biocurators Annotate mutant phenotypes (OMIM and models) Collect & store annotations in a common resource (OBD) and make these broadly available 4355 genes and genotypes in OBD 17782 entity-quality annotations in OBD OBD model: Requirements Generic We can’t define a rigid schema for all of biomedicine Let the domain ontologies do the modeling of the domain Expressive Use cases vary from simple ‘tagging’ to complex descriptions of biological phenomena Formal semantics Amenable to logical reasoning First Order Logic and/or OWL1.1 Standards-compatible Integratable with semantic web OBD Model: overview Graph-based: nodes and links Nodes: Classes, instances, relations Links: Relation instances Connect subject and object via relation plus additional properties Annotations: Posited links with attribution / evidence Equivalent expressivity as RDF and OWL Links aka axioms and facts in OWL Attributed links: Named graphs Reification N-ary relation pattern Supports construction of complex descriptions through graph model OBD Dataflow QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture. Example of Annotation in OBD Post-composition of complex anatomical entity descriptions QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture. Post-composition of phenotype classes (PATO EQ formalism) QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture. key OBD Architecture Two stacks 1. Semantic web stack Built using Sesame triplestore + OWLIM Future iterations: Science-commons Virtuoso 2. OBD-SQL stack Current focus Traditional enterprise architecture Plugs into Semantic Web stack via D2RQ OBD-SQL Stack Alpha version of API implemented Test clients access via SOAP Phenote current accesses via org.obo model & JDBC QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture. Wraps org.obo model and OBD schema Share relational abstraction layer Org.obo wraps OWLAPI Phenote currently connects via JDBC connectivity in org.obo Goal: Turn text-based phenotypes into ontology-based computable annotations Define a model for representing phenotypes Develop and extend requisite ontologies For the entities being described: anatomies, processes, … Develop an intuitive annotation environment for rigorously capturing phenotypes (“semantic authoring”) Develop a set of guidelines for biocurators Annotate mutant phenotypes (OMIM and models) Collect & store annotations in a common resource (OBD) and make these broadly available Develop tools & resources for mining data for novel discovery Developed a similarity search algorithm to identify genotypes with similar phenotype. sox9 mutations curated in PATO syntax Human, SOX9 (Campomelic dysplasia) Scapula: hypoplastic Lower jaw: decreased size Heart: malformed or edematous Phalanges: decreased length Long bones: bowed Male sex determination: disrupted Zebrafish, sox9a (jellyfish) Scapulocorocoid: aplastic Cranial cartilage: hypoplastic Heart: edematous Pectoral fin: decreased length Cartilage development: disrupted Average annotation consistency 600 # Annotations 500 400 total annotations 300 similar annotations 200 100 0 congruence 1 2 3 4 EYA1 SOX10 SOX9 PAX2 0.78 0.71 0.61 0.72 Reasoning over phenotype descriptions recorded with ontologies provides linkages in annotations. Ontologies and reasoning can reveal similarities in phenotype annotations. A zebrafish shh similar-phenotype query returns known hedgehog pathway members Gene Similarity Citation Role in hedgehog pathway smo 0.445 Ochi, et al. 2006 Membrane protein binds shh receptor ptc1 disp1 0.444 Nakano, et al. 2004 Regulates secretion of lipid modified shh from midline prdm1 a 0.43 Roy, et al., 2001 Zinc-finger domain transcription factor, downstream target of shh signaling hdac1 0.427 Cunliffe and Casaccia-Bonnefil, 2006 Transcriptional regulator required for shh mediated expression of olig2 in ventral hindbrain scube 2 0.398 Hollway et al., 2006 May act during shh signal transduction at the plasma membrane wnt11 0.380 Mullor et al., 2001 Extracellular cysteine rich glycoprotein required for gli2/3 induced mesoderm development gli2a 0.348 Kalstrom, et al., 1999 Zinc finger transcription factor target of shh signaling bmp2b 0.303 Ke et al., 2008 Downstream target of gli2 gene repression gli1 0.303 Karlstrom, et al., 2003 Zinc finger transcription factor target of shh signaling ndr2 0.289 Muller, et al., 2000 TGFbeta family member upstream of hedgehog signaling in the ventral neural tube hhip 0.265 Ochi et al., 2006 Binds shh in membrane and modulates interaction with smo OBD similarity query A computational search that enables comparison of phenotypes within and across species. Given a set of phenotype annotations recorded for a mutant allele we can identify other alleles in the same gene. We can identify other known pathway members in the same species and known gene orthologs in other species simply by comparing phenotypes alone. This annotation and search method provides a novel means for laboratory researchers to identify potential gene candidates participating in regulatory and/or disease pathways. Summary of (some of) the challenges Curating the information Efficiency (pre-composed vs. post-composed) Consistency between curators Missing contextual information (genetic background and environment) Observation vs. the inference made from this Representing homology (bones named by relative position) Those attempting this anyway: zebrafish, Drosophila, C. elegans, Cyprinoid fish (evolution), Dictyostelium, mouse (many), Xenopus, paramecium… Credit to Berkeley Christopher Mungall Mark Gibson Nicole Washington Rob Bruggner U of Oregon Monte Westerfield Melissa Haendel National Institutes of Health U of Cambridge Michael Ashburner George Gkoutos (PATO) David Osumi-Sutherland OBO Foundry Michael Ashburner Christopher Mungall Alan Ruttenberg Richard Scheuermann Barry Smith