PATO: An Ontology of Phenotypic Qualities
George Gkoutos
University of Cambridge

Phenotype Information
• Literature
  – Qualitative descriptions
• Experimental data
  – Qualitative descriptions
  – Quantitative descriptions
• Various representation methodologies
• Complex phenotype data

Need for:
“A platform for facilitating mutual understanding and interoperability of phenotype information across species and domains of knowledge amongst people and machines” …

Representation of Phenotypic Data
Organism attributes:
  T – Species
  G – Genotype
  I – Strain
  S – Genotypic sex
  A – Alleles at named loci
  E – Environmental/handling condition
  D – Age/stage of development
Assay: means of making observations
Phenotypic character: any feature of the organism that is observed or 'assayed'

Assay
Controlled vocabulary
• Abnormality
• Relative_to
• Ranges of values
• Allows the schema to be dynamic
• Definition of qualities and their relations
• Explicit differences (between laboratories)
• Allows labs around the world to “plug in” their assays to the schema
[Diagram: one assay linked to several phenotypic characters]

Phenotypic character representation methodologies
Pre-composition
– Examples:
  – MGI mouse genotype-phenotype annotation (Mammalian Phenotype)
  – Gramene trait annotation (Plant Trait Ontology)
  – etc.
Pre-composition often follows the compositional structure occasionally adopted by GO terms:
  Positive/negative regulation of mitosis = positive/negative + regulation of mitosis (GO:0045839)
  Increased/decreased angiogenesis = increased/decreased + angiogenesis (GO:0001525)
Advantages
• Easy for annotation
• Control
• Complex phenotypic information
Disadvantages
• Lack of rigidity
• Ontology management
• Expansion
• Quantitative data

Methodologies (cont.)
Post-composition
The post-composition methodology describes a phenotype by naming the particular affected entity (the bearer), which could be an anatomical structure, a biological process, a particular function etc., and the qualities that this entity possesses, which can be described in either qualitative or quantitative terms.
Advantages
• Ontology management
• Rigidity
• Expansion
• Quantitative data
• Advanced queries
Disadvantages
• Complex phenotypic information
• More difficult for annotation
• Need for constraints to ensure meaningful annotations

Phenotype And Trait Ontology (PATO)
• An ontology of phenotypic qualities, which can be shared across different species and domains of knowledge.
• Qualities are the basic entities that we can perceive and/or measure:
  – colors, sizes, masses, lengths etc.
• Qualities inhere in entities: every entity comes with certain qualities, which exist as long as the entity exists.
• Qualities belong to a finite set of quality types (e.g. color, size etc.) and inhere in specific individuals. No two individuals can have the same quality, and each quality is specifically constantly dependent on the entity it inheres in.

[Diagram: a phenotypic character is described by an Entity (E) drawn from core ontologies (e.g. anatomy, behaviour, pathology) and a Quality (Q) drawn from the species-independent PATO; E and Q together form an EQ phenotype description]

Simple phenotype descriptions
Phenotypic character       entity + quality
(mouse body weight)        (mouse anatomy: body + PATO: weight)
(eye colour)               (Drosophila anatomy: eye + PATO: colour)
(glucose concentration)    (ChEBI: glucose + PATO: concentration)

increased size hepatocellular carcinoma:
  hepatocellular carcinoma (MPATH:357) has_quality increased size (PATO:0000586)

Genetic Phenotype annotation model
[Diagram labels: Environment; Evidence; Qualifier; Assertion (Entity, relationship, Quality, Units); Source; Attribution (who makes the assertion; when; what organization); Properties]

Annotation: Phenotypes in literature
Evidence: light microscopy
Source: PMID:8431945
Assertion: eya1 influences E=eye disc (FBbt:00001768) appears Q=condensed (PATO:0001485)
Attribution: M. Ashburner
Date: 10/26/2007
Organization: FlyBase
Version: 1

Quantitative Data
• PATO is part of a representation of qualitative phenotypic information
• More often than not it is important to record the quantitative information that results from a specific measurement of a quality
• Measurements involve units (Phenotypic Character + Unit)
  The tail of my mouse is 2.1 cm

PATO & measurements
• UO: an ontology of units
• UO’s top-level division is between primary base units of a particular measure and units that are derived from base units
• Mapping between the various scalar qualities (such as weight, height, concentration etc.) and the corresponding units used to measure those qualities
• UO includes 264 terms, all of which are defined
• Email list (http://sourceforge.net/mailarchive/forum.php?forum_id=50613)

Mapping PATO to the UO

Linking quantitative data to qualitative descriptions
[Diagram labels: Measurement; qualitative description; Assay; range; normality; necessary & sufficient conditions; EQ descriptor; high-level annotation marking phenodeviance (e.g. MP)]

Multiple phenotypic characters to describe complex phenotypes
[Images: SHH-/+, SHH-/-, shh-/+, shh-/-]

Phenotype (character) = entity  + quality
P1                    = eye     + hypoteloric
P2                    = midface + hypoplastic
P3                    = kidney  + hypertrophied
Entities from ZFIN: eye, midface, kidney
Qualities from PATO: hypoteloric, hypoplastic, hypertrophied
Phenotype = P1 + P2 + P3 (phenotypic profile) = holoprosencephaly

Assays for complex phenotype data & quantitative data
[Diagram: one assay linked to several phenotypic characters]
• necessary
• necessary & sufficient
• phenodeviance

Linking qualitative descriptions across species
• Decomposition of pre-composed phenotype ontologies by providing logical definitions based on PATO
• Link annotations across different knowledge domains and species
• Link phenotypic descriptions of human diseases to animal models

Reconciling pre- and post-composed annotations
Retrospective PATO definitions of pre-coordinated terms in phenotype ontologies
Pre-composed ontologies:
• Mammalian Phenotype
• Plant trait
• Worm phenotype
• etc.
• OMIM

EQ definitions
Aristotelian definitions (genus-differentia): A <Q> *which* inheres_in an <E>

[Term]
id: MP:0001262
name: decreased body weight
namespace: mammalian_phenotype_xp
synonym: low body weight
synonym: reduced body weight
def: "lower than normal average weight" []
is_a: MP:0001259 ! abnormal body weight
intersection_of: PATO:0000583 ! decreased weight
intersection_of: inheres_in MA:0002405 ! adult mouse

Phenotypic information captured differently within the same domain (OMIM)
Query                      # of records
"large bone"               713
"enlarged bone"            136
"big bones"                16
"huge bones"               4
"massive bones"            28
"hyperplastic bones"       8
"hyperplastic bone"        34
"bone hyperplasia"         122
"increased bone growth"    543

Phenotypic information captured differently across different domains
MP:0001265 – decreased body size
MP:0001255 – decreased body height
WBPhenotype0000229 – small
OMIM %210710 – short stature

Logical definitions allow for cross species – domain links

[Term]
id: MP:0001265 ! decreased body size
intersection_of: PATO:0000587 ! decreased size
intersection_of: inheres_in MA:0002405 ! adult mouse

[Term]
id: MP:0001255 ! decreased body height
intersection_of: PATO:0000569 ! decreased height
intersection_of: inheres_in MA:0002405 ! adult mouse

[Term]
id: WBPhenotype0000229 ! small
intersection_of: PATO:0000587 ! decreased size
intersection_of: OBO_REL:inheres_in WBls:0000041 ! Adult

[Term]
id: OMIM:xxxxxxx ! short stature
intersection_of: PATO:0000587 ! decreased size
intersection_of: OBO_REL:inheres_in FMA:20394 ! Body

[Term]
id: OMIM:xxxxxxx ! short stature
intersection_of: PATO:0000569 ! decreased height
intersection_of: OBO_REL:inheres_in FMA:20394 ! Body

Suzie Lewis....
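The cross-species linking described in the logical-definition stanzas can be sketched in a few lines of Python. This is only an illustration, not the tooling used by the authors: the dictionary layout and the function names `group_by_quality` and `cross_domain_links` are assumptions made for this sketch, and the OMIM stanzas are left out because their identifiers are elided in the slides.

```python
from collections import defaultdict
from itertools import combinations

# Logical (EQ) definitions transcribed from the slides:
# pre-composed term -> (PATO quality, entity the quality inheres_in).
# The dictionary layout is an assumption made for this sketch.
EQ_DEFINITIONS = {
    "MP:0001265": ("PATO:0000587", "MA:0002405"),            # decreased body size
    "MP:0001255": ("PATO:0000569", "MA:0002405"),            # decreased body height
    "WBPhenotype0000229": ("PATO:0000587", "WBls:0000041"),  # small
}


def group_by_quality(definitions):
    """Group pre-composed term ids by the PATO quality in their definition."""
    groups = defaultdict(list)
    for term_id, (quality, _entity) in definitions.items():
        groups[quality].append(term_id)
    return dict(groups)


def cross_domain_links(definitions):
    """Pair up terms that share a quality but take their entity from
    different ontologies (judged by the entity id prefix)."""
    links = []
    for quality, term_ids in group_by_quality(definitions).items():
        for a, b in combinations(sorted(term_ids), 2):
            prefix_a = definitions[a][1].split(":")[0]
            prefix_b = definitions[b][1].split(":")[0]
            if prefix_a != prefix_b:
                links.append((a, b, quality))
    return links


if __name__ == "__main__":
    for a, b, quality in cross_domain_links(EQ_DEFINITIONS):
        print(f"{a} and {b} are linked through {quality}")
```

Matching on an identical PATO identifier is the simplest possible rule. A fuller treatment would also reason over the PATO hierarchy and over an anatomy-linking ontology.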
Experimental Design
• Annotate 11 human disease genes, and their homologs
• Develop search algorithm that utilizes the ontologies for comparison
• Test search algorithm by asking, “given a set of phenotypic descriptions (EQ stmts), can we find…”
  – alleles of the same gene
  – homologs in different organisms
  – members of a pathway (same organism)
  – members of a pathway (other organisms)

Strategy for Annotation
• Leverage OMIM gene and related disease records
• Use FMA, CL, GO, EDHAA, CHEBI, PATO ontologies
• Annotate 5 (in parallel) to check for curator consistency
• Annotate fly & fish orthologs (FB, ZFA)
• Import mouse ortholog data (MA, MP)

Testing the methodology
Annotated 11 gene-linked human diseases described in OMIM, and their homologs in zebrafish and fruitfly:
Gene      Disease
ATP2A1    Brody Myopathy
EPB41     Elliptocytosis
EXT2      Multiple Exostoses
EYA1      BOR syndrome
FECH      Protoporphyria
PAX2      Renal-Coloboma Syndrome
SHH       Holoprosencephaly
SOX9      Campomelic Dysplasia
SOX10     Peripheral Demyelinating Neuropathy
TNNT2     Familial Hypertrophic Cardiomyopathy
TTN       Muscular Dystrophy

An OMIM Record

Annotation Results
Gene          # genotypes   phenotype statements (total)   average/allele
ATP2A1        5             16                             3
EPB41         4             18                             4
EXT2          5             35                             7
EYA1*         16            335                            19
FECH          14            37                             3
PAX2*         24            183                            8
SHH           19            207                            9
SOX9*         13            321                            23
SOX10*        15            192                            12
TNNT2         10            36                             4
TTN           21            63                             3
Total (11)    146           1443

Ontology-based similarity scoring
• Measure the information content (IC) of any node
• Compute ‘similarity’ by finding IC ratios between any genotypes, genes, classes, etc.

Ontology-based Search Algorithm
Given a query node q, we try to find hits h1, h2, ... that are of the same type as q and are similar to q in terms of their annotation profile, A(q).
• First step: create an annotation profile for the thing to be searched (e.g., a gene)
• The annotation profile is the set of classes used to annotate that entity, and their ancestors
  c ∈ A(q) iff link(r,q,c)
  link(influences,sox9,curvature-of-tibia) → link(influences,sox9,morphology-of-bone)
• Compare annotation profiles using the same IC similarity metric

Yes, we can find alleles of same gene
Allelic phenotype profiles
Gene          # genotypes   # alleles >0 sim ratio   average sim ratio   average IC ratio   phenotype statements (total)   average/allele
ATP2A1        5             5                        0.8                 0.799              16                             3
EPB41         4             4                        0.315               0.422              18                             4
EXT2          5             5                        1                   1                  35                             7
EYA1*         16            16                       0.226               0.229              335                            19
FECH          14            14                       0.365               0.364              37                             3
PAX2*         24            24                       0.068               0.063              183                            8
SHH           19            19                       0.457               0.414              207                            9
SOX9*         13            13                       0.207               0.197              321                            23
SOX10*        15            13                       0.038               0.031              192                            12
TNNT2         10            10                       0.517               0.505              36                             4
TTN           21            19                       0.106               0.1                63                             3
Total (11)    146           142                                                             1443

UBERON: an anatomical linking ontology
• Each organism has its own anatomical ontology
• To connect annotations across species, need a way to link the anatomies
• Wanted an ontology that incorporated both functional homology and anatomical similarity
• Created an ontology linking anatomies from ZFA, FMA, XAO, MA, MIAA, WBbt, FBbt

UBERON connects phenotype entities from separate anatomy ontologies

Homologs are found by similarity search
Gene      simIC human/mouse   simIC human/zebrafish
ATP2A1    0.047               0.177
EPB41     0.328               0.141
EXT2      0.067               0.050
EYA1      0.264               0.495
FECH      0.430               0.101
PAX2      0.157               0.375
SHH       0.091               0.253
SOX9      0.226               0.383
SOX10     0.380               0.443
TNNT2     0.000               0.118
TTN       0.248               0.567

shha is phenotypically similar to homologous pathway members
zebrafish shh pathway: shha, smo, disp1, prdm1a, hdac1, scube2, wnt11, gli1,2a, bmp2b, ndr1,2, hhip, ptc1,ptc2, notch1a
mouse homologs: Shh, Smo, Disp1, Prdm1, Wnt1, 7b, 3a, 9b, 10b, Gli2, Gli3, Bmp4, Hhip, Ptch1,2, Rab23, Gas1, Nck1, Zic2, Notch1,2, Gsk3b
human homologs: SHH, HDAC4, WNT6, GLI2, NDRG1

Potential candidates also found
Gene     Similarity   Characterization
dharma   0.483        Paired-type homeodomain protein that has dorsal organizer inducing activity and is regulated by wnt signaling.
tbx16    0.401        T-box transcription factor that regulates mesenchyme to epithelial transition and LR patterning.
plod3    0.387        Lysyl hydroxylase and glycosyltransferase important for axonal growth cone migration.
ntl      0.382        T-box transcription factor important for notochord and mesoderm development.
kny      0.374        Glypican component of the wnt/PCP pathway.
tll1     0.372        Metalloprotease that can cleave Chordin and increase Bmp activity.
copa     0.372        Coatomer vesicular coat complex important for maintenance of the Golgi and ER transport. Important for notochord differentiation.
sfpq     0.369        RNA splicing factor required for cell survival and neuronal development.
lama1    0.369        Basement membrane protein important for eye and body axis development.
lamc1    0.367        Basement membrane protein important for eye development.
atp7a    0.365        Copper-transporting ATPase.
atp2a1   0.363        Sarcoplasmic reticulum transmembrane ATPase that mediates calcium re-uptake.
flh      0.358        Homeobox gene important for notochord and epiphysis development. Anterior/posterior expression determined by wnt activity.
wnt5b    0.327        Extracellular cysteine-rich glycoprotein required for convergent extension movements during posterior segmentation.

Results thus far
• Annotate 11 human disease genes, and their homologs
• Develop search algorithm that utilizes the ontologies for comparison
• Test search algorithm by asking, “given a set of phenotypic descriptions (EQ stmts), can we find…”
  – alleles of the same gene
  – homologs in different organisms
  – members of a pathway (same organism)
  – members of a pathway (other organisms)

Conclusions
• Ontologies help
• Promising new directions for ontology-based phenotype annotation
• Promising ways for identifying novel pathway members, generating hypotheses to test at the bench

Acknowledgements
NCBO-Berkeley
• Christopher Mungall
• Nicole Washington
• Mark Gibson
• Rob Bruggner
Cambridge
• Michael Ashburner
• George Gkoutos (PATO)
• David Osumi-Sutherland
U of Oregon
• Monte Westerfield
• Melissa Haendel
National Institutes of Health
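The ontology-based search outlined in the slides (annotation profiles closed over ancestors, then an IC-based comparison) can be illustrated with a minimal Python sketch. Several things here are assumptions rather than the authors' method: the slides do not preserve the IC formula, so the common definition IC(c) = -log2 p(c) is used; the similarity is taken to be the summed IC of shared classes divided by the summed IC of all classes in either profile; and the toy ontology, the genes `geneB` and `geneC`, and every function name are hypothetical. Only the sox9 / curvature-of-tibia / morphology-of-bone example comes from the slides.

```python
import math

# Hypothetical toy ontology: class -> is_a parents. Only the
# curvature-of-tibia -> morphology-of-bone link comes from the slides.
PARENTS = {
    "curvature-of-tibia": ["morphology-of-bone"],
    "morphology-of-bone": ["morphology"],
    "size-of-eye": ["morphology-of-eye"],
    "morphology-of-eye": ["morphology"],
    "morphology": [],
}

# Hypothetical direct annotations: gene -> classes it influences.
ANNOTATIONS = {
    "sox9": {"curvature-of-tibia"},
    "geneB": {"morphology-of-bone"},
    "geneC": {"size-of-eye"},
}


def ancestors(cls):
    """Return cls together with all of its is_a ancestors."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def profile(gene):
    """Annotation profile A(q): annotated classes plus their ancestors."""
    result = set()
    for cls in ANNOTATIONS[gene]:
        result |= ancestors(cls)
    return result


def information_content(cls):
    """Assumed IC: -log2 of the fraction of genes whose profile contains cls."""
    count = sum(1 for gene in ANNOTATIONS if cls in profile(gene))
    return -math.log2(count / len(ANNOTATIONS))


def sim_ic(a, b):
    """Assumed similarity: IC of shared classes over IC of all classes."""
    pa, pb = profile(a), profile(b)
    union_ic = sum(information_content(c) for c in pa | pb)
    if union_ic == 0:
        return 0.0
    return sum(information_content(c) for c in pa & pb) / union_ic


def search(query):
    """Rank every other gene by its similarity to the query, best first."""
    hits = [(sim_ic(query, gene), gene) for gene in ANNOTATIONS if gene != query]
    return [gene for _score, gene in sorted(hits, reverse=True)]


if __name__ == "__main__":
    print(search("sox9"))
```

In this toy data the root class `morphology` is shared by every gene, so it carries zero information. That is why `geneC`, which overlaps with `sox9` only at the root, scores zero, while `geneB` scores above zero through the more specific `morphology-of-bone`.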