PATO: An Ontology of Phenotypic Qualities
George Gkoutos
University of Cambridge
Phenotype Information
• Literature
  – Qualitative descriptions
• Experimental data
  – Qualitative descriptions
  – Quantitative descriptions
• Various representation methodologies
• Complex phenotype data
• Need for:
  "A platform for facilitating mutual understanding and interoperability of phenotype information across species and domains of knowledge amongst people and machines" …
Representation of Phenotypic Data
Organism attributes
T – Species
G – Genotype
I – Strain
S – Genotypic Sex
A – Alleles at named loci
E – Environmental/handling condition
D – Age/stage of development
Assay: means of making observations
Phenotypic Character: any feature of the organism that is observed or 'assayed'.
Assay Controlled Vocabulary
• Abnormality
• Relative_to
• Ranges of values
• Allows the schema to be dynamic
• Definition of qualities and their relations
• Explicit differences (between laboratories)
• Allows labs around the world to "plug in" their assays to the schema
[Diagram: an Assay linked to several Phenotypic Characters]
Phenotypic character representation methodologies
• Pre-composition
  – Examples:
    – MGI Mouse genotype-phenotype annotation (Mammalian Phenotype)
    – Gramene trait annotation (Plant trait ontology)
    – etc.
• Pre-composition often follows the compositional structure occasionally adopted by GO terms.
  – Positive/negative regulation of mitosis → positive/negative + regulation of mitosis (GO:0045839)
  – Increased/decreased angiogenesis → increased/decreased + angiogenesis (GO:0001525)
• Advantages
  – Easy for annotation
  – Control
  – Complex phenotypic information
• Disadvantages
  – Lack of rigidity
  – Ontology management
  – Expansion
  – Quantitative data
Methodologies (cont.)
• Post-composition
• The post-composition methodology describes a phenotype by naming the affected entity (the bearer), which could be an anatomical structure, a biological process, a particular function, etc., and the qualities that this entity possesses, which can be described in either qualitative or quantitative terms.
• Advantages
  – Ontology management
  – Rigidity
  – Expansion
  – Quantitative data
  – Advanced queries
• Disadvantages
  – Complex phenotypic information
  – More difficult for annotation
  – Need for constraints to ensure meaningful annotations
Phenotype And Trait Ontology (PATO)
• An ontology of phenotypic qualities that can be shared across different species and domains of knowledge.
• Qualities are the basic entities that we can perceive and/or measure:
  – colors, sizes, masses, lengths, etc.
• Qualities inhere in entities: every entity comes with certain qualities, which exist as long as the entity exists.
• Qualities belong to a finite set of quality types (e.g. color, size) and inhere in specific individuals. No two individuals can have the same quality, and each quality is specifically constantly dependent on the entity it inheres in.
Phenotypic Character
[Diagram: core ontologies (e.g. anatomy, behaviour, pathology) supply the Entity (E); PATO supplies the species-independent Quality (Q); E and Q together form the phenotype description (EQ)]
Simple phenotype descriptions
Phenotypic Character = entity + quality
• mouse body weight = mouse anatomy: body + PATO: weight
• eye colour = Drosophila anatomy: eye + PATO: colour
• glucose concentration = ChEBI: glucose + PATO: concentration
• increased size hepatocellular carcinoma = hepatocellular carcinoma (MPATH:357) has_quality increased size (PATO:0000586)
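An entity + quality statement can be held as a small record. The sketch below is only an illustration: the class name `EQ` and its field names are assumptions, not part of PATO or of any annotation tool; the identifiers are the ones quoted on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EQ:
    """One entity + quality phenotype statement (illustrative structure only)."""
    entity_id: str
    entity_label: str
    quality_id: str
    quality_label: str

    def render(self) -> str:
        # Mirrors the "<entity> has_quality <quality>" form used on the slide.
        return (f"{self.entity_label} ({self.entity_id}) has_quality "
                f"{self.quality_label} ({self.quality_id})")


statement = EQ("MPATH:357", "hepatocellular carcinoma",
               "PATO:0000586", "increased size")
print(statement.render())
```

Keeping the entity and the quality as separate fields is what lets the same quality term be reused with entities from any species-specific ontology.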
Phenotype annotation model
[Diagram labels: Genetic, Environment, Phenotype (Entity, relationship, Quality, Units), Qualifier, Evidence, Source, Assertion, Attribution (who makes the assertion; when; what organization), Properties]
Annotation: Phenotypes in literature
Assertion: eya1 influences E = eye disc (FBbt:00001768), which appears Q = condensed (PATO:0001485)
Evidence: light microscopy
Source: PMID:8431945
Attribution: M. Ashburner; Date: 10/26/2007; Organization: FlyBase; Version: 1
Quantitative Data
• PATO is part of a representation of qualitative phenotypic information
• More often than not, it is important to record the quantitative information that results from a specific measurement of a quality
• Measurements involve units (Phenotypic Character + Unit)
  – Example: "The tail of my mouse is 2.1 cm"
PATO & measurements
• UO: an ontology of units
• UO's top-level division is between primary base units of a particular measure and units that are derived from base units
• Mapping between the various scalar qualities (such as weight, height, concentration, etc.) and the corresponding units used to measure those qualities
• UO includes 264 terms, all of which are defined
• Email list (http://sourceforge.net/mailarchive/forum.php?forum_id=50613)
Mapping PATO to the UO
Linking quantitative data to qualitative descriptions
• Measurement → qualitative description
  – Assay
  – range
  – normality
  – necessary & sufficient conditions
• EQ descriptor → high level annotation marking phenodeviance (e.g. MP)
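The step from a measurement to a qualitative description can be sketched as a comparison against an assay-specific normal range. Everything here is an assumption made for illustration: the `Measurement` class, the `qualitative` function, the choice of "length" as the measured quality, and the reference range are all hypothetical; only the 2.1 cm tail value comes from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """A measured quality of an entity, with its unit (illustrative only)."""
    entity: str
    quality: str
    value: float
    unit: str


def qualitative(m: Measurement, normal_low: float, normal_high: float) -> str:
    """Map a measured value onto a qualitative descriptor.

    The normal range is assumed to be supplied by the assay and to be
    expressed in the same unit as the measurement.
    """
    if m.value < normal_low:
        return f"decreased {m.quality}"
    if m.value > normal_high:
        return f"increased {m.quality}"
    return f"normal {m.quality}"


tail = Measurement(entity="tail", quality="length", value=2.1, unit="cm")
# Hypothetical reference range, chosen only to exercise the function.
print(qualitative(tail, normal_low=7.0, normal_high=9.0))
```

The range belongs to the assay rather than to the quality, which is why it is passed in as an argument instead of being stored on the measurement.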
Multiple phenotypic characters to describe complex phenotypes
Genotypes shown: SHH-/+, SHH-/-, shh-/+, shh-/-
Phenotype (character) = entity + quality
P1 = eye     + hypoteloric
P2 = midface + hypoplastic
P3 = kidney  + hypertrophied
(entities: ZFIN; qualities: PATO)
Phenotype (phenotypic profile) = P1 + P2 + P3 = holoprosencephaly
Assays for complex phenotype data & quantitative data
[Diagram: an Assay linked to several Phenotypic Characters]
• necessary
• necessary & sufficient
• phenodeviance
Linking qualitative descriptions across species
• Decomposition of precomposed phenotype ontologies by providing logical definitions based on PATO
• Link annotations across different knowledge domains and species
• Link phenotypic descriptions of human diseases to animal models
Reconciling pre- and post-composed annotations
• Retrospective PATO definitions of pre-coordinated terms in phenotype ontologies
• Precomposed ontologies
  – Mammalian Phenotype
  – Plant trait
  – Worm phenotype
  – etc.
• OMIM
EQ definitions
Aristotelian definitions (genus-differentia)
A <Q> *which* inheres_in an <E>
[Term]
id: MP:0001262
name: decreased body weight
namespace: mammalian_phenotype_xp
synonym: "low body weight" []
synonym: "reduced body weight" []
def: "lower than normal average weight" []
is_a: MP:0001259 ! abnormal body weight
intersection_of: PATO:0000583 ! decreased weight
intersection_of: inheres_in MA:0002405 ! adult mouse
Phenotypic information captured differently within the same domain (OMIM)
Query                      # of records
"large bone"               713
"enlarged bone"            136
"big bones"                16
"huge bones"               4
"massive bones"            28
"hyperplastic bones"       8
"hyperplastic bone"        34
"bone hyperplasia"         122
"increased bone growth"    543
Phenotypic information captured differently across different domains
• MP:0001265 – decreased body size
• MP:0001255 – decreased body height
• WBPhenotype0000229 – small
• OMIM %210710 – short stature
Logical definitions allow for cross-species and cross-domain links
[Term]
id: MP:0001265 ! decreased body size
intersection_of: PATO:0000587 ! decreased size
intersection_of: inheres_in MA:0002405 ! adult mouse
[Term]
id: MP:0001255 ! decreased body height
intersection_of: PATO:0000569 ! decreased height
intersection_of: inheres_in MA:0002405 ! adult mouse
[Term]
id: WBPhenotype0000229 ! small
intersection_of: PATO:0000587 ! decreased size
intersection_of: OBO_REL:inheres_in WBls:0000041 ! Adult
[Term]
id: OMIM:xxxxxxx ! short stature
intersection_of: PATO:0000587 ! decreased size
intersection_of: OBO_REL:inheres_in FMA:20394 ! Body
[Term]
id: OMIM:xxxxxxx ! short stature
intersection_of: PATO:0000569 ! decreased height
intersection_of: OBO_REL:inheres_in FMA:20394 ! Body
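Once terms carry logical definitions, terms from different ontologies can be linked through the PATO quality they share. The sketch below hand-transcribes three of the stanzas above into tuples (the OMIM stanzas are left out because their identifiers are elided); the function name `group_by_quality` and the tuple layout are assumptions for illustration, not an existing OBO tool.

```python
from collections import defaultdict

# (term id, label, PATO quality, bearer entity), transcribed from the stanzas above.
DEFINITIONS = [
    ("MP:0001265", "decreased body size", "PATO:0000587", "MA:0002405"),
    ("MP:0001255", "decreased body height", "PATO:0000569", "MA:0002405"),
    ("WBPhenotype0000229", "small", "PATO:0000587", "WBls:0000041"),
]


def group_by_quality(definitions):
    """Group pre-composed term ids by the PATO quality in their definition."""
    groups = defaultdict(list)
    for term_id, _label, quality, _entity in definitions:
        groups[quality].append(term_id)
    return dict(groups)


groups = group_by_quality(DEFINITIONS)
# Mouse and worm terms meet at the shared quality "decreased size".
print(groups["PATO:0000587"])
```

Grouping on the quality alone ignores the bearer entity; a fuller comparison would also need a way to relate the anatomy terms of the two species.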
 Suzie Lewis....
Experimental Design
• Annotate 11 human disease genes and their homologs
• Develop a search algorithm that uses the ontologies for comparison
• Test the search algorithm by asking, "given a set of phenotypic descriptions (EQ statements), can we find…"
  – alleles of the same gene
  – homologs in different organisms
  – members of a pathway (same organism)
  – members of a pathway (other organisms)
Strategy for Annotation
• Leverage OMIM gene and related disease records
• Use FMA, CL, GO, EHDAA, CHEBI, PATO ontologies
• Annotate 5 (in parallel) to check for curator consistency
• Annotate fly & fish orthologs (FB, ZFA)
• Import mouse ortholog data (MA, MP)
Testing the methodology
• Annotated 11 gene-linked human diseases described in OMIM, and their homologs in zebrafish and fruitfly:
Gene      Disease
ATP2A1    Brody Myopathy
EPB41     Elliptocytosis
EXT2      Multiple Exostoses
EYA1      BOR syndrome
FECH      Protoporphyria
PAX2      Renal-Coloboma Syndrome
SHH       Holoprosencephaly
SOX9      Campomelic Dysplasia
SOX10     Peripheral Demyelinating Neuropathy
TNNT2     Familial Hypertrophic Cardiomyopathy
TTN       Muscular Dystrophy
An OMIM Record
Annotation Results
Gene        # genotypes  Phenotype statements (total)  Average/allele
ATP2A1      5            16                            3
EPB41       4            18                            4
EXT2        5            35                            7
EYA1*       16           335                           19
FECH        14           37                            3
PAX2*       24           183                           8
SHH         19           207                           9
SOX9*       13           321                           23
SOX10*      15           192                           12
TNNT2       10           36                            4
TTN         21           63                            3
Total (11)  146          1443
Ontology-based similarity scoring
• Measure the information content (IC) of any node
• Compute 'similarity' by finding IC ratios between any genotypes, genes, classes, etc.
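The slides do not spell out how IC is computed. A minimal sketch, assuming the commonly used definition IC(c) = -log2 p(c), where p(c) is the fraction of annotated items that are annotated to class c or one of its descendants; the function name and the counts are hypothetical.

```python
import math


def information_content(annotation_counts: dict[str, int], total: int) -> dict[str, float]:
    """IC(c) = -log2(n_c / total), assuming the frequency-based definition.

    annotation_counts maps each class to the number of annotated items that
    carry it (directly or through a descendant); total is the number of
    annotated items overall. Classes with a zero count are skipped.
    """
    return {c: -math.log2(n / total)
            for c, n in annotation_counts.items() if n > 0}


# Hypothetical counts: the general class is used often, the specific one rarely.
counts = {"morphology-of-bone": 8, "curvature-of-tibia": 1}
ic = information_content(counts, total=16)
print(ic)
```

Under this definition rarely used, specific classes score higher than frequently used, general ones, so sharing a specific class counts for more when two profiles are compared.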
Ontology-based Search Algorithm
• Given a query node q, we try to find hits h1, h2, ... that are of the same type as q and are similar to q in terms of their annotation profile, A(q).
• First step: create an annotation profile for the thing to be searched (e.g. a gene)
• The annotation profile is the set of classes used to annotate that entity, and their ancestors
  c ∈ A(q) iff link(r, q, c)
  link(influences, sox9, curvature-of-tibia) → link(influences, sox9, morphology-of-bone)
• Compare annotation profiles using the same IC similarity metric
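The profile construction and comparison can be sketched as follows. The ancestor closure follows the description above; the similarity ratio (summed IC of shared classes over summed IC of all classes in either profile) is one plausible reading of "IC ratio" and is an assumption, as are the function names, the class `length-of-femur`, and the IC values.

```python
def annotation_profile(direct: set[str], parents: dict[str, set[str]]) -> set[str]:
    """Return the directly annotated classes plus all of their ancestors."""
    profile: set[str] = set()
    stack = list(direct)
    while stack:
        cls = stack.pop()
        if cls not in profile:
            profile.add(cls)
            stack.extend(parents.get(cls, ()))
    return profile


def sim_ic(a: set[str], b: set[str], ic: dict[str, float]) -> float:
    """Summed IC of shared classes divided by summed IC of the union (assumed metric)."""
    denominator = sum(ic.get(c, 0.0) for c in a | b)
    if denominator == 0:
        return 0.0
    return sum(ic.get(c, 0.0) for c in a & b) / denominator


# Toy ontology fragment; "length-of-femur" and all IC values are hypothetical.
PARENTS = {
    "curvature-of-tibia": {"morphology-of-bone"},
    "length-of-femur": {"morphology-of-bone"},
}
IC = {"morphology-of-bone": 1.0, "curvature-of-tibia": 4.0, "length-of-femur": 3.0}

query = annotation_profile({"curvature-of-tibia"}, PARENTS)
hit = annotation_profile({"length-of-femur"}, PARENTS)
score = sim_ic(query, hit, IC)
print(sorted(query), score)
```

Because ancestors are included in each profile, two genes annotated with different specific terms can still match through a shared general class, here `morphology-of-bone`.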
Yes, we can find alleles of the same gene
Allelic phenotype profiles
Gene        # genotypes  # alleles >0 sim ratio  Average sim ratio  Average IC ratio  Phenotype statements (total)  Average/allele
ATP2A1      5            5                       0.8                0.799             16                            3
EPB41       4            4                       0.315              0.422             18                            4
EXT2        5            5                       1                  1                 35                            7
EYA1*       16           16                      0.226              0.229             335                           19
FECH        14           14                      0.365              0.364             37                            3
PAX2*       24           24                      0.068              0.063             183                           8
SHH         19           19                      0.457              0.414             207                           9
SOX9*       13           13                      0.207              0.197             321                           23
SOX10*      15           13                      0.038              0.031             192                           12
TNNT2       10           10                      0.517              0.505             36                            4
TTN         21           19                      0.106              0.1               63                            3
Total (11)  146          142                                                          1443
UBERON: an anatomical linking ontology
• Each organism has its own anatomical ontology
• To connect annotations across species, need a way to link the anatomies
• Wanted an ontology that incorporated both functional homology and anatomical similarity
• Created an ontology linking anatomies from ZFA, FMA, XAO, MA, MIAA, WBbt, FBbt
UBERON connects phenotype entities from separate anatomy ontologies
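The linking idea can be sketched as a lookup from species-specific anatomy terms to a shared class. The identifiers below are placeholders invented for illustration (the slides give no real UBERON, ZFA, MA, or FMA IDs for these structures), and the function name is hypothetical.

```python
# Placeholder identifiers, not real ontology IDs.
TO_SHARED = {
    "ZFA:eye": "UBERON:eye",
    "MA:eye": "UBERON:eye",
    "FMA:eye": "UBERON:eye",
    "ZFA:kidney": "UBERON:kidney",
}


def shared_entities(annotated: set[str]) -> set[str]:
    """Translate species-specific anatomy terms into shared linking classes.

    Terms without a mapping are dropped rather than guessed.
    """
    return {TO_SHARED[e] for e in annotated if e in TO_SHARED}


fish = shared_entities({"ZFA:eye", "ZFA:kidney"})
mouse = shared_entities({"MA:eye"})
common = fish & mouse
print(common)
```

After translation, annotations from different species can be compared with the same profile-based similarity used within a single species.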
Homologs are found by similarity search
Gene      simIC human/mouse  simIC human/zebrafish
ATP2A1    0.047              0.177
EPB41     0.328              0.141
EXT2      0.067              0.050
EYA1      0.264              0.495
FECH      0.430              0.101
PAX2      0.157              0.375
SHH       0.091              0.253
SOX9      0.226              0.383
SOX10     0.380              0.443
TNNT2     0.000              0.118
TTN       0.248              0.567
shha is phenotypically similar to homologous pathway members
Zebrafish shh pathway: shha; smo; disp1; prdm1a; hdac1; scube2; wnt11; gli1,2a; bmp2b; ndr1,2; hhip; ptc1,ptc2; notch1a
Mouse homologs: Shh; Smo; Disp1; Prdm1; Wnt1, 7b, 3a, 9b, 10b; Gli2, Gli3; Bmp4; Hhip; Ptch1,2; Rab23; Gas1; Nck1; Zic2; Notch1,2; Gsk3b
Human homologs: SHH; HDAC4; WNT6; GLI2; NDRG1
Potential candidates also found
Gene (similarity): characterization
• dharma (0.483): Paired type homeodomain protein that has dorsal organizer inducing activity and is regulated by wnt signaling.
• tbx16 (0.401): T-box transcription factor that regulates mesenchyme to epithelial transition and LR patterning.
• plod3 (0.387): Lysyl hydroxylase and glycosyltransferase important for axonal growth cone migration.
• ntl (0.382): T-box transcription factor important for notochord and mesoderm development.
• kny (0.374): Glypican component of the wnt/PCP pathway.
• tll1 (0.372): Metalloprotease that can cleave Chordin and increase Bmp activity.
• copa (0.372): Coatomer vesicular coat complex important for maintenance of the Golgi and ER transport. Important for notochord differentiation.
• sfpq (0.369): RNA splicing factor required for cell survival and neuronal development.
• lama1 (0.369): Basement membrane protein important for eye and body axis development.
• lamc1 (0.367): Basement membrane protein important for eye development.
• atp7a (0.365): Copper transporting ATPase.
• atp2a1 (0.363): Sarcoplasmic reticulum transmembrane ATPase that mediates calcium re-uptake.
• flh (0.358): Homeobox gene important for notochord and epiphysis development. Anterior/posterior expression determined by wnt activity.
• wnt5b (0.327): Extracellular cysteine rich glycoprotein required for convergent extension movements during posterior segmentation.
Results thus far
• Annotate 11 human disease genes and their homologs
• Develop a search algorithm that uses the ontologies for comparison
• Test the search algorithm by asking, "given a set of phenotypic descriptions (EQ statements), can we find…"
  – alleles of the same gene
  – homologs in different organisms
  – members of a pathway (same organism)
  – members of a pathway (other organisms)
Conclusions
• Ontologies help
• Promising new directions for ontology-based phenotype annotation
• Promising ways of identifying novel pathway members and generating hypotheses to test at the bench
Acknowledgements
NCBO-Berkeley
• Christopher Mungall
• Nicole Washington
• Mark Gibson
• Rob Bruggner
Cambridge
• Michael Ashburner
• George Gkoutos (PATO)
• David Osumi-Sutherland
U of Oregon
• Monte Westerfield
• Melissa Haendel
National Institutes of Health