Download PATO - Buffalo Ontology Site

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Pharmacogenomics wikipedia , lookup

Hybrid (biology) wikipedia , lookup

Pathogenomics wikipedia , lookup

Neuronal ceroid lipofuscinosis wikipedia , lookup

Metagenomics wikipedia , lookup

Epistasis wikipedia , lookup

Dominance (genetics) wikipedia , lookup

RNA-Seq wikipedia , lookup

Gene nomenclature wikipedia , lookup

Genome evolution wikipedia , lookup

History of genetic engineering wikipedia , lookup

Epigenetics of neurodegenerative diseases wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Public health genomics wikipedia , lookup

Genome (book) wikipedia , lookup

Gene expression profiling wikipedia , lookup

Koinophilia wikipedia , lookup

Gene expression programming wikipedia , lookup

Designer baby wikipedia , lookup

Quantitative trait locus wikipedia , lookup

Site-specific recombinase technology wikipedia , lookup

Microevolution wikipedia , lookup

Transcript
Phenotype And Trait Ontology
(PATO)
and plant phenotypes
George Gkoutos
Phenotype And Trait Ontology (PATO)
The meaningful cross species and across domain
translation of phenotype is essential  phenotypedriven gene function discovery and comparative
pathobiology
Goal - “A platform for facilitating mutual understanding
and interoperability of phenotype information across
• species,
• domains of knowledge,
and amongst people and machines” …..
PATO today
PATO is now being used as a community
standard for phenotype description
– many consortia (e.g. Phenoscape, The Virtual
Human Physiology project (VPH), IMPC, BIRN, NIF)
– most of the major model organism databases,
(e.g. example Flybase, Dictybase, Wormbase, Zfin,
Mouse genome database (MGD))
– international projects
PATO’s Semantic Framework
• Conceptual Layer
• Semantic Components Layer
• Unification Layer
• Integration Layer
PATO Conceptual Layer
EQ
EQ Model
link Entities (E) from
GO, CheBI, FMA etc.
to Qualities (Q) from
PATO 
EQ statements
Process
PATO
Object
Semantic Components Layer
EQ
• Behavior
• Physiology
• Cell Phenotype
• Pathology
• UBERON
• Measurements
(Units Ontology)
NBO
UO
MPATH
PATO
CL
GO
UBERON
CHEBI
Unification Layer
EQ
CPO
WBPhenotype
NBO
Provision of
PATO based
equivalence
definitions
UO
MPATH
PATO
MP
CL
YPO
GO
UBERON
CHEBI
FBCV
TO
HPO
Integration Layer
EQ
CPO
WBPhenotype
NBO
UO
MPATH
PATO
MP
CL
YPO
GO
UBERON
CHEBI
FBCV
TO
HPO
Cross Species Data Integration
Cross species integration framework
• A PATO-based cross species phenotype network based
on experimental data from 5 model organisms yeast,
fly, worm, fish and mouse and human disease
phenotypes (OMIM, OrphaNet)
• integration of anatomy and phenotype ontologies
– more than 1,000,000 classes and 2,500,000 axioms
• PhenomeNET forms a network with more than 300.000
complex phenotype nodes representing complex
phenotypes, diseases, drug indications and adverse
reactions
• Semantic similarity measures  pairwise comparison
of disease and animal phenotypes
Area Under Curve (AUC) = 0.9
• quantitative evaluation based on predicting orthology, pathway, disease
• enhance the network e.g.
– semantics e.g Behavior and pathology related phenotypes etc.
– methods e.g. text mining, machine learning etc.
– resources
Candidate disease gene prioritization
Mouse (MP)
Human (HPO)
E1: Aorta(MA)
Q: overlap with (PATO)
E2: Membranous
interventricular septum (MA)
E1: Aorta(FMA)
Q: overlap with (PATO)
E2: Membranous part of the
interventricular septum (FMA)
E: Pulmonary valve (MA)
Q: constricted (PATO)
E: Pulmonary valve (FMA)
Q: constricted (PATO)
E1: ventricular septum (MA)
Q: closure incomplete (PATO)
E: Interventricular septum (FMA)
Q: closure incomplete (PATO)
E: heart ventricle wall(MA)
Q: hypertrophic (PATO)
E: Wall of right ventricle (FMA)
Q: hypertrophic (PATO)
• Predict all known human and mouse disease genes
• Adam19 and Fgf15 mouse genes
• using zebrafish phenotypes - mammalian homologues
of Cx36.7 and Nkx2.5 are involved in TOF
Data Integration Across Domains of
Knowledge
Gene function determination
• bridge the gap between the availability of phenotype 
infer the functions (impaired given a phenotype observation)
• example: delayed cartilage development 
EQ = GO:0051216 ! cartilage development PATO:0000502! delayed
• phenotype data from 5 different species  more than 40000 novel
gene functions
• evaluation:
• manually for biological correctness
• predict genetic interactions based on gene functions (semantic
similarity between genes)
Species
Fish
Yeast
Fly
Worm
Mouse
Inferred
1717
14047
1633
9221
15693
• improvement for every species, and significantly improvement for
fruitfly, worm and mouse
• GO-based gene function annotations  analysis gene expression data
• large volume of data from high-throughput mutagenesis screens
p-value
0.24
0.162
0.041
0.03
0.00007
Can we do the same for plants ?
PPPP pilot project
• Dataset
– manual EQ annotations for various species
• Framework for plant phenotype analyses
– build a Plant PhenomeNet
• Analysis
– analyze the dataset and quantify the extent of phenotype
similarity among certain groups of alleles/genes
– prediction of gene identity for QTL or genetically defined
locus
– etc
Plant Phenotyping Centers
• Participating centres
–
–
–
–
–
National Plant Phenomics Centre
European Plant Phenomics Centre
Julich Plant Phenotyping Centre
Rothamsted Research
INRA
• Goal: PATO-based plant phenotyping analysis
framework
– Link to PPPP and NSF proposal
Solanaceae Phenotypic Atlas (SPA)
PATO-based methodology to systematically study
associations between genotype, phenotype and
environment in Solanum
– Nightshade Phenotype Ontology (NPO)
– Compare phenotypic similarity across
flowering plants
– Generate a Solanum geospatial map
– Link environmental features to the Solanum
geospatial map
Questions?