Practical Ontologies: Lessons from the GO
February 2011

The time was 1998-99
• None of the model organism databases used standard terminology to describe biological function
• The Drosophila sequence was imminent
• Largest genome sequenced at that time
• Two weeks, 3 dozen scientists, all new software
• How could we organize the annotation?
• Microarray technology was the latest research tool, and results needed to be described
• AI folk and ontologists organized the first "bio-ontologies" workshop at ISMB

The Gene Ontology: the beginning
• A handful of biologists (4) met in a bar in Montreal after the bio-ontologies workshop to share their frustrations and decided to just do it*…
• Would demonstrate possibilities for data integration across the MODs (FlyBase, SGD, MGD)
• Provided an organizing principle for the Drosophila genome annotation jamboree
* i.e., describe gene products in a biologically meaningful way.

Late summer 1999
[Figure: raw DNA sequence text. Labels: reads; assemble; sequence analysis; mountains of data; piles of data; filtering; tentative function; first-pass predictions; functional knowns; love at first sight; 'GO' directories converging]

The Gene Ontology project
• Annotated now
• The importance of stress-testing
• Don't delay, use your ontology today
• Do no harm (KISS), i.e.
  target the low-hanging fruit, work on the obvious, high-confidence steps
• Collaborate on concrete projects
• Focusing the mind

Annotations have 3 primary components
• The ontology term(s)
• The entity instance (e.g. gene product)
• The evidence for that assertion
An annotation is an evidence-based assertion which indicates that this entity is best classified/described by the given term(s).

Annotation workflow
• Read paper(s): PMID:17449867
• Identify genes: SPCC622.16c
• Identify GO terms associated with each gene: GO:0005720
• What type of evidence? IDA
• Result: SPCC622.16c, GO:0005720, IDA, PMID:17449867

Classification rule: Disambiguation
[Figure: three different things, each labeled "bud initiation"]
The same name can be used to describe different things.

Classification rule: Disambiguation
[Figure: the same three things, now labeled "tooth bud initiation", "cellular bud initiation" and "flower bud initiation"]
Include plain "bud initiation" as a synonym for each of these terms.

Disambiguation
Exactly the same thing can be described with different terms:
• Glucose synthesis
• Glucose biosynthesis
• Glucose formation
• Glucose anabolism
• Gluconeogenesis
Comparison is difficult, especially across species or across databases that each use one of these different variants.
Use a single term, and plenty of synonyms.

Annotation for a healthy ontology
• Easier to find the most accurate term(s) to use
• Avoids annotation errors
• Easier for new curators to learn and understand
• Develop annotation guidelines and training material
• Enables automatic reasoning for searching & inference
Bottom line: following basic construction rules makes more useful ontologies.

Improvement needed: Closing the loop
[Figure labels: typical ontology developer; typical wet lab PI annotating data; "Doh! I get it now," says the computer.]

The Gene Ontology project
• Annotated now
• The importance of stress-testing
• Don't delay, use your ontology today
• Do no harm (KISS), i.e.
  target the low-hanging fruit, work on the obvious, high-confidence steps
• Collaborate on concrete projects
• Focusing the mind

GO in 2000-2008

Filling in annotation gaps: July 2008
GO:0016301 kinase activity
GO:0016310 phosphorylation
[Venn diagram regions: 3823, 2230, 1410]
|P| = 3640
|F| = 6053
|F ∩ P| = 2230
|F ∩ not P| = 3823
kinase activity part_of phosphorylation: annotations propagate over part_of
• KIC1 (IDA)
• NDK1 (IDA)

Filling in annotation gaps: 2009
GO:0016301 kinase activity
GO:0016310 phosphorylation

The H word: 2011
[Figure labels: time; divergence]
• Characters in common are due to inheritance
• Allows inferences about common ancestor

Evolution of MSH2 subfamily biological process
• Somatic hypermutation of immunoglobulin genes
• Apoptosis
• Maintenance of DNA repeats
• Homologous recombination
• DNA repair

Ancestral inference
[Figure: phylogenetic tree with leaves E.c., A.t. MTHFR1, A.t. MTHFR2, D.d., S.p., S.c. MET13, S.p., S.c. MET12, C.e., D.m., A.g., D.r., G.g., H.s. MTHFR, R.n., M.m. Evidence labels: "Biochemistry: purification and assay"; "Genetics: mutant phenotypes". Axis label: divergence]
• Integration at points of common ancestry
• Infer "hidden" character of living organisms
• Explicitly leverage evolutionary relationships

Integrating different GO annotations
PAINT: Phylogenetic Annotation and Inference Tool

Scoping
2009
[Figure labels: SGD, MGD, GO, FlyBase]
The ontology has a clearly specified and clearly delineated content.
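As a rough illustration of the earlier annotation slides, the sketch below stores each annotation as an entity, a term and its evidence, then propagates annotations over the one part_of link the slides show (kinase activity part_of phosphorylation). The class and function names are hypothetical, and carrying the evidence code over unchanged is a simplification of this sketch, not a statement about how the GO tools do it. The `gap_counts` helper only redoes the July 2008 arithmetic. Reading F and P as the sets annotated to the function and process terms is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One evidence-based assertion: entity, ontology term, evidence."""
    gene: str        # the entity instance (e.g. a gene product)
    term: str        # the ontology term ID
    evidence: str    # evidence code, e.g. IDA
    reference: str   # where the evidence comes from


# The single relationship shown on the slides:
# GO:0016301 (kinase activity) part_of GO:0016310 (phosphorylation).
PART_OF = {"GO:0016301": "GO:0016310"}


def propagate(annotations):
    """Return the direct annotations plus those implied by part_of links."""
    result = set(annotations)
    frontier = list(annotations)
    while frontier:
        ann = frontier.pop()
        parent = PART_OF.get(ann.term)
        if parent is None:
            continue
        # Simplification: the evidence code is carried over unchanged.
        inferred = Annotation(ann.gene, parent, ann.evidence, ann.reference)
        if inferred not in result:
            result.add(inferred)
            frontier.append(inferred)
    return result


def gap_counts(size_f, size_p, size_both):
    """Split two overlapping annotation sets into their non-shared parts."""
    return {"F_not_P": size_f - size_both, "P_not_F": size_p - size_both}


direct = [
    Annotation("SPCC622.16c", "GO:0005720", "IDA", "PMID:17449867"),
    # The slides give no reference for these two; the value is a placeholder.
    Annotation("KIC1", "GO:0016301", "IDA", "reference not given"),
    Annotation("NDK1", "GO:0016301", "IDA", "reference not given"),
]

all_annotations = propagate(direct)
for ann in sorted(all_annotations, key=lambda a: (a.gene, a.term)):
    print(ann.gene, ann.term, ann.evidence)

print(gap_counts(6053, 3640, 2230))
```

With the slide's numbers, the two non-shared regions come out as 3823 and 1410, matching the Venn diagram. The propagation step is what would move function-only annotations such as KIC1 and NDK1 into the overlap.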
Decisions to make the work easier
• Provide definitions for everything
• Intelligible ontologies are more useful
  to humans (for annotation) and
  to machines (for searching, reasoning and error-checking)
• Use content-free unique identifiers
• Drive all semantics away from tracking
• Don't confuse the representational technology with the conceptual modeling

Implicit ontologies within the GO:
• cysteine biosynthesis (ChEBI)
• myoblast fusion (Cell Type Ontology)
• hydrogen ion transporter activity (ChEBI)
• snoRNA catabolism (Sequence Ontology)
• wing disc pattern formation (Drosophila anatomy)
• epidermal cell differentiation (Cell Type Ontology)
• regulation of flower development (Plant anatomy)
• B-cell differentiation (Cell Type Ontology)

Implicit anatomy ontology within the GO:
• brain development
• hindbrain development
• metencephalon development
• pons development
• trigeminal motor nucleus development

[Figure labels: is bearer of; has part; number of; Alpha-Synuclein Mouse; Ischemic Mouse; Substantia nigra; Lewy body; Nucleus; Golgi Apparatus; Lysosome; Dark Material; Condensed Mitochondrion; Orthodox Mitochondrion]

Common Interest
Sociology: to enlist the community, the ontology must meet each individual group's immediate needs.
Too many people => Too many requirements

Outstanding problems
• Closing the loop between ontology construction and ontology application
• QC improvements
• Prioritizing tasks
• Visualization
• …

A cast of thousands
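The earlier advice on content-free unique identifiers and "a single term, and plenty of synonyms" can be sketched as a small lookup index. This is only an illustration under stated assumptions: the `TermIndex` class and the `EX:` identifiers are hypothetical and are not real GO IDs. Choosing "gluconeogenesis" as the primary name is also an assumption, since the slides do not say which variant is primary.

```python
from collections import defaultdict


class TermIndex:
    """Maps names and synonyms to opaque, content-free term IDs."""

    def __init__(self):
        self._names = {}                  # term ID -> primary name
        self._lookup = defaultdict(set)   # lowercased label -> term IDs

    def add(self, term_id, name, synonyms=()):
        """Register one term under its primary name and all synonyms."""
        self._names[term_id] = name
        for label in (name, *synonyms):
            self._lookup[label.lower()].add(term_id)

    def resolve(self, label):
        """Return every term ID that a name or synonym could refer to."""
        return sorted(self._lookup.get(label.lower(), ()))

    def name(self, term_id):
        """Return the primary name for an ID."""
        return self._names[term_id]


index = TermIndex()

# Same thing, many names: one term, plenty of synonyms.
index.add(
    "EX:0000001",  # hypothetical ID
    "gluconeogenesis",
    synonyms=[
        "glucose synthesis",
        "glucose biosynthesis",
        "glucose formation",
        "glucose anabolism",
    ],
)

# Same name, different things: plain "bud initiation" is a synonym of each.
index.add("EX:0000002", "tooth bud initiation", synonyms=["bud initiation"])
index.add("EX:0000003", "cellular bud initiation", synonyms=["bud initiation"])
index.add("EX:0000004", "flower bud initiation", synonyms=["bud initiation"])

print(index.resolve("Glucose synthesis"))
print([index.name(t) for t in index.resolve("bud initiation")])
```

Because the identifiers carry no meaning, a term can be renamed or gain synonyms without touching any annotation that points at it. An ambiguous label such as "bud initiation" resolves to several candidates, leaving the choice to the curator.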