Gene Ontology Annotations: What they mean and where they come from
Judith A. Blake, David P. Hill, Barry Smith
BioOntologies SIG: Vienna, July 20, 2007

GO Consortium Project Goals
1. We will maintain comprehensive, logically rigorous and biologically accurate ontologies.
*2. We will comprehensively annotate reference genomes in as complete detail as possible.
*3. We will support annotation across all organisms.
4. We will provide our annotations and tools to the research community.

GO terms are used for functional annotations
Brain development [GO:0007420] (141 genes, 207 annotations)

GO Stats: GO Annotations (April 24, 2007)
- Total experimental GO annotations: 388,633
- Total proteins with manual annotations: 80,402
- Contributing groups (including MGI): 19
- Total PubMed references: 346,002
- Total number of predicted annotations: 17,029,553
- Total number of taxa: 129,318
- Total number of distinct proteins: 2,971,374

Annotations are assertions
- Annotations provide the connection between genomic information and the GO.
- Experiments provide the data that enables us to annotate gene products with terms from the ontologies.
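To make the difference between "genes" and "annotations" in the counts above concrete, here is a minimal sketch in which each annotation is one assertion linking a gene product to a GO term. The `Annotation` class, its field names, the gene symbols, the reference identifiers, and the ID `GO:9999999` are all hypothetical placeholders; this is not the GO Consortium's file format, and only `GO:0007420` is taken from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One assertion linking a gene product to a GO term (hypothetical schema)."""
    gene_product: str  # placeholder symbol, not real curated data
    go_id: str         # GO term identifier
    reference: str     # placeholder identifier for the supporting publication


def term_counts(annotations, go_id):
    """Return (distinct gene products, total annotations) for one GO term."""
    hits = [a for a in annotations if a.go_id == go_id]
    return len({a.gene_product for a in hits}), len(hits)


# Illustrative data only: GeneA is annotated to the same term from two papers.
annotations = [
    Annotation("GeneA", "GO:0007420", "REF:1"),
    Annotation("GeneA", "GO:0007420", "REF:2"),
    Annotation("GeneB", "GO:0007420", "REF:3"),
    Annotation("GeneB", "GO:9999999", "REF:3"),  # placeholder GO ID
]

genes, total = term_counts(annotations, "GO:0007420")
print(f"{genes} genes, {total} annotations")
```

A term can therefore carry more annotations than genes, because the same gene product may be annotated to it from several independent reports.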
Annotations for App: amyloid beta (A4) precursor protein

We use evidence codes to describe the basis of the annotation
- IDA: Inferred from direct assay
- IPI: Inferred from physical interaction
- IMP: Inferred from mutant phenotype
- IGI: Inferred from genetic interaction
- IEP: Inferred from expression pattern
- ISS: Inferred from sequence or structural similarity
- TAS: Traceable author statement
- NAS: Non-traceable author statement
- IC: Inferred by curator
- RCA: Reviewed Computational Analysis
- IEA: Inferred from electronic annotation
- ND: no data available
[Slide labels for the code groups: Direct Experiment; NO Direct Experiment]

Examples of how we connect instances with knowledge representation in the GO
What follows are examples of annotation of the biomedical literature using GO types, gene product types and evidence codes.

Example #1: Molecular Function using IDA
Figure from Zhang M, Chen W, Smith SM, Napoli JL. Molecular characterization of a mouse short chain dehydrogenase/reductase active with all-trans-retinol in intact cells, mRDH1. J Biol Chem. 2001 Nov 23;276(47):44083-90.
The Observation: [figure; labels: NAD+, NADH, H+]
The Annotation:

What are the instances in this experiment?
- Gene product instances
  - Molecules of retinol dehydrogenase
- Molecular function instances
  - Instances of execution of the molecular function revealed by the assay
  - Instances of molecular function associated with instances of retinol dehydrogenase. These instances are the potential of a molecule of retinol dehydrogenase to execute the function retinol dehydrogenase activity.

What knowledge are we trying to capture?
We are interested in understanding how gene products contribute to the biology of an organism.

How do wet-bench biologists learn about gene products?
- They do experiments!
- Experiments are designed to study the properties of gene product instances.
- Experimental biologists take on “The Burden of Proof”.

How do we represent the accumulated knowledge?
We* make annotations!
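Because every annotation carries one of the evidence codes listed above, a curation tool might keep them in a lookup table. The sketch below is illustrative only: the descriptions are copied from the slide, but the assignment of IDA, IPI, IMP, IGI and IEP to the "Direct Experiment" group is an assumption based on the usual GO grouping of experimental codes, and the names `EVIDENCE_CODES`, `DIRECT_EXPERIMENT` and `describe` are hypothetical.

```python
# Evidence code descriptions as listed on the slide.
EVIDENCE_CODES = {
    "IDA": "Inferred from direct assay",
    "IPI": "Inferred from physical interaction",
    "IMP": "Inferred from mutant phenotype",
    "IGI": "Inferred from genetic interaction",
    "IEP": "Inferred from expression pattern",
    "ISS": "Inferred from sequence or structural similarity",
    "TAS": "Traceable author statement",
    "NAS": "Non-traceable author statement",
    "IC": "Inferred by curator",
    "RCA": "Reviewed Computational Analysis",
    "IEA": "Inferred from electronic annotation",
    "ND": "no data available",
}

# Assumption: these five codes form the slide's "Direct Experiment" group.
DIRECT_EXPERIMENT = {"IDA", "IPI", "IMP", "IGI", "IEP"}


def describe(code):
    """Return a one-line description of an evidence code and its group."""
    if code not in EVIDENCE_CODES:
        raise KeyError(f"unknown evidence code: {code}")
    group = "direct experiment" if code in DIRECT_EXPERIMENT else "no direct experiment"
    return f"{code}: {EVIDENCE_CODES[code]} ({group})"


print(describe("IDA"))
print(describe("IEA"))
```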
Annotations connect what wet-bench biologists see in the lab with how we represent our current understanding of biological reality.
* GO curators

So, where are the instances?
- The instances are in the lab.
- We use what people report about instances, but we never actually deal with them directly.

What do we mean by gene product?
- Gene Product Type
  - Stands proxy for the ‘gene’
  - Genes are what we have in MODs
  - Types = what instances have in common
- Gene Product Instance
  - A molecule of a gene product
  - It can be physically isolated
  - It takes up space

What do we mean by annotations?
An annotation
- Asserts that instances of molecules of a type of gene product have a propensity to act as designated by the terms in an ontology such as the GO
- Is created on the basis of observations of the instances of such types in experiments and of the inferences drawn from such observations
Note: comprehensive experimental details are embedded in biomedical publications and in specialized databases.

Example #2: Molecular Function using IMP
Figure from Schulz S, Lopez MJ, Kuhn M, Garbers DL. Disruption of the guanylyl cyclase-C gene leads to a paradoxical phenotype of viable but heat-stable enterotoxin-resistant mice. J Clin Invest. 1997 Sep 15;100(6):1590-5.
The Observation: [figure]
The Annotation: IMP

What are the instances in this experiment?
- Gene product instances
  - Molecules of GUCY2C protein
  - The lack of functional molecules of GUCY2C in mutants
- Molecular function instances
  - The execution of the molecular function, measured by the accumulation of cGMP
  - The potential of a molecule of GUCY2C to execute the molecular function, revealed by the correlation between a lack of molecules and a lack of executions of molecular function

The Curator Perspective: Annotation Process
1. Identification of relevant experimental data
   - Biomedical literature as primary source
   - Annotations inferred from experiments performed in other organisms or inferred from sequence or structure
2. Identification of the appropriate ontology annotation term
   - The experimental assay limits the resolution/granularity of the term that can be assigned
   - Differences in expertise among curators should result in close, but not necessarily exact, GO term annotations
3. Employment of annotation quality control processes to
   - Check for correct formal structure
   - Evaluate annotation consistency
   - Harvest emerging knowledge to refine and extend the GO

Example #3: Biological Process using IMP
Washington Smoak I, Byrd NA, Abu-Issa R, Goddeeris MM, Anderson R, Morris J, Yamamura K, Klingensmith J, Meyers EN. Sonic hedgehog is required for cardiac outflow tract and neural crest cell development. Dev Biol. 2005 Jul 15;283(2):357-72.
The Observation: [figure]
The Annotation: IMP

What are the instances in this experiment?
- Gene product instances
  - Molecules of the Shh gene
  - Non-functional molecules of the Shh gene
- Biological process instances
  - The development of a mouse heart
- Molecular function instances
  - The execution of a molecular function by a molecule of the Shh gene
So, when a biological process occurs, it is the result of molecules of a gene product(s) executing their molecular function(s).

How do wet-bench biologists learn about gene products?
- They do experiments!
- Experiments are designed to study the properties of gene product instances.
- Experimental biologists take on “The Burden of Proof”.
- They make conclusions about gene product types based on the accumulated experimental data!
If experiments show:
- All instances of a gene product studied have the potential to execute the function tyrosine kinase
- Instances of the same gene product are involved in the biological process limb development
- All instances of the same gene product are found in instances of the cytoplasm
A wet-bench biologist would conclude:
- The gene product of this gene is a tyrosine kinase that functions in the cytoplasm, and its tyrosine kinase function is used in limb development

If we comprehensively annotate genes, can we make the same conclusions?
This is the basis of biological discovery!
Analysis of gene product annotations leads to new hypotheses for wet-bench biologists to test.

Development of GO depends on intersection of curation with ontology refinements
- The process of annotation brings new experimental results into perspective with existing scientific knowledge captured in the ontology
- New results may stand in conflict with the current version of the ontology
- One of the strengths of the GO development paradigm is that it is primarily a task of biologist-curators who are experts in understanding the experimental systems

[Diagram labels: Hypothesis generation; Data mining, and prediction using ontologies; Experiments and data analysis using GO, etc.; Experimental Literature; Informatics Resources; Improved annotations, in MODs, UniProt; Refine bio-ontologies]

Summary
- Gene product annotation is an integral aspect of the work of the GO Consortium
- Annotations reflect conclusions from experiments as interpreted by the biologist and reviewed by peers
- The structure of the GO depends upon accumulated knowledge from many experiments, resulting in a representation of current thought about biological reality
- As experimental data changes our view of reality, the ontology must change as well
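The "If experiments show" reasoning above, where a conclusion about a gene product type is assembled from accumulated annotations, can be sketched as a small grouping step. This is only an illustration under stated assumptions: the tuple layout, the gene product name `GeneX`, and the helpers `summarize` and `conclusion` are hypothetical, and the three aspect names follow the usual GO split into molecular function, biological process and cellular component.

```python
from collections import defaultdict

# (gene product, GO aspect, term): illustrative data mirroring the slide's example.
ANNOTATIONS = [
    ("GeneX", "molecular_function", "tyrosine kinase activity"),
    ("GeneX", "biological_process", "limb development"),
    ("GeneX", "cellular_component", "cytoplasm"),
]


def summarize(annotations):
    """Group annotated terms by gene product and then by GO aspect."""
    summary = defaultdict(lambda: defaultdict(set))
    for gene, aspect, term in annotations:
        summary[gene][aspect].add(term)
    return summary


def conclusion(gene, summary):
    """Compose a type-level statement from the grouped annotations."""
    aspects = summary.get(gene, {})

    def terms(aspect):
        # Fall back to "unknown" when nothing has been annotated for an aspect.
        return ", ".join(sorted(aspects.get(aspect, ()))) or "unknown"

    return (
        f"{gene}: function = {terms('molecular_function')}; "
        f"process = {terms('biological_process')}; "
        f"location = {terms('cellular_component')}"
    )


summary = summarize(ANNOTATIONS)
print(conclusion("GeneX", summary))
```

A gene product with no annotations yields "unknown" for every aspect, which mirrors the point that conclusions are only as complete as the annotations behind them.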