Tools in Bioinformatics
Ontologies and pathways

Why are ontologies needed?
• Free text is the best way to describe what a protein does to a human reader
• However, it is a lousy way to tell that to a computer
• When are we interested in a computer-interpretable annotation?
  • We want all the proteins associated with a certain disease
  • We want all the proteins localized to the lysosome
  • We found a cluster of “interesting” genes and we want to know what they are involved in
  • We want to measure the similarity between gene pairs

Simple solution
• The simplest solution is to use a set of keywords for every protein
• Why is this a bad solution?

What’s in a name?
• Glucose synthesis
• Glucose biosynthesis
• Glucose formation
• Glucose anabolism
• Gluconeogenesis
• All refer to the process of making glucose from simpler components

What’s in a name?
• The problem:
  • Same name for different concepts
  • Different names for the same concept
• Vast amounts of biological data from different sources
• Cross-species or cross-database comparison is difficult

What is the Gene Ontology?
• A (part of the) solution:
• The Gene Ontology: “a controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing”
• A controlled vocabulary to describe gene products - proteins and RNA - in any organism.

What is GO?
• One of the Open Biological Ontologies
• Standard, species-neutral way of representing biology
• Three structured networks of defined terms to describe gene product attributes
• More like a phrase book than a biology text book

How does GO work?
• What information might we want to capture about a gene product?
  • What does the gene product do?
    • Molecular function
  • Where and when does it act?
    • Cellular component
  • What is the purpose of these activities?
    • Biological process

Molecular Function
• Activities or “jobs” of a gene product
• Examples: insulin binding, insulin receptor activity

Cellular Component
• Where a gene product acts
• Enzyme complexes in the component ontology refer to places, not activities.

Biological Process
• A commonly recognized series of events
• Examples: cell division, transcription

Ontology Structure
• Ontologies are structured as a hierarchical directed acyclic graph
• Terms can have more than one parent and zero, one or more children
• Terms are linked by two relationships
  • is-a
  • part-of
[Figure: example graph over the terms cell, membrane, mitochondrial membrane, chloroplast and chloroplast membrane, with edges labelled is-a and part-of]

True Path Rule
• The path from a child term all the way up to its top-level parent(s) must always be true
• Example path:
  cell
    nucleus
      chromosome
• But what about bacteria? They have chromosomes but no nucleus, so this path is not true for them.

True Path Rule
• Resolved component ontology structure:
  cell
    cytoplasm
    chromosome
      nuclear chromosome
    nucleus
      nuclear chromosome

GO Annotation
• Using GO terms to represent the activities and localizations of a gene product
• Annotations contributed by members of the GO Consortium
  • Model organism databases
  • Cross-species databases, e.g. UniProt
• Annotations freely available from the GO website

GO Annotation
• Electronic annotation
  • From mapping files, e.g. UniProt keyword2go
  • High quantity but low quality
  • Annotations to low level terms
  • Not checked by curators
• Manual annotation
  • From literature curation
  • Time consuming but high quality

Where do we see GO annotations
• Entrez Gene / GeneCards / SwissProt
• Organism-specific databases
• amigo.geneontology.org/

Pathways: beyond terms
• Saying that a gene participates in gluconeogenesis and binds pyruvate in the nucleus does not provide us with all the information
• Pathway databases specify the place of a specific gene/protein with respect to other genes doing similar jobs

KEGG: Kyoto Encyclopedia of Genes and Genomes
• www.genome.jp/kegg/
• http://www.genome.jp/kegg/pathway.html
• Manually annotated “reference maps” linked to hundreds of genomes
• Focus on metabolic pathways
• Can be used to answer questions:
  • Give me all the genes involved in pathway X!
  • Given a set of genes, is there a pathway that has a lot of genes in our set?

BioCarta
• http://www.biocarta.com/genes/index.asp
• Focus on human signaling pathways

MSigDB
• So far we saw curated databases
  • Focus on the established knowledge
  • Always lagging behind
• MSigDB combines “established” knowledge with gene sets that came up in some experiment
  • Up regulated after UV exposure
  • Down in colorectal cancers
  • Predicted targets of some transcription factor
• Frequently more useful than GO/KEGG
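
The KEGG question "Given a set of genes, is there a pathway that has a lot of genes in our set?" can be sketched as an over-representation test. The slides do not name a statistic; the one-sided hypergeometric test below is one common choice, used here as an assumption. The gene identifiers, pathway names and the function names `hypergeom_sf` and `enrich` are all hypothetical and made up for illustration. In practice the gene sets would come from KEGG, GO or MSigDB.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k): chance of seeing at least k pathway genes when n genes
    are drawn without replacement from N genes, K of which are in the pathway."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrich(query, pathways, background):
    """Rank pathways by how surprising their overlap with the query set is."""
    background = set(background)
    query = set(query) & background          # ignore genes outside the background
    results = []
    for name, members in pathways.items():
        members = set(members) & background
        overlap = len(query & members)
        p = hypergeom_sf(overlap, len(background), len(members), len(query))
        results.append((name, overlap, p))
    return sorted(results, key=lambda r: r[2])  # smallest p-value first


# Hypothetical toy data: 20 background genes, two made-up pathways.
background = [f"g{i}" for i in range(1, 21)]
pathways = {
    "pathway_A": ["g1", "g2", "g3", "g4", "g5"],
    "pathway_B": [f"g{i}" for i in range(6, 16)],
}
query = ["g1", "g2", "g3", "g16"]

results = enrich(query, pathways, background)
for name, overlap, p in results:
    print(f"{name}: overlap={overlap}, p={p:.4f}")
```

With real databases there are hundreds of pathways, so the raw p-values would need a multiple-testing correction before being interpreted.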
