Using the Gene Ontology (GO) for analysis of expression data
Jane Lomax, EMBL-EBI
25th June 2007

What is the Gene Ontology?
• Set of standard biological phrases (terms) which are applied to genes/proteins:
  – protein kinase
  – apoptosis
  – membrane
• Genes are linked, or associated, with GO terms by trained curators at genome databases
  – known as ‘gene associations’ or GO annotations
• Some GO annotations created automatically

GO annotations
[Diagram labels: GO database; gene -> GO term; associated genes; genome and protein databases]

What is the Gene Ontology?
• Allows biologists to make queries across large numbers of genes without researching each one individually
[Figure: Eisen, Michael B. et al. (1998) Proc. Natl. Acad. Sci. USA 95, 14863-14868. Copyright ©1998 by the National Academy of Sciences]

GO structure
• GO isn’t just a flat list of biological terms
• terms are related within a hierarchy
[Diagram label: gene A]
• This means genes can be grouped according to user-defined levels
• Allows broad overview of gene set or genome

How does GO work?
• GO is species independent
  – some terms, especially lower-level, detailed terms, may be specific to a certain group
    • e.g. photosynthesis
  – but when collapsed up to the higher levels, terms are not dependent on species

How does GO work?
What information might we want to capture about a gene product?
• What does the gene product do?
• Where and when does it act?
• Why does it perform these activities?

GO structure
• GO terms divided into three parts:
  – cellular component
  – molecular function
  – biological process

Cellular Component
• where a gene product acts
• Enzyme complexes in the component ontology refer to places, not activities.

Molecular Function
• activities or “jobs” of a gene product
[Figure labels: glucose-6-phosphate isomerase activity; insulin binding; insulin receptor activity; drug transporter activity]
• A gene product may have several functions
• Sets of functions make up a biological process.

Biological Process
• a commonly recognized series of events
[Figure labels: cell division; transcription; regulation of gluconeogenesis; limb development; courtship behavior]

Ontology Structure
• Terms are linked by two relationships
  – is-a
  – part-of
[Diagram labels: cell; membrane; mitochondrial membrane; chloroplast; chloroplast membrane; edges marked is-a and part-of]

Ontology Structure
• Ontologies are structured as a hierarchical directed acyclic graph (DAG)
• Terms can have more than one parent and zero, one or more children
[Diagram labels: cell; membrane; mitochondrial membrane; chloroplast; chloroplast membrane. Caption: Directed Acyclic Graph (DAG) - multiple parentage allowed]

Anatomy of a GO term
id: GO:0006094  (unique GO ID)
name: gluconeogenesis  (term name)
namespace: process  (ontology)
def: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol. [http://cancerweb.ncl.ac.uk/omd/index.html]  (definition)
exact_synonym: glucose biosynthesis  (synonym)
xref_analog: MetaCyc:GLUCONEO-PWY  (database ref)
is_a: GO:0006006  (parentage)
is_a: GO:0006092

GO terms
• Where do GO terms come from?
  – GO terms are added by editors at EBI and annotating databases
  – new terms are usually only added when they are asked for by annotators
  – GO editors work with experts to make major ontology developments
    • metabolism
    • pathogenesis
    • cell cycle

GO stats
• over 23,000 GO terms:
  – 13593 biological_process
  – 1980 cellular_component
  – 7700 molecular_function

GO annotations
• Where do the links between genes and GO terms come from?

GO annotations
• Contributing databases:
  – Berkeley Drosophila Genome Project (BDGP)
  – dictyBase (Dictyostelium discoideum)
  – FlyBase (Drosophila melanogaster)
  – GeneDB (Schizosaccharomyces pombe, Plasmodium falciparum, Leishmania major and Trypanosoma brucei)
  – UniProt Knowledgebase (Swiss-Prot/TrEMBL/PIR-PSD) and InterPro databases
  – Gramene (grains, including rice, Oryza)
  – Mouse Genome Database (MGD) and Gene Expression Database (GXD) (Mus musculus)
  – Rat Genome Database (RGD) (Rattus norvegicus)
  – Reactome
  – Saccharomyces Genome Database (SGD) (Saccharomyces cerevisiae)
  – The Arabidopsis Information Resource (TAIR) (Arabidopsis thaliana)
  – The Institute for Genomic Research (TIGR): databases on several bacterial species
  – WormBase (Caenorhabditis elegans)
  – Zebrafish Information Network (ZFIN) (Danio rerio)

Species coverage
• All major eukaryotic model organism species
• Human via GOA group at UniProt
• Several bacterial and parasite species through TIGR and GeneDB at Sanger
  – many more in pipeline

Annotation coverage
[Figure]

Anatomy of a GO annotation
• Three key parts:
  – gene name/id
  – GO term(s)
  – evidence for association

Example annotation
• Breast cancer type 1 susceptibility protein gene in humans

Types of GO annotation:
• Electronic Annotation
• Manual Annotation

Manual annotation
• Created by scientific curators
• High quality
• Small number

Manual annotation
“In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity, In addition, the location of a PERK1-GTP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response…”

Electronic Annotation
• Annotation derived without human validation
  – mappings file e.g. interpro2go, ec2go
  – Blast search ‘hits’
• Lower ‘quality’ than manual codes

Mappings files
Source entry                                                                              -> GO term
Fatty acid biosynthesis (Swiss-Prot Keyword)                                              -> GO:Fatty acid biosynthesis (GO:0006633)
EC:6.4.1.2 (EC number)                                                                    -> GO:acetyl-CoA carboxylase activity (GO:0003989)
IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit (InterPro entry)      -> GO:acetyl-CoA carboxylase activity (GO:0003989)

Evidence types
• ISS: Inferred from Sequence/structural Similarity
• IDA: Inferred from Direct Assay
• IPI: Inferred from Physical Interaction
• IMP: Inferred from Mutant Phenotype
• IGI: Inferred from Genetic Interaction
• IEP: Inferred from Expression Pattern
• TAS: Traceable Author Statement
• NAS: Non-traceable Author Statement
• IC: Inferred by Curator
• ND: No Data available
• IEA: Inferred from electronic annotation

GO tools
• GO resources are freely available to anyone to use without restriction
  – Includes the ontologies, gene associations and tools developed by GO
• Other groups have used GO to create tools for many purposes: http://www.geneontology.org/GO.tools

GO tools
• Affymetrix also provide a Gene Ontology Mining Tool as part of their NetAffx™ Analysis Center which returns GO terms for probe sets

GO tools
• Many tools exist that use GO to find common biological functions from a list of genes: http://www.geneontology.org/GO.tools.microarray.shtml

GO tools
• Most of these tools work in a similar way:
  – input a gene list and a subset of ‘interesting’ genes
  – tool shows which GO categories have most interesting genes associated with them, i.e. which categories are ‘enriched’ for interesting genes
  – tool provides a statistical measure to determine whether enrichment is significant

Microarray process
• Treat samples
• Collect mRNA
• Label
• Hybridize
• Scan
• Normalize
• Select differentially regulated genes
• Understand the biological phenomena involved

Traditional analysis
Gene 1: Apoptosis, Cell-cell signaling, Protein phosphorylation, Mitosis, …
Gene 2: Growth control, Mitosis, Oncogenesis, Protein phosphorylation, …
Gene 3: Growth control, Mitosis, Oncogenesis, Protein phosphorylation, …
Gene 4: Nervous system, Pregnancy, Oncogenesis, Mitosis, …
Gene 100: Positive ctrl. of cell prolif, Mitosis, Oncogenesis, Glucose transport, …

Traditional analysis
• gene by gene basis
• requires literature searching
• time-consuming

Using GO annotations
• But by using GO annotations, this work has already been done for you!
[Figure label: GO:0006915 : apoptosis]

Grouping by process
Apoptosis: Gene 1, Gene 53
Positive ctrl. of cell prolif.: Gene 7, Gene 3, Gene 12, …
Mitosis: Gene 2, Gene 5, Gene 45, Gene 7, Gene 35, …
Glucose transport: Gene 7, Gene 3, Gene 6, …
Growth: Gene 5, Gene 2, Gene 6, …

GO for microarray analysis
• Annotations give ‘function’ label to genes
• Ask meaningful questions of microarray data e.g.
  – genes involved in the same process, same/different expression patterns?

Using GO in practice
• statistical measure
  – how likely it is that your differentially regulated genes fall into that category by chance
[Diagram: microarray (1000 genes) -> experiment -> 100 genes differentially regulated]
[Bar chart, vertical axis 0 to 80: mitosis; apoptosis; positive control of cell proliferation; glucose transport]
• mitosis – 80/100
• apoptosis – 40/100
• p. ctrl. cell prol. – 30/100
• glucose transp. – 20/100

Using GO in practice
• However, when you look at the distribution of all genes on the microarray:

Process               Genes on array   # genes expected in 100 random genes   occurred
mitosis               800/1000         80                                      80
apoptosis             400/1000         40                                      40
p. ctrl. cell prol.   100/1000         10                                      30
glucose transp.       50/1000          5                                       20

Enrichment tools
• GO is developing its own enrichment tool as part of the GO browser AmiGO
• Currently in testing phase, should be released next month

Onto-Express walkthrough
http://vortex.cs.wayne.edu/projects.htm#Onto-Express
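
Worked example (not part of the original slides): the figures in the "Using GO in practice" table above (1000 genes on the array, 100 differentially regulated) can be turned into expected counts and a significance measure. The slides do not say which statistic a given tool uses, so the one-sided hypergeometric test below is an assumption, chosen because it models drawing 100 genes at random without replacement. The names `expected_count`, `hypergeom_upper_tail` and `SLIDE_TABLE` are hypothetical and used only for illustration.

```python
from math import comb


def expected_count(category_on_array: int, array_size: int, selected: int) -> float:
    """Genes expected in a category if `selected` genes were drawn at random."""
    return selected * category_on_array / array_size


def hypergeom_upper_tail(observed: int, array_size: int,
                         category_on_array: int, selected: int) -> float:
    """P(X >= observed) when `selected` genes are drawn without replacement
    from an array holding `category_on_array` genes in the category.
    Assumption: this is one common choice of enrichment statistic, not
    necessarily the one used by any tool named in the slides."""
    total = comb(array_size, selected)
    upper = min(category_on_array, selected)
    tail = sum(
        comb(category_on_array, i)
        * comb(array_size - category_on_array, selected - i)
        for i in range(observed, upper + 1)
    )
    return tail / total


# Numbers copied from the slide table: (genes on array, genes observed).
ARRAY_SIZE = 1000
SELECTED = 100
SLIDE_TABLE = {
    "mitosis": (800, 80),
    "apoptosis": (400, 40),
    "p. ctrl. cell prol.": (100, 30),
    "glucose transp.": (50, 20),
}

for process, (on_array, observed) in SLIDE_TABLE.items():
    exp = expected_count(on_array, ARRAY_SIZE, SELECTED)
    p = hypergeom_upper_tail(observed, ARRAY_SIZE, on_array, SELECTED)
    print(f"{process:20s} expected={exp:5.1f} observed={observed:3d} p={p:.3g}")
```

Under these assumptions, mitosis and apoptosis occur about as often as chance predicts, even though they have the largest raw counts, while positive control of cell proliferation and glucose transport are over-represented. That is the contrast the table is making.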