CACAO - Remote training
http://gowiki.tamu.edu/wiki/index.php/Category:CACAO

Gene Function and Gene Ontology
Fall 2011

“Scientists find gene that ...”

An avalanche of genes
• High throughput sequencing is finding genes faster than we can understand them
• Goals for annotation:
  – Where the genes are in the genome
  – What their functions are

Function annotation
• Allows us to
  – Infer the functions of genes
    • Related by common descent
    • Related by similar expression patterns
    • Related by phylogenetic profiles
    • ...

Function annotation
• Allows us to
  – Understand the capabilities of organisms' genomes
  – Understand patterns of gene expression
    • In different environments
    • In different tissues
    • In disease states
  – ...

Classic MODel
[Figure labels: Literature, Database, Curators (rate limiting), Datasets]

Requirements
• Accurate functional annotation for as many genes as possible
• A system of assigning function that allows both humans and computers to compare, contrast, analyze, and predict gene function
• Curators to make and/or check these assignments
  – For CACAO, we will teach you what biocurators do.

CACAO
• Community
• Assessment
• Community
• Annotation with
• Ontologies
How well can you (with our coaching) assign gene functions using GO?

CACAO is competitive
• Teams get points for complete annotations
  – GO term (right level of specificity)
  – reference
  – evidence code
  – identify where in the paper the evidence comes from
• Teams can take away points from competitors by challenging annotations
  – finding a problem
  – suggesting a better alternative

What’s in it for you (besides credit)?
– We hope you will
  • learn how we think about gene function
  • gain skills that will help your future career
  • enjoy contributing to a resource used by people all over the world
  • have fun!

The gist of CACAO…
Finding evidence (in papers)
Making annotations
Using GO terms

GO = Gene Ontology
• Controlled vocabulary
  – Everyone uses the same terms
  – Terms have IDs that computers can understand
• Relationships between functions

Gene Ontology
A common system for describing gene function

GO
• 3 aspects (ontologies) for gene products
  1. Biological Process
  2. Molecular Function
  3. Cellular Component
• Used to make annotations
  – aka Gene associations
  – Term + qualifiers + evidence code + reference etc.

Molecular Function
• activities or “jobs” of a gene product
glucose-6-phosphate isomerase activity
[Figure from GO Consortium (GOC) presentations]

Biological Process
a commonly recognized series of events
cell division
Figure from Nature Reviews Microbiology 6, 28-40 (January 2008)

Cellular Component
• where a gene product acts

Key elements of a GO annotation
Submitted to GO consortium
Viewable on GONUTS
**Don’t worry - I will cover this again (several times)!

GO Annotation
• To make an annotation, you need to
  – Assign GO terms to genes (gene products)
    • At appropriate level of specificity
    • Sometimes with Qualifiers
      – NOT
      – Contributes_to
      – Colocalizes_with
  – Record the evidence

Record the evidence
• Where it came from:
  – Reference (database accession)
    • PMID:6987663
• Kind of evidence:
  – Evidence codes
    • IMP: Inferred from Mutant Phenotype
    • IDA: Inferred from Direct Assay
    • …

CACAO - the “Community Annotation” part
What I am going to tell you about next is:
1. How to choose proteins to annotate
2. Finding GO terms & navigating a GO term page
3. Finding UniProt accessions
4. Making gene pages on GONUTS & the anatomy of a gene page
5. How and where to add an annotation
6. Where to look for your annotations & other teams’ annotations … (& the challenges!)
http://gowiki.tamu.edu/wiki/index.php/

Deciding what to annotate
1. randomly
2. topics of interest (e.g. efflux pump proteins, biofilms)
3. papers you have come across while doing other stuff
4. methods you know or want to learn
5. phenotypes and mutants you are interested in
6. by author
7. by pathway or regulon
8. suggested by another (e.g. high IEA:manual annotation ratio)
9. current paper mentions another gene product
10. review papers (e.g. Annual Reviews are excellent sources)

EXAMPLE #1: let’s say you have a great paper (PMID:1111) that characterizes the tyrosine kinase activity of your favorite protein (human p53)…

Part I: Where do you search for GO terms?
GONUTS http://gowiki.tamu.edu
• CHICK - AgBase (Gallus gallus)
• dictyBase - dictyBase (Dictyostelium discoideum - slime mold)
• FB - FlyBase (Drosophila melanogaster)
• HUMAN - Reactome, BHF-UCL
• MGI - Mouse genome informatics (Mus musculus - house mouse)
• SGD - Saccharomyces genome database (Saccharomyces cerevisiae - yeast)
• TAIR - The Arabidopsis Information Resource (Arabidopsis thaliana)
• WB - WormBase (Caenorhabditis elegans)
• ZFIN - Zebrafish model organism database (Danio rerio)

What do you actually need once you have found the correct term?
GO:0004713

Part II: You now have a paper, a protein & you found a suitable GO term… what next?
• UniProt accession - http://www.uniprot.org
  - Search (“Query”) & find the correct UniProt accession for your protein
  - Looks something like: P012A9

Part III: Where are you going to add your annotations?
GONUTS http://gowiki.tamu.edu

How do you make a new gene page in GONUTS?
• Use the UniProt accession to make a page that you will be able to add your own annotation to.
• GoPageMaker will:
  1. Check if the page exists in GONUTS & take you there if it does.
  2. Make a page & pull all of the annotations from UniProt into a table that you can edit.

Where do you add an annotation?
Add a row in the table.

What you must fill in (for every annotation)
GO:0004713
PMID:1111
IDA: Inferred from direct assay
Figure 2a

What you might also have to fill in
Not sure?
Check the competition guidelines.
Ask a coach (Jim, Debby, Adrienne or usually me)!
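
The "must fill in" fields above can be thought of as one record per annotation. The following is a minimal sketch in Python, not how GONUTS stores or checks annotations: the `Annotation` class, the `problems` helper, and the GO and PMID patterns are hypothetical names and assumptions made for illustration. The accession pattern follows the format UniProt documents for its accession numbers, and only the two evidence codes and three qualifiers named in these slides are listed.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed shapes for illustration only; GONUTS may check things differently.
GO_ID = re.compile(r"GO:\d{7}")   # e.g. GO:0004713
PMID = re.compile(r"PMID:\d+")    # e.g. PMID:1111
# Accession format as documented by UniProt.
UNIPROT = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# Only the codes named in the slides; the real list is longer.
EVIDENCE_CODES = {
    "IDA": "Inferred from Direct Assay",
    "IMP": "Inferred from Mutant Phenotype",
}
QUALIFIERS = {"not", "contributes_to", "colocalizes_with"}


@dataclass
class Annotation:
    """One row in a gene page's annotation table (hypothetical layout)."""
    accession: str           # UniProt accession of the gene product
    go_id: str               # GO term ID
    reference: str           # where the evidence came from
    evidence_code: str       # kind of evidence
    evidence_location: str   # where in the paper, e.g. "Figure 2a"
    qualifier: str = ""      # optional


def problems(a: Annotation) -> List[str]:
    """Return a list of reasons the annotation looks incomplete."""
    issues = []
    if not UNIPROT.fullmatch(a.accession):
        issues.append("accession does not look like a UniProt accession")
    if not GO_ID.fullmatch(a.go_id):
        issues.append("GO term ID should look like GO:0004713")
    if not PMID.fullmatch(a.reference):
        issues.append("reference should look like PMID:1111")
    if a.evidence_code not in EVIDENCE_CODES:
        issues.append("evidence code not in the list used here")
    if not a.evidence_location.strip():
        issues.append("say where in the paper the evidence is")
    if a.qualifier and a.qualifier.lower() not in QUALIFIERS:
        issues.append("unknown qualifier")
    return issues


# The EXAMPLE #1 values from the slides.
example = Annotation("P012A9", "GO:0004713", "PMID:1111", "IDA", "Figure 2a")
print(problems(example))
```

An empty list means every required field is present and well formed under these assumptions. It says nothing about whether the GO term is at the right level of specificity or whether the paper actually supports it, which is what challenges are for.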

Where will your annotation now show up?
1. In the “Annotation” table on the gene page you just edited
2. In the table on your user page
   http://gowiki.tamu.edu/wiki/index.php/User:Siebenmc
3. In the table on your team page
   http://gowiki.tamu.edu/wiki/index.php/Category:Team_Mu_subunits
4. As points on the scoreboard
   http://gowiki.tamu.edu/wiki/index.php/Category:CACAO_Spring_2011
5. If challenged, it will show up in the “Submitted Challenges” table (below the scoreboard)

Questions?
At this point, you should be able to:
1. Find GO terms on GONUTS
2. Find UniProt accessions on UniProt
3. Make a gene page on GONUTS
4. Add an annotation

CACAO - the “Community Assessment” part
Scoreboard
Submitted Challenges
Moving through challenges
Closed Challenges
http://gowiki.tamu.edu/wiki/index.php/Category:Michigan_State_CACAO
Category:Team UCL1

Example starting from a paper
1. Hypothetically, you are given a paper in another class on human gastric lipase by Wicker-Planquart et al. (1999).
2. http://www.ncbi.nlm.nih.gov/pubmed/10411623
What is the molecular function of the protein?
What process is it involved in?
Where is it doing its job(s) in the cell?

Example starting from a topic
1. Search PubMed for “biofilm genes”
2. Eighth paper is - Isolation of Genes Involved In Biofilm Formation of a Klebsiella pneumoniae…
3. http://www.ncbi.nlm.nih.gov/pubmed/21858144
• What proteins are discussed in this paper?
• What is the molecular function of each protein?
• What processes are they involved in?
• Where are they doing their jobs?
• WHAT DO THE AUTHORS DEMONSTRATE IN THE PAPER?

Example starting from a protein
1. Searched UniProt for “biofilm”
2. Protein from E. coli - BssR
3. Search on PubMed for “bssR AND coli”
4. http://www.ncbi.nlm.nih.gov/pubmed/16597943
What is the molecular function of these proteins?
What process are they involved in?
Where are they doing their job(s) in the cell?
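
The three questions asked in each example line up with the three GO aspects. As a rough sketch in Python, reading notes can be sorted under those aspects to see which question still lacks evidence from the paper. The `worksheet` and `unanswered` helpers are hypothetical names, and the note text is a made-up placeholder, not a finding from any of the papers above.

```python
from typing import Dict, List, Tuple

# The three GO aspects, paired with the question each one answers.
ASPECT_QUESTIONS = {
    "Molecular Function": "What is the molecular function of the protein?",
    "Biological Process": "What process is it involved in?",
    "Cellular Component": "Where is it doing its job(s) in the cell?",
}


def worksheet(notes: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (aspect, note) pairs under the three GO aspects."""
    sheet: Dict[str, List[str]] = {aspect: [] for aspect in ASPECT_QUESTIONS}
    for aspect, note in notes:
        if aspect not in sheet:
            raise ValueError(f"not a GO aspect: {aspect}")
        sheet[aspect].append(note)
    return sheet


def unanswered(sheet: Dict[str, List[str]]) -> List[str]:
    """Aspects for which no evidence has been noted yet."""
    return [aspect for aspect, found in sheet.items() if not found]


# Placeholder note for illustration only.
sheet = worksheet([("Molecular Function", "direct assay, Figure 2a")])
for aspect in unanswered(sheet):
    print("Still open:", ASPECT_QUESTIONS[aspect])
```

A paper does not have to answer all three questions. An aspect that stays open simply means there is no annotation to make for it from that reference.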