19. IMG-ER Curation Environment
Advancing Science with DNA Sequence

Data Curation in IMG-ER
Natalia Ivanova
MGM Workshop
February 1, 2012

Tricky question
• What do you need to do data curation in IMG?
  a) iPhone
  b) PhD in Computer Science
  c) supernatural powers
• Correct answer: you need an IMG account
  http://img.jgi.doe.gov/er

What can be curated in IMG-ER?
1. Gene models
  a) Add a gene
  b) Make a gene pseudogene or “obsolete” (=delete it)
2. Functional annotations:
  a) Product names
  b) EC numbers
  c) Gene symbols
If you believe something else needs to be changed (genome name, taxonomy, etc.) – please use IMG Questions/Comments link
What can’t be changed: automated assignments to protein families (Pfam, COGs, TIGRfam, InterPro, SEED assignments, KO assignments)

Center point for curation – Gene Cart
• Product Name is free text (but see GenBank requirements http://www.ncbi.nlm.nih.gov/Genbank/genomesubmit_annotation.html)
• Prot Description is free text (goes to “note” in GenBank submission)
• EC number and PUBMED ID – see explanation
• Notes are free text (goes to “note” in GenBank submission)
• Gene symbol is “gene name” – 4 letter abbreviation; goes to “gene” in GenBank submission

How to find the genes that need curation?
Two possible scenarios:
• You have submitted a genome to IMG-ER and want to have the best annotations possible for it (e.g. for GenBank submission)
• You’re an expert and know everything about a certain pathway or protein family (families) = “community service”

Curation of genome annotations
• “Hypothetical protein”, but with some evidence
• Non-hypothetical protein, but no evidence
[Workflow diagram. Labels: find genome (Find Genomes: Genome Browser, Genome Search); Genome Statistics; Compare Gene Annotations; w/o enzymes but with candidate KO based enzymes; add to Gene Cart; refine gene set; review Gene Pages; Protein families; Homologs/orthologs; Gene Neighborhoods]

Why do you want to review annotations?
• Most IMG pipelines are optimized for specificity, so they are more likely to have false negatives, but generate few false positives
• Compare Annotations
  – Product name is a consensus of multiple assignments: BLASTp, TIGRfam, COG, Pfam
  – Sources of false negatives - cutoffs: TIGRfam trusted cutoffs are quite stringent; COG doesn’t have trusted cutoffs; BLASTp cutoff of 50% identity
• Candidate genes with KO annotations – sources of false negatives
  – Cutoffs for % identity and alignment length

Curation of annotation in one genome (or a set of genomes)
a) Your favorite genes (experimental verification, etc.) -> use Find Genes, Gene Search or BLAST
b) “Compare Annotations” on Organism Details page
c) “Candidate genes with KO annotations” on Organism Details page
d) KEGG Pathways (either from Organism Details page or from Find Functions menu)
e) PhyloProfiler

A shortcut for product name/EC number assignments based on KO

Example of a missed gene
• Run PhyloProfiler of Deinococcus geothermalis as a query, Deinococcus hopiensis as target (with no homologs in)
• Select Dgeo_0119 as a sequence to check whether a homolog of this gene was missed in Deinococcus hopiensis

Adding missed genes - contd
• Use graphical viewer to check the translation
• Adjust the start if other start codons with better RBS exist upstream

Reviewing your annotations
• Organism Details page -> Genome Statistics
• MyIMG

IMG curation exercises
Go to the link in the usual place: http://genomebiology.jgi-psf.org/Content/MGM-11.Feb2012/agenda.html
The first 2 pages – questions without answers; the rest is cheat sheet
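Side note on the "Adjust the start if other start codons with better RBS exist upstream" step: the search for alternative starts can be sketched outside IMG as below. This is only an illustration under stated assumptions, not how the IMG graphical viewer works. The function name `upstream_starts` is hypothetical, the gene is assumed to lie on the forward strand, the start codon set (ATG, GTG, TTG) is the common bacterial one, and RBS quality is not scored here; that judgment is still made by eye.

```python
# Hypothetical helper, not part of IMG: list in-frame start codons
# upstream of an annotated start on the forward strand.
START_CODONS = {"ATG", "GTG", "TTG"}   # assumed common bacterial starts
STOP_CODONS = {"TAA", "TAG", "TGA"}

def upstream_starts(seq, start):
    """Return 0-based positions of in-frame start codons upstream of `start`.

    Walks upstream one codon at a time from the annotated start and stops
    at the first in-frame stop codon, since the open reading frame cannot
    extend past it. Positions are returned nearest-first.
    """
    seq = seq.upper()
    candidates = []
    pos = start - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break                      # frame is closed further upstream
        if codon in START_CODONS:
            candidates.append(pos)     # possible alternative start
        pos -= 3
    return candidates

# Toy sequence: TAA GTG CCC ATG AAA TTT TGA, annotated start at index 9
toy = "TAAGTGCCCATGAAATTTTGA"
alternatives = upstream_starts(toy, 9)
```

Each position returned is a candidate to inspect for a ribosome binding site a few bases upstream before changing the gene model.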