19. IMG-ER Curation Environment

Advancing Science with DNA Sequence
Data Curation in IMG-ER
Natalia Ivanova
MGM Workshop
February 1, 2012
Tricky question
• What do you need to do data curation in IMG?
a) iPhone
b) PhD in Computer Science
c) supernatural powers
• Correct answer: you need an IMG account
http://img.jgi.doe.gov/er
What can be curated in IMG-ER?
1. Gene models
a) Add a gene
b) Mark a gene as a pseudogene or “obsolete” (= delete it)
2. Functional annotations:
a) Product names
b) EC numbers
c) Gene symbols
If you believe something else needs to be changed (genome name, taxonomy, etc.) – please use the IMG Questions/Comments link
What can’t be changed: automated assignments to protein families (Pfam, COGs, TIGRfam, InterPro, SEED assignments, KO assignments)
Center point for curation – Gene Cart
• Product Name is free text (but see GenBank requirements: http://www.ncbi.nlm.nih.gov/Genbank/genomesubmit_annotation.html)
• Prot Description is free text (goes to “note” in GenBank submission)
• EC number and PUBMED ID – see explanation
• Notes are free text (goes to “note” in GenBank submission)
• Gene symbol is “gene name” – 4 letter abbreviation; goes to “gene” in GenBank submission
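The field-to-GenBank mapping above can be sketched as a small lookup. This is only an illustration: the field keys (`prot_description`, `notes`, `gene_symbol`) and the function `to_qualifiers` are hypothetical names, not part of IMG-ER, and only the three destinations stated on the slide are encoded.

```python
# Hypothetical field keys; only the mappings stated on the slide are included.
FIELD_TO_QUALIFIER = {
    "prot_description": "note",  # Prot Description goes to "note"
    "notes": "note",             # Notes go to "note"
    "gene_symbol": "gene",       # Gene symbol goes to "gene"
}


def to_qualifiers(curation: dict[str, str]) -> dict[str, list[str]]:
    """Group non-empty curated values by the GenBank qualifier they map to."""
    qualifiers: dict[str, list[str]] = {}
    for field, value in curation.items():
        target = FIELD_TO_QUALIFIER.get(field)
        # Fields without a stated destination are skipped, as are empty values.
        if target and value.strip():
            qualifiers.setdefault(target, []).append(value.strip())
    return qualifiers
```

Because Prot Description and Notes share the same destination, the sketch collects both under one "note" key; fields whose destination the slide does not state are left out rather than guessed.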
How to find the genes that need curation?
Two possible scenarios:
• You have submitted a genome to IMG-ER and want to have the best annotations possible for it (e.g. for GenBank submission)
• You’re an expert and know everything about a certain pathway or protein family (families) = “community service”
Curation of genome annotations
[Workflow diagram; labels recovered from the slide:]
• find genome – Find Genomes: Genome Browser, Genome Search
• Genome Statistics
• Compare Gene Annotations: “Hypothetical protein”, but with some evidence; non-hypothetical protein, but no evidence
• refine gene set: w/o enzymes but with candidate KO based enzymes
• add to Gene Cart
• review Gene Pages: Protein families, Homologs/orthologs, Gene Neighborhoods
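The two cases the diagram singles out (a “hypothetical protein” that has some evidence, and a named protein that has none) can be expressed as a simple check. A minimal sketch, assuming evidence is available as a list of family hits; the function `review_reason` and its inputs are hypothetical and do not reflect how IMG-ER implements Compare Gene Annotations.

```python
from typing import Optional


def review_reason(product_name: str, evidence: list[str]) -> Optional[str]:
    """Return why a gene deserves a second look, or None if it looks consistent.

    `evidence` is an assumed list of supporting hits (e.g. family assignments).
    """
    hypothetical = "hypothetical protein" in product_name.lower()
    if hypothetical and evidence:
        return "hypothetical, but with some evidence"
    if not hypothetical and not evidence:
        return "non-hypothetical, but no evidence"
    return None
```

Genes for which the function returns a reason are the ones worth adding to the Gene Cart for manual review.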
Why do you want to review annotations?
• Most IMG pipelines are optimized for specificity, so they are more likely to have false negatives, but generate few false positives
• Compare Annotations
– Product name is a consensus of multiple assignments: BLASTp, TIGRfam, COG, Pfam
– Sources of false negatives: cutoffs. TIGRfam trusted cutoffs are quite stringent; COG doesn’t have trusted cutoffs; BLASTp uses a cutoff of 50% identity
• Candidate genes with KO annotations – sources of false negatives
– Cutoffs for % identity and alignment length
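To see why a stringent cutoff trades false positives for false negatives, consider a toy example. This is a sketch under stated assumptions: the `Hit` records, gene IDs, and the `true_homolog` flag are invented for illustration, and treating the 50% identity cutoff as inclusive (`>=`) is an assumption rather than a documented IMG rule.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    gene_id: str
    percent_identity: float
    true_homolog: bool  # ground truth, known only in this toy example


def classify(hits: list[Hit], cutoff: float = 50.0) -> tuple[list[str], list[str]]:
    """Return (false_positives, false_negatives) for a given identity cutoff."""
    accepted = [h for h in hits if h.percent_identity >= cutoff]
    rejected = [h for h in hits if h.percent_identity < cutoff]
    false_pos = [h.gene_id for h in accepted if not h.true_homolog]
    false_neg = [h.gene_id for h in rejected if h.true_homolog]
    return false_pos, false_neg


# Invented data: one clear homolog, one diverged homolog, one unrelated gene.
toy_hits = [
    Hit("gene_A", 82.0, True),
    Hit("gene_B", 47.5, True),
    Hit("gene_C", 31.0, False),
]
```

With the 50% cutoff the diverged homolog `gene_B` is missed and nothing wrong is accepted; lowering the cutoff to 30% recovers it but also lets the unrelated `gene_C` through. Manual review is what catches cases like `gene_B`.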
Curation of annotation in one genome (or a set of genomes)
a) Your favorite genes (experimental verification, etc.) -> use Find Genes, Gene Search or BLAST
b) “Compare Annotations” on Organism Details page
c) “Candidate genes with KO annotations” on Organism Details page
d) KEGG Pathways (either from Organism Details page or from Find Functions menu)
e) PhyloProfiler
A shortcut for product name/EC number assignments based on KO
Example of a missed gene
• Run PhyloProfiler with Deinococcus geothermalis as the query and Deinococcus hopiensis as the target (“with no homologs in”)
• Select Dgeo_0119 as a sequence to check whether a homolog of this gene was missed in Deinococcus hopiensis
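The PhyloProfiler query above amounts to a set difference: genes in the query genome with no recorded homolog in the target. A minimal sketch, assuming homology is given as (query gene, target genome) pairs; the function `genes_without_homologs` is hypothetical, and `toy_gene_1` and `toy_gene_2` are invented placeholders, not real locus tags.

```python
def genes_without_homologs(
    query_genes: list[str],
    homolog_pairs: list[tuple[str, str]],
    target_genome: str,
) -> list[str]:
    """Return query genes that have no recorded homolog in target_genome."""
    with_homolog = {gene for gene, genome in homolog_pairs if genome == target_genome}
    return sorted(g for g in query_genes if g not in with_homolog)


# Toy input: only Dgeo_0119 comes from the slide; the rest is made up.
query = ["toy_gene_1", "Dgeo_0119", "toy_gene_2"]
pairs = [
    ("toy_gene_1", "Deinococcus hopiensis"),
    ("toy_gene_2", "Deinococcus hopiensis"),
]
candidates = genes_without_homologs(query, pairs, "Deinococcus hopiensis")
```

Each gene on the resulting list is either genuinely absent from the target or was missed by gene calling there, which is why the next step is to check the target sequence directly.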
Adding missed genes (contd.)
• Use graphical viewer to check the translation
• Adjust the start if other start codons with better RBS exist upstream
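The start-adjustment step can be illustrated by scanning upstream for in-frame start codons and noting whether each has an RBS-like motif nearby. This is a rough sketch, not the viewer's logic: it handles the forward strand only, and the `GGAGG` motif, the 20-nt window, and the function name `upstream_starts` are all assumptions chosen for illustration.

```python
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}
RBS_MOTIF = "GGAGG"  # assumption: simplified Shine-Dalgarno core


def upstream_starts(seq: str, current_start: int, window: int = 20):
    """List in-frame start codons upstream of current_start (forward strand).

    Returns (position, codon, has_rbs) tuples, nearest first. Scanning stops
    at the first in-frame stop codon, since the reading frame ends there.
    """
    candidates = []
    pos = current_start - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon in START_CODONS:
            # Look for the motif in the stretch just before this codon.
            upstream = seq[max(0, pos - window):pos]
            candidates.append((pos, codon, RBS_MOTIF in upstream))
        pos -= 3
    return candidates


# Toy sequence: stop, RBS-like motif, ATG, spacer, GTG (the current start).
toy_seq = "TAA" + "CCAGGAGGCACC" + "ATG" + "AAACCC" + "GTG" + "GCTGCTTAA"
alternatives = upstream_starts(toy_seq, current_start=24)
```

Here the only alternative is the ATG at position 15, which has the motif upstream of it. A real decision would also weigh the spacing between the RBS and the start codon and the alignment with homologs.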
Reviewing your annotations
• Organism Details page -> Genome Statistics
• MyIMG
IMG curation exercises
Go to the link in the usual place:
http://genomebiology.jgi-psf.org/Content/MGM-11.Feb2012/agenda.html
The first 2 pages are questions without answers; the rest is a cheat sheet