Genomic Annotation
Genes and Pseudogenes in Primates

So Far…
- Understand the basics of genetic homology
  - interpret score & e-value
  - combine local alignments
- How to use homology from various databases to improve annotation
  - protein, EST, neighbor species homology can all add more evidence

Ab Initio gene finders
- Ab Initio: “From the beginning”
- Computer programs that attempt to find and annotate genes based solely on the nucleotide sequence
- High success rate for prokaryotes (70-80%)
- Low success rate for eukaryotes (15-25%)
- Most failures for eukaryotes involve the ends of the gene (fused & split genes, wrong start or stop)
- Ab initio gene finders do pretty well at getting at least part of a gene right
- Strategy: start with ab initio predictions & modify based on other evidence; gather as much evidence as you can to support your conclusion

Genscan
- Good “basic” gene finder
- Provides useful predictions even without species-specific training
- Can be improved if you have a set of known genes from that or related species to optimize the algorithm for those gene characteristics
- Many other gene finders out there; most of these automate the incorporation of other forms of evidence that must also be provided (EST data, conservation among neighbor species)

Basic Strategy for Annotation
- Use ab initio prediction to focus attention on genomic features of interest
- Add as much other evidence as you can to refine and support your conclusion

What other evidence is there?
1. Basic gene structure
2. Motif information
3. BLAST homologies: nr, protein, EST
4. Other species or other proteins

Chimpanzee annotation

1. Basic gene structure
- Only ~15% of known mammalian genes have 1 exon
- Many pseudogenes are mRNAs that have been retro-transposed back into the genome; many of these will appear as single exon genes
- Increase vigilance for signs of a pseudogene for any single exon gene
- Alternatively, there may be missing exons

2. Motif information
- Genscan uses statistical methods to predict genes and will tag all apparent ORFs of sufficient length
- Since the genome is very large, statistical methods will give some false positives (sequence looks like a gene simply by chance)
- If the predicted gene has protein motifs found in other proteins, it is much less likely to be a false positive and more likely to be a real gene or a real pseudogene

3. BLAST homology: nr, protein, EST
- Homology to known proteins argues against a false positive
- Mammals have many gene families and many pseudogenes (both of these can show high similarity to your predicted gene)
- Consider length and percent identity when examining alignments. Human vs. chimp orthologs should differ by <1%; most paralogs will differ by more than this
- Without good EST evidence you can never be sure; make your best guess and be able to defend it!

4. Other species or other proteins
- For any similarity hit, look for even better hits elsewhere in the genome; orthologs and pseudogenes will look similar but there will usually be an even better hit somewhere else
- If you are convinced you have a gene and it is a member of a multi-gene family, be sure to pick the right ortholog
- Look at synteny with properly distant species (mouse or rat); evidence for a transposition suggests a pseudogene

Group Practice
- Follow the handout in which we analyze two genes from a 170 kb region of the chimpanzee genome
- To save time the GENSCAN analysis is completed for you and can be retrieved from Goose
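
The evidence checks from the chimpanzee annotation slides (single-exon vigilance, motif support, and the <1% human vs. chimp divergence rule of thumb) can be sketched as a small checklist in Python. This is only an illustration, not part of GENSCAN or BLAST: the `Prediction` record, the `percent_identity` and `review_notes` helpers, and the strict 99% identity cutoff are assumptions made for this sketch, and real annotation still needs a human to weigh the evidence.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    """Hypothetical summary of one ab initio gene prediction."""
    name: str
    exon_count: int
    has_known_motif: bool
    best_hit_identity: float  # percent identity to the best human hit, 0-100


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def review_notes(pred: Prediction) -> List[str]:
    """Return reminders for the annotator; these are prompts, not verdicts."""
    notes = []
    if pred.exon_count == 1:
        # Slides: single-exon predictions may be retrotransposed pseudogenes.
        notes.append("single exon: check for pseudogene or missing exons")
    if not pred.has_known_motif:
        # Slides: no shared protein motif makes a chance ORF more likely.
        notes.append("no known protein motif: possible false positive")
    if pred.best_hit_identity > 99.0:
        # Assumed cutoff taken from the "<1% difference" rule of thumb.
        notes.append("differs from human hit by <1%: candidate ortholog")
    else:
        notes.append("differs from human hit by >=1%: possible paralog, "
                     "look for a better hit elsewhere")
    return notes


if __name__ == "__main__":
    identity = percent_identity("ACGTACGT", "ACGTACGA")
    pred = Prediction("predicted_gene_1", exon_count=1,
                      has_known_motif=False, best_hit_identity=identity)
    for note in review_notes(pred):
        print(pred.name, "-", note)
```

The function returns notes instead of a yes/no call because, as the slides stress, without good EST evidence the annotator has to make a best guess and defend it.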