Genomic Annotation
Genes and Pseudogenes in Primates
So Far…
Understand the basics of genetic homology
    Interpret score & E-value
    Combine local alignments
How to use homology from various databases to improve annotation
    Protein, EST, and neighbor-species homology can all add more evidence
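As a rough illustration of how score and E-value relate: for a bit score S', the expected number of chance hits is approximately E = m × n × 2^(-S'), where m is the query length and n is the database length. The sketch below assumes raw lengths; BLAST itself applies adjusted "effective" lengths, so its reported values will differ somewhat. The function name is made up for this example.

```python
def expected_chance_hits(bit_score, query_len, db_len):
    """Approximate E-value from a bit score: E = m * n * 2**(-S').

    Assumption: raw query and database lengths are used. BLAST reports
    E-values based on adjusted effective lengths, so treat this as an
    order-of-magnitude guide only.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Hypothetical numbers: a 1,000-residue query against a 1e9-residue database.
    for score in (30, 50, 100):
        print(score, expected_chance_hits(score, 1_000, 1_000_000_000))
```

Each additional bit of score halves the expected number of chance hits, which is why a small change in score can move the E-value a long way.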
Ab Initio gene finders
Ab initio: "from the beginning"
    Computer programs that attempt to find and annotate genes based solely on the nucleotide sequence
    High success rate for prokaryotes (70-80%)
    Low success rate for eukaryotes (15-25%)
    Most failures for eukaryotes involve the ends of the gene (fused & split genes, wrong start or stop)
    Ab initio gene finders do pretty well at getting at least part of a gene right
    Strategy: start with ab initio predictions & modify based on other evidence; gather as much evidence as you can to support your conclusion
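To make "based solely on the nucleotide sequence" concrete, here is a toy open reading frame scanner. It is only a sketch of the simplest sequence-only signal (an ATG followed in frame by a stop codon) and is not how Genscan or any real gene finder works: it ignores introns, splice sites, the reverse strand, and codon statistics. The function name and the minimum-length parameter are invented for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end) pairs for forward-strand ORFs, end exclusive.

    An ORF here runs from the first ATG in a frame to the next in-frame
    stop codon (stop included). Length is counted in codons, including
    both the start and the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATGACC"))
```

A scanner like this finds complete single-exon reading frames reasonably well, which is one way to see why prokaryotes are easier: eukaryotic genes interrupted by introns break the single continuous frame it looks for.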
Genscan
Good "basic" gene finder
    Provides useful predictions even without species-specific training
    Can be improved if you have a set of known genes from that or a related species to optimize the algorithm for those gene characteristics
Many other gene finders are available; most of these automate the incorporation of other forms of evidence that must also be provided (EST data, conservation among neighbor species)
Basic Strategy for Annotation
Use ab initio prediction to focus attention on genomic features of interest
Add as much other evidence as you can to refine and support your conclusion
What other evidence is there?
    1. Basic gene structure
    2. Motif information
    3. BLAST homologies: nr, protein, EST
    4. Other species or other proteins
Chimpanzee annotation
1. Basic gene structure
    Only ~15% of known mammalian genes have a single exon
    Many pseudogenes are mRNAs that have been retrotransposed back into the genome; many of these will appear as single-exon genes
    Increase vigilance for signs of a pseudogene for any single-exon gene
    Alternatively, there may be missing exons
Chimpanzee annotation
2. Motif information
    Genscan uses statistical methods to predict genes and will tag all apparent ORFs of sufficient length
    Since the genome is very large, statistical methods will give some false positives (a sequence looks like a gene simply by chance)
    If the predicted gene has protein motifs found in other proteins, it is much less likely to be a false positive and more likely to be a real gene or a real pseudogene
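The "simply by chance" point can be put in numbers with a back-of-the-envelope model. Assuming all 64 codons are equally likely (a simplification; real genomes have biased base composition), a random codon is a stop with probability 3/64, so a stretch of n codons is stop-free with probability (61/64)^n. The sketch below uses that model only; both function names are made up for illustration.

```python
def prob_stop_free(n_codons):
    """Probability that n random codons contain no stop codon.

    Assumption: uniform, independent codons (3 of 64 are stops).
    """
    return (61 / 64) ** n_codons


def expected_stop_free_windows(genome_len, n_codons):
    """Rough expected count of stop-free windows of n codons on one strand.

    Assumption: every position is treated as a possible window start and
    edge effects are ignored, so this is an order-of-magnitude estimate.
    """
    return genome_len * prob_stop_free(n_codons)


if __name__ == "__main__":
    for n in (50, 100, 200):
        print(n, prob_stop_free(n), expected_stop_free_windows(3_000_000_000, n))
```

Under this model any single window is unlikely to stay open for long, but multiplying by billions of positions still leaves many chance candidates, which is why independent evidence such as a protein motif carries weight.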
Chimpanzee annotation
3. BLAST homology: nr, protein, EST
    Homology to known proteins argues against a false positive
    Mammals have many gene families and many pseudogenes (both of these can show high similarity to your predicted gene)
    Consider length and percent identity when examining alignments. Human vs. chimp orthologs should differ by <1%; most paralogs will differ by more than this
    Without good EST evidence you can never be sure; make your best guess and be able to defend it!
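The percent-identity check above can be sketched as follows. This assumes you already have two aligned sequences of equal length with "-" for gaps, and it skips gap columns when counting, which is a choice made for this example (BLAST counts identities over the full alignment length, gaps included). The 1% cutoff comes from the slide; the function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap.
    pairs = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def within_ortholog_divergence(aligned_a, aligned_b, max_divergence_pct=1.0):
    """True if the pair differs by less than the given percentage."""
    return 100.0 - percent_identity(aligned_a, aligned_b) < max_divergence_pct


if __name__ == "__main__":
    human = "ATGGCC" * 50            # made-up 300 bp sequence
    chimp = human[:-1] + "T"         # one substitution
    print(percent_identity(human, chimp), within_ortholog_divergence(human, chimp))
```

Passing the cutoff is consistent with orthology but does not prove it; a recent duplicate or a young pseudogene can be just as similar, so alignment length and the other lines of evidence still matter.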
Chimpanzee annotation
4. Other species or other proteins
    For any similarity hit, look for even better hits elsewhere in the genome; orthologs and pseudogenes will look similar, but there will usually be an even better hit somewhere else
    If you are convinced you have a gene and it is a member of a multi-gene family, be sure to pick the right ortholog
    Look at synteny with an appropriately distant species (mouse or rat); evidence for a transposition suggests a pseudogene
Group Practice
Follow the handout, in which we analyze two genes from a 170 kb region of the chimpanzee genome
To save time, the GENSCAN analysis has been completed for you and can be retrieved from Goose