Unlocated Arthropod genes and ways to find them
Many bug genes are hard to find
- Daphnia’s many tandems were lost for a bit
Duplicate genes, a bane and a boon
Genome tile expression picks out many more
Don Gilbert
April 2008
Genome Informatics Lab, Biology Dept., Indiana University
[email protected]
wfleabase.org/docs/arthropod-gene-finding/
Environ Stresses find Novels
Novel Daphnia genes show under stress.
Novel Drosophila species genes are missed by prediction.
Duplicate genes are common
Daphnia surpasses C. elegans for its rich tandem gene set.
Bugs have many tandem genes.
Duplicates confuse Finders
Prediction errors are common in duplicate gene regions.
None of 13 predictors found all 4 tandems of this Dwil P450 cluster, but
each gene was properly predicted by at least one of them.
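The pattern described here, where no single predictor recovers the whole cluster but the pooled predictions do, can be checked with simple set operations. The following is only a minimal sketch: the gene names, predictor names, and toy predictions are invented for illustration and are not the data behind this slide.

```python
from typing import Dict, List, Set, Tuple


def summarize_cluster_recovery(
    cluster: Set[str], predictions: Dict[str, Set[str]]
) -> Tuple[List[str], Set[str]]:
    """Return (predictors that found every cluster gene,
    cluster genes found by at least one predictor)."""
    # A predictor is "complete" when the cluster is a subset of its calls.
    complete = [name for name, found in predictions.items() if cluster <= found]
    # Pool all predictions, then keep only genes that belong to the cluster.
    pooled = set().union(*predictions.values()) & cluster
    return complete, pooled


# Hypothetical toy data: a 4-gene tandem cluster and three predictors.
cluster = {"p450_a", "p450_b", "p450_c", "p450_d"}
predictions = {
    "predictor_1": {"p450_a", "p450_b"},
    "predictor_2": {"p450_b", "p450_c", "p450_d"},
    "predictor_3": {"p450_a", "p450_d"},
}

complete, pooled = summarize_cluster_recovery(cluster, predictions)
print("predictors with the full cluster:", complete)
print("cluster genes recovered by pooling:", sorted(pooled))
```

In this toy case no predictor is complete, yet pooling recovers all four genes, which is the situation the slide reports for the Dwil P450 cluster.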
Duplicates find Errors
Prediction cline is an artifact of Dmel training. Retraining with Dmoj removes it.
Duplicates solve prediction dilemma in Drosophila.
Odorant genes concur
Curation of Drosophila Obp genes also removes prediction cline.
Vieira et al. (2007), and further analysis by myself, recovered genes
using Psi-Blast trained on species Obp genes. Computational errors are
significantly more common in the Far- and Mid-mel groups. Obp genes show no
overall gain/loss across groups.
Tile expression finds genes
Daphnia tile expression combined with gene finding calls 26% of genome bases as
coding, compared to 17% from gene predictions alone, which amounts to 5,000 - 10,000 new genes.
Manak et al. (2006), with Drosmel, also found 24% CDS/genome, up from 18%
CDS/genome in the reference gene set. Computational tools need to mature; gene
finding is preliminary.
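The coding-fraction figures on this slide can be compared with a little arithmetic. The sketch below uses only the percentages quoted above, except for the genome size, which is a hypothetical round number chosen for illustration and not a value stated in this talk.

```python
def coding_gain(predicted_fraction: float, expression_fraction: float):
    """Return (extra coding fraction of the genome, relative gain over predictions)."""
    if predicted_fraction <= 0:
        raise ValueError("predicted_fraction must be positive")
    extra = expression_fraction - predicted_fraction
    return extra, extra / predicted_fraction


def mean_new_cds_length(extra_fraction: float, genome_size_bp: float,
                        new_gene_count: int) -> float:
    """Mean coding bases per new gene implied by an extra coding fraction."""
    return extra_fraction * genome_size_bp / new_gene_count


# Percentages quoted on the slide.
for label, predicted, expressed in [
    ("Daphnia", 0.17, 0.26),
    ("Dros. mel (Manak et al. 2006)", 0.18, 0.24),
]:
    extra, relative = coding_gain(predicted, expressed)
    print(f"{label}: +{extra:.0%} of genome, {relative:.0%} more coding bases")

# ASSUMPTION: hypothetical genome size, for illustration only.
ASSUMED_GENOME_BP = 200_000_000
for n_genes in (5_000, 10_000):
    bp = mean_new_cds_length(0.09, ASSUMED_GENOME_BP, n_genes)
    print(f"{n_genes} new genes -> about {bp:.0f} coding bp per gene")
```

Under these figures the tile expression calls add roughly half again as many coding bases for Daphnia and about a third more for Drosophila melanogaster; the per-gene lengths depend entirely on the assumed genome size.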
Summary: Locating novel genes
1. More genes are expressed in unusual environments, and these are specific.
Use many environmental, developmental and tissue conditions to see the range of genes via
expression. Understand the limits of gene homology.
2. Duplicate genes are common: a problem, but also an aid to finding genes.
Examine duplicate genes carefully. Tools that distinguish these can be used to find
paralogs missed by traditional methods.
3. Near-species training reduces errors and spurious effects.
Use same-species and near-species data as much as possible in preparing automated annotations.
Be aware of and control for informant species-distance as a source of bias.
4. Genome-wide tile expression finds more genes.
As an alternative to EST studies, it has strengths and drawbacks. Computational methods need to
improve to use this data well.
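As one illustration of point 2, tandem duplicates can be flagged by scanning genes in genomic order and grouping neighbours that share a family label. This is a minimal sketch under stated assumptions, not the method used for the analyses in this talk: the `Gene` record, the family labels (which would come from some homology grouping), and the toy coordinates are all hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass
class Gene:
    name: str
    scaffold: str
    start: int
    family: str  # hypothetical homology-based family label


def tandem_clusters(genes: List[Gene], min_size: int = 2) -> List[List[Gene]]:
    """Group directly adjacent genes on one scaffold that share a family."""
    ordered = sorted(genes, key=lambda g: (g.scaffold, g.start))
    clusters = []
    # groupby only merges consecutive items, so a run is broken as soon as
    # a gene from another family or another scaffold intervenes.
    for _, run in groupby(ordered, key=lambda g: (g.scaffold, g.family)):
        members = list(run)
        if len(members) >= min_size:
            clusters.append(members)
    return clusters


# Hypothetical toy annotation.
genes = [
    Gene("a1", "scaffold_1", 100, "famA"),
    Gene("a2", "scaffold_1", 500, "famA"),
    Gene("b1", "scaffold_1", 900, "famB"),
    Gene("a3", "scaffold_1", 1300, "famA"),
    Gene("a4", "scaffold_2", 50, "famA"),
]

for cluster in tandem_clusters(genes):
    print([g.name for g in cluster])
```

A stricter or looser definition (for example, allowing a few intervening genes or a distance cutoff) would need a different grouping rule; strict adjacency is used here only because it is the simplest case to show.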
Genome maps on your laptop
Genome data sets that I use are available for your computer.
Includes GMOD GBrowse software in a ready-to-run bundle*
http://eugenes.org/gmod/genomeview-package2008/
* This is fully configured for Intel Mac OS X 10.5; other platforms need further installation.
See http://www.gmod.org/GBrowse
Map data (large) are at ftp://eugenes.org/eugenes/gbrowse/databases/
daphnia_pulex : Daphnia genome data from wfleabase.org
nasonia : Wasp gene predictions, homology, EST
tribcas : Tribolium basic gene set from NCBI genomes
drospege : 12 Drosophila genomes
drosmel : Dros. mel rel 5.5 genome with Affymetrix transcriptome data
End note
Acknowledgements
I am grateful for support from NSF (DBI-0640462) and the NIH, including a TeraGrid
award, for making this work possible.
Daphnia sequencing and portions of the analyses were provided by the DOE Joint
Genome Institute, in collaboration with the Daphnia Genomics Consortium
(DGC).
References
Gilbert, 2007. New and old genes in Drosophila genomes. http://insects.eugenes.org/DroSpeGe/about/analysis-doc/
Gilbert, 2007. Daphnia gene duplicates. http://wfleabase.org/genome-summaries/gene-duplicates/
Gilbert, 2008. Tandem genes lost + found. http://insects.eugenes.org/DroSpeGe/about/analysis-doc/
Manak, J.R. et al., 2006. .. unannotated transcription in Dros. mel. Nature Genetics, doi:10.1038/ng1875
Vieira, F.G. et al., 2007. .. analysis of the Odorant-Binding genes in Drosophila genomes. Genome Biology, doi:10.1186/gb-2007-8-11-r235