Unlocated Arthropod genes and ways to find them

Many bug genes are hard to find - Daphnia’s many tandems were lost for a bit
Duplicate genes, a bane and a boon
Genome tile expression picks out many more

Don Gilbert, April 2008
Genome Informatics Lab, Biology Dept., Indiana University
[email protected]
wfleabase.org/docs/arthropod-gene-finding/

Environ Stresses find Novels

Novel Daphnia genes show under stress.
Novel Drosophila species genes are missed by prediction.

Duplicate genes are common

Daphnia surpasses C. elegans in the richness of its tandem gene set.
Bugs have many tandem genes.

Duplicates confuse Finders

Prediction errors are common in duplicate gene regions. None of 13 predictors found all 4 tandems of this Dwil P450 cluster, but each gene was properly predicted by at least one of them.

Duplicates find Errors

The prediction cline is an artifact of Dmel training. Retraining with Dmoj removes it.
Duplicates solve the prediction dilemma in Drosophila.

Odorant genes concur

Curation of Drosophila Obp genes also removes the prediction cline.
Vieira et al. (2007), and my own further analysis, recovered genes using PSI-BLAST trained on species Obp genes. Computational errors are significantly more common in the Far- and Mid-mel groups. Obp genes show no overall gain/loss across groups.
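The duplicate-gene slides above depend on recognizing tandem sets, that is, runs of neighboring genes from the same family. The following is a minimal sketch of that idea only, not the method used for the analyses in this talk. The Gene record, the family labels, and the example coordinates are all hypothetical, and family assignment (for example by protein similarity) is assumed to have been done beforehand.

```python
from collections import namedtuple
from itertools import groupby

# Hypothetical gene record: family is a precomputed gene-family label,
# or None when the gene has no family assignment.
Gene = namedtuple("Gene", "gene_id scaffold start family")


def tandem_clusters(genes):
    """Return lists of gene ids for runs of two or more adjacent genes
    on the same scaffold that share a family label."""
    clusters = []
    ordered = sorted(genes, key=lambda g: (g.scaffold, g.start))
    for _, on_scaffold in groupby(ordered, key=lambda g: g.scaffold):
        run = []
        for gene in on_scaffold:
            same_family = (
                bool(run)
                and gene.family is not None
                and gene.family == run[-1].family
            )
            if same_family:
                run.append(gene)
            else:
                # The current run ends here; keep it only if it is a tandem.
                if len(run) >= 2:
                    clusters.append([g.gene_id for g in run])
                run = [gene]
        if len(run) >= 2:
            clusters.append([g.gene_id for g in run])
    return clusters


# Made-up example data, for illustration only.
genes = [
    Gene("g1", "scaffold_1", 1000, "P450"),
    Gene("g2", "scaffold_1", 5000, "P450"),
    Gene("g3", "scaffold_1", 9000, "P450"),
    Gene("g4", "scaffold_1", 14000, "Obp"),
    Gene("g5", "scaffold_2", 2000, "P450"),
    Gene("g6", "scaffold_2", 6000, None),
    Gene("g7", "scaffold_2", 8000, None),
]
clusters = tandem_clusters(genes)
print(clusters)
```

This sketch treats only strictly adjacent genes as tandems. A real analysis would likely allow a few intervening genes and would need to cope with tandems that a predictor has merged or skipped, which is the failure the slides describe.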
Tile expression finds genes

Daphnia tile expression combined with gene finding calls 26% of genome bases as coding, compared to 17% from gene predictions alone, a difference equivalent to 5,000 - 10,000 new genes.
Manak et al. (2006), working with Drosmel, also found 24% CDS/genome, up from 18% CDS/genome in the reference gene set.
Computational tools need to mature; gene finding is preliminary.

Summary: Locating novel genes

1. More genes are expressed in unusual environs, and are specific. Use many environmental, developmental and tissue conditions to see the range of genes via expression. Understand the limits of gene homology.
2. Duplicate genes are common, a problem, and an aid to finding genes. Examine duplicate genes carefully. Tools that distinguish these can be used to find paralogs missed by traditional methods.
3. Near-species training reduces errors and spurious effects. Use same-species and near-species data as much as possible in preparing automated annotations. Be aware of and control for informant species-distance as a source of bias.
4. Genome-wide tile expression finds more genes. As an alternative to EST studies, it has values and drawbacks. Computational methods need to improve to use this data well.

Genome maps on your laptop

Genome data sets that I use are available for your computer. Includes GMOD GBrowse software in a ready-to-run bundle*
http://eugenes.org/gmod/genomeview-package2008/
* This is fully configured for Intel-MacOSX 10.5; others need further installation. See http://www.gmod.org/GBrowse

Map data (large) are at ftp://eugenes.org/eugenes/gbrowse/databases/
  daphnia_pulex : Daphnia genome data from wfleabase.org
  nasonia : Wasp gene predictions, homology, EST
  tribcas : Tribolium basic gene set from NCBI genomes
  drospege : 12 Drosophila genomes
  drosmel : Dros. mel rel 5.5 genome with Affymetrix transcriptome data

End note

Acknowledgements
I am grateful for support from NSF (DBI-0640462) and the NIH, including a TeraGrid award, which made this work possible. Daphnia sequencing and portions of the analyses were provided by the DOE Joint Genome Institute, in collaboration with the Daphnia Genomics Consortium (DGC).

References
Gilbert, D., 2007. New and old genes in Drosophila genomes. http://insects.eugenes.org/DroSpeGe/about/analysis-doc/
Gilbert, D., 2007. Daphnia gene duplicates. http://wfleabase.org/genome-summaries/gene-duplicates/
Gilbert, D., 2008. Tandem genes lost + found. http://insects.eugenes.org/DroSpeGe/about/analysis-doc/
Manak, J.R. et al., 2006. .. unannotated transcription in Dros. mel. Nature Genetics, doi:10.1038/ng1875
Vieira, F.G. et al., 2007. .. analysis of the Odorant-Binding genes in Drosophila genomes. Genome Biology, doi:10.1186/gb-2007-8-11-r235