Complete genomes comparison based on the taxonomic
distribution of protein sequence homologs
Tatiana Tatusova, Alexander Souvorov, Roman Tatusov
National Center for Biotechnology Information
National Library of Medicine
National Institutes of Health
Bldg. 38A, 8600 Rockville Pike, Bethesda, MD 20894
The field of microbial genomics has grown at an astonishing rate since the first genome sequence
of Haemophilus influenzae was completed in 1995. Genome sequences of 51 microbial species
are currently available in public databases. Together, the completed microbial genome sequences
represent a collection of more than 100,000 predicted coding sequences.
Examining the differences between protein sequences of various organisms gives insight into
the origin of genes and the relationship between species. A new tool for the comparison of
microbial genomes, called TaxPlot, provides a genome-wide approach to the study of gene and
protein functions. TaxPlot produces a 2D plot in which each predicted protein of a query
organism is represented as a point whose Cartesian coordinates (X, Y) are its best
BLAST scores against the predicted proteins of two other organisms. The analysis of protein
similarities between organisms gives insight into their evolutionary relationships.
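
The coordinate calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the TaxPlot implementation: the function name taxplot_points, the (query, organism, score) tuple layout, the sample data, and the choice to plot a missing hit as a score of 0 are all hypothetical.

```python
from collections import defaultdict


def taxplot_points(hits, organism_x, organism_y):
    """Map each query protein to (best score vs organism_x, best score vs organism_y).

    hits is an iterable of (query_protein, hit_organism, blast_score) tuples.
    Assumption: a query with no hit in one of the two organisms gets 0.0 on that axis.
    Queries with no hit in either organism are omitted.
    """
    best = defaultdict(lambda: {organism_x: 0.0, organism_y: 0.0})
    for query, organism, score in hits:
        if organism in (organism_x, organism_y):
            # Keep only the highest score seen for this query/organism pair.
            best[query][organism] = max(best[query][organism], score)
    return {q: (s[organism_x], s[organism_y]) for q, s in best.items()}


# Hypothetical BLAST results for three query proteins.
hits = [
    ("prot1", "orgA", 250.0),
    ("prot1", "orgA", 310.0),
    ("prot1", "orgB", 120.0),
    ("prot2", "orgB", 400.0),
    ("prot3", "orgC", 90.0),
]
points = taxplot_points(hits, "orgA", "orgB")
for protein, (x, y) in sorted(points.items()):
    print(protein, x, y)
```

In such a plot, points near the diagonal would correspond to proteins that are about equally similar to both comparison organisms, while points far from it are markedly closer to one of the two.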
Another approach combines protein similarity searching with taxonomic classification of the
detected homologs. A whole-genome graphical overview shows the taxonomic distribution of
the highest-scoring BLAST hits across three taxonomic groups: Eukaryota, Eubacteria and Archaea.
This approach also takes advantage of the COG (Clusters of Orthologous Groups) system,
which includes conserved protein families represented in at least three phylogenetically distant
organisms with completely sequenced genomes. The proteins that belong to a COG
are displayed in the whole-genome graphical overview and are linked
to the COG database. The individual protein alignment display integrates heterogeneous NCBI
resources and offers a variety of display options, including the distribution of hits by
taxonomic grouping, sorting by taxonomic proximity, the best hit to each organism, the protein
domains in the query sequence, similar sequences that have known 3-D structures, and more.
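
The taxonomic distribution of best hits can be illustrated with a short Python sketch. It assumes each BLAST hit has already been labelled with one of the three taxonomic groups named in the text; the function name best_hit_distribution, the tuple layout, and the sample data are hypothetical and are not taken from the NCBI tools.

```python
from collections import Counter


def best_hit_distribution(hits):
    """Count query proteins by the taxonomic group of their highest-scoring hit.

    hits is an iterable of (query_protein, taxonomic_group, blast_score) tuples.
    On a tie, the hit seen first is kept (an arbitrary choice for this sketch).
    """
    top = {}
    for query, group, score in hits:
        if query not in top or score > top[query][1]:
            top[query] = (group, score)
    return Counter(group for group, _ in top.values())


# Hypothetical hits, each already assigned to a taxonomic group.
hits = [
    ("prot1", "Eubacteria", 300.0),
    ("prot1", "Archaea", 150.0),
    ("prot2", "Archaea", 220.0),
    ("prot2", "Eukaryota", 80.0),
    ("prot3", "Eubacteria", 95.0),
]
distribution = best_hit_distribution(hits)
for group in ("Eukaryota", "Eubacteria", "Archaea"):
    print(group, distribution[group])
```

A group with no best hits simply reports a count of 0, because Counter returns 0 for missing keys.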