59º Congresso Brasileiro de Genética (59th Brazilian Congress of Genetics): Abstracts • 16 to 19 September 2013
Hotel Monte Real Resort • Águas de Lindóia • SP • Brazil
www.sbg.org.br • ISBN 978-85-89109-06-2

Phylogenetic inference of bacterial evolutionary relationship from the analysis of genomic signature using Singular Value Decomposition (SVD)

Castro-Oliveira, L¹; Amorim, LG¹; Mariano, DCB¹; Santos, MA¹; Soares, SC²; Miyoshi, A²; Azevedo, V¹,²

¹Programa de Pós-Graduação em Bioinformática - ICB, UFMG, Belo Horizonte, MG; ²Programa de Pós-Graduação em Genética - ICB, UFMG, Belo Horizonte, MG
[email protected]

Keywords: Phylogeny, genomic signature, SVD, CMNR group, MATLAB

Reconstructions of the tree of life were traditionally performed by identifying the points of divergence between species based solely on shared homologous features. However, this approach can be misleading because of convergent and divergent evolution. With the advent of molecular techniques, phylogenetics was greatly improved by using nucleotide differences in universal reference markers, which gave rise to molecular phylogenetics. In the post-genomic era, a second wave of change brought phylogenomics, which infers evolutionary divergence from whole-genome data such as gene content and gene order, orthology, and DNA strings or DNA signatures.

Phylogenomic inferences based on the DNA signature, or genomic signature, take into account the codon usage of the coding sequences, the G+C content, and nucleotide composition patterns such as di-, tri-, and tetranucleotide frequencies. Codon usage is mainly shaped by the strength of the codon-anticodon interaction and by the availability of the corresponding tRNAs. The preferential use of AT- or GC-rich codons produces a nucleotide pattern that is homogeneous throughout the genome and differs between unrelated organisms.
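The genomic-signature features named above reduce to simple counting. The following is a minimal, dependency-free sketch, not the authors' code. The function names are hypothetical, and each CDS is assumed to be an in-frame DNA string. "Codon usage fraction" is assumed to mean each codon's share of all codons in a genome's CDS set; a per-amino-acid normalization such as RSCU would be an alternative reading.

```python
from itertools import product

# All 64 codons in a fixed order, so every genome maps to the same vector layout.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_usage_fraction(cds_list):
    """Share of each of the 64 codons among all codons in a genome's CDS set.

    Assumptions (hypothetical helper): each CDS is in frame; codons containing
    characters other than A, C, G, T (e.g. N) are skipped; a trailing partial
    codon is ignored.
    """
    counts = dict.fromkeys(CODONS, 0)
    for cds in cds_list:
        seq = cds.upper()
        # Step through the sequence codon by codon, dropping any incomplete tail.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in counts:
                counts[codon] += 1
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in CODONS]


def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def kmer_frequencies(seq, k):
    """Relative frequencies of all 4**k overlapping k-mers (k = 2, 3, 4: di-, tri-, tetranucleotides)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing ambiguous bases
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]


if __name__ == "__main__":
    usage = codon_usage_fraction(["ATGGCCGCCTAA", "ATGAAATAG"])
    print(round(usage[CODONS.index("ATG")], 4))  # 2 of 7 codons are ATG -> 0.2857
```

Each genome then becomes one 64-element row (one entry per codon). Stacking the rows gives the genome-by-codon matrix that the SVD step operates on.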
In this work, we applied latent semantic indexing based on singular value decomposition (LSI-SVD) to a matrix containing the codon usage fractions of the coding sequences (CDS) of each genome. The resulting values were used as coordinates to plot the genomes in a three-dimensional chart, and a distance matrix was generated from the absolute distances between all pairs of genomes in MATLAB®. Finally, a phylogenetic tree was built from the distance matrix to visualize the evolutionary relationships.

The dataset comprised 65 genomes of Gram-positive and Gram-negative bacteria. The resulting tree was validated against the well-studied evolutionary relationships of the bacteria of the CMNR group (Corynebacteria, Mycobacteria, Nocardia and Rhodococcus). The tree generated by this method clearly recovers the relationships among the bacteria of these genera despite the presence of the other organisms; however, a small number of species are placed in disagreement with the known relationships.

Given the high G+C content of the bacteria of the CMNR group, the dataset is being updated to include nucleotide frequencies (G+C content and di-, tri-, and tetranucleotide frequencies). We expect this to better separate the CMNR group from other bacteria and increase the accuracy of the method. Finally, we intend to develop publicly available software implementing this methodology and to extend the phylogenetic analysis to other bacterial genomes from NCBI.

Financial support: CAPES, CNPq and FAPEMIG.
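The pipeline described above can be sketched in plain Python: codon-usage matrix, LSI-SVD projection to three dimensions, pairwise distances, then a tree. This is an illustration under stated assumptions, not the authors' MATLAB implementation, and all function names are hypothetical.

The assumptions are:

- Rows are genomes and columns are features (for example, the 64 codon-usage fractions).
- The top three right singular vectors are found by power iteration on AᵀA with deflation.
- The "absolute distances" are taken to be Euclidean.
- The abstract does not name the tree-building method, so UPGMA stands in here; neighbor-joining would be an equally plausible choice.

```python
import math
import random


def lsi_coordinates(matrix, k=3, iters=500, seed=0):
    """Rows of U_k * Sigma_k for the genome x feature matrix A (LSI-style, no centring).

    Right singular vectors of A are eigenvectors of the Gram matrix A^T A; they are
    found one at a time by power iteration, deflating against those already found.
    Projecting each row onto them gives A @ V_k = U_k @ Sigma_k.
    """
    n = len(matrix[0])
    gram = [[sum(r[a] * r[b] for r in matrix) for b in range(n)] for a in range(n)]
    rng = random.Random(seed)  # fixed seed: reproducible start vectors
    basis = []
    for _ in range(k):
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(g * x for g, x in zip(row, v)) for row in gram]
            # Deflation: strip components along singular vectors already found.
            for u in basis:
                dot = sum(x * y for x, y in zip(w, u))
                w = [x - dot * y for x, y in zip(w, u)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # rank exhausted: no further component exists
                v = [0.0] * n
                break
            v = [x / norm for x in w]
        basis.append(v)
    return [[sum(x * y for x, y in zip(row, u)) for u in basis] for row in matrix]


def distance_matrix(points):
    """Pairwise Euclidean distances (assumed reading of 'absolute distances')."""
    return [[math.dist(p, q) for q in points] for p in points]


def upgma(names, dist):
    """UPGMA (average-linkage) clustering; returns a nested tuple of leaf names."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of clusters (keys stored with i < j)
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        del d[(i, j)]
        for k in clusters:
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            # Size-weighted mean distance from the merged cluster to cluster k.
            d[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        clusters[next_id] = ((ti, tj), ni + nj)
        next_id += 1
    return next(iter(clusters.values()))[0]


def to_newick(tree):
    """Serialise a nested-tuple tree as a Newick string (topology only, no branch lengths)."""
    def walk(t):
        return t if isinstance(t, str) else "(" + ",".join(walk(c) for c in t) + ")"
    return walk(tree) + ";"


if __name__ == "__main__":
    names = ["g1", "g2", "g3", "g4"]
    # Toy composition vectors (3 features instead of 64 codons, for brevity).
    A = [[0.50, 0.30, 0.20], [0.48, 0.32, 0.20], [0.20, 0.30, 0.50], [0.23, 0.27, 0.50]]
    print(to_newick(upgma(names, distance_matrix(lsi_coordinates(A, k=3)))))
    # ((g1,g2),(g3,g4));
```

With NumPy available, the projection step reduces to `U, S, Vt = np.linalg.svd(A, full_matrices=False)` followed by `U[:, :3] * S[:3]`. The pure-Python power iteration is used here only to keep the sketch dependency-free. Distances are unaffected by the sign of each singular vector, so the arbitrary signs returned by any SVD routine do not change the tree.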