3- NON-RIBOSOMAL GENE RECONSTRUCTION

Core / auxiliary / strain-specific genes
Housekeeping genes and accordance with global reconstruction
MLSA
Alignment (amino acid / nucleotide, depending on the level of resolution)
Filtering alignments
Number of genes for a stable topology
Horizontal gene transfer
Tetranucleotide signatures

Housekeeping genes: alternative phylogenies

Auxiliary genes: not present in all populations, with low phylogenetic signal
Core genes: with phylogenetic signal
Specific genes of a single strain: without phylogenetic signal
Lan and Reeves. 2000 TRENDS Microbiol 8: 396-401

Housekeeping genes: not all give the same resolution

Characteristics of a molecule as a molecular clock:
Universally present
    only 34 orthologous universal genes (Huynen & Bork, PNAS, 1998. 95:5849-5856)
    group-specific (i.e. phylum, family, genus…) genes with phylogenetic signal can be used
Functional constancy
Sufficient sequence conservation for reconstruction purposes
Sufficient sequence complexity for a good phylogenetic signal

Markers supporting global phylogenies / Markers that do not support global phylogenies

16S rRNA
23S rRNA
EF-Tu (some phyla are paraphyletic, e.g. Actinobacteria and Streptomyces)
ATPases
DNA gyrases
Hsp70
RecA
RNA polymerase rpoB (some phyla are paraphyletic, e.g. Epsilonproteobacteria and the rest of the Proteobacteria)
Heat shock Hsp60 (Bacteria: GroEL, Archaea: Tf-55; some may be paraphyletic)
Aminoacyl-tRNA synthetases
Ludwig and Schleifer. 2005 Microbial phylogeny and evolution (Sapp) 70-98. (Oxford University Press)

PHYLOGENY BASED ON NON-RIBOSOMAL GENES: MLSA

MLSA (multilocus sequence analysis): 5-10 full/partial sequences of housekeeping genes
1. Amplify and sequence 5-10 housekeeping genes for each strain
    primer design difficulties
2. Concatenate gene sequences
    biases in the selection of genes
    time consuming
    ↓↓ number for stable topology
3. Reconstruct the phylogeny
[Diagram: strains Str. 1 to Str. 4, each with genes genA to genF]
Stackebrandt et al., 2002, Int J Syst Evol Microbiol. 52:846-849
Gevers et al., 2005, Nature Rev. Microbiol. 3:733-739

Alignments of protein-coding genes are less clear than rRNA alignments

Protein sequences vs. rRNA sequences
Coding DNA carries its information in triplets (codons)
The degenerate code allows silent mutations (few evolutionary constraints)
For deep phylogenies, amino acid alignments give better resolution
DNA phylogenies should only be done with closely related sequences
Sequences are generally shorter (300-1000 residues) than rRNA
Removing hypervariable positions reduces phylogenetic noise
http://molevol.cmima.csic.es/castresana/Gblocks.html

Single genes may lead to different topologies

[Figure: single-gene trees with bootstrap values for A. PyrG, B. GlyA and C. GroEL; taxa: Bacteroides fragilis, Bacteroides thetaiotaomicron, Prevotella intermedia, Porphyromonas gingivalis, Cytophaga hutchinsonii, Chlorobium chlorochromatii, Chlorobium tepidum, Geobacter sulfurreducens, Nitrosomonas europaea, Rhodopirellula baltica, Treponema denticola, Oceanobacillus iheyensis; scale bar 0.1]
Of all 22 analyzed genes: 57 % Bacteroidetes, 27 % Chlorobi, 18 % Chlorobi-Bacteroidetes
One cannot rely on single-gene reconstructions, which may produce inconsistent results

The number of genes in the concatenate influences the stability of the tree

Random selection among the 22 genes, checking branching robustness
Below 8 genes one can obtain unstable topologies
12 genes was the threshold for reliability
For taxonomic purposes, 16S rRNA gene sequence analysis is the most parsimonious approach
Bootstrap values improve as the number of genes in the analysis increases
[Figure: bootstrap value (0-100) versus number of genes (4, 8, 12, 16)]
Soria-Carrasco et al., 2008, System Appl Microbiol. 30:171-179

MLSA: phylogenetic reconstructions

MULTIPLE SEQUENCE ALIGNMENTS sometimes have better resolution than the 16S rRNA gene
The 16S rRNA gene can have very low resolution
Jiménez et al., 2013, System Appl Microbiol, 36: 383-391

MONOPHYLY: phylogenetic reconstructions (MLSA)

MULTIPLE SEQUENCE ALIGNMENTS (LARGE SETS)
r-MLST (ribosomal protein concatenates)
    http://pubmlst.org/rmlst/
    Jolley et al., 2012, Microbiology 158:1005-15
    53 ribosomal protein genes (rps genes)
specI (single copy marker genes)
    (http://vm-lux.embl.de/~mende/specI/)
    Mende et al., Nat Methods, in revision
    based on 40 universal, single copy marker genes
    optimized cutoffs (96.5% nucleotide identity)

Losing identity due to HGT

TWO SCHOOLS
Phylogenetic incongruences:
    HGT blurs the assignment of identities
    Massive HGT in the microbial world
    No tree of life is possible
Phylogenetic incongruences can be explained by:
    ► gene duplication (paralogy) and deletion (hidden paralogy)
    ► false orthology assignment
    ► alignment artifacts
    Orthology should be carefully checked
Soria-Carrasco & Castresana, 2008. Mol. Biol. Evol. 25: 2319-2329
Kurland. 2005. Bioessays 27:741-747
Kunin et al. 2005. Genome Res. 15:954-959
[Figure: pyrE] Sometimes there is no other explanation (either true HGT or lack of information)
[Figure: aroA] Sometimes there is a loss of phylogenetic signal

Genome Signatures

G+C content ► dinucleotide ► not very informative
Codon usage ► trinucleotide ► more informative
Tetranucleotides (penta-, hexa-…) ► more information but more computing effort
Tetranucleotide variation: 4^4 = 256

TETRA:
Genomes have an oligonucleotide usage (not yet understood, related to codon usage)
Similar genomes might have similar usage
ALIGNMENT-FREE PARAMETER
May be useful in deciding whether a group of strains deserves species status
Contigs can be ordered by means of their tetranucleotide similarity (probably fragments of the same organism)
A high regression coefficient may indicate similarly encoded genomes
Teeling et al., 2004 Environ Microbiol. 6:938-947
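
The tetranucleotide comparison described above can be illustrated with a minimal Python sketch. It counts the 4^4 = 256 possible tetranucleotides in overlapping windows and compares two sequences by the Pearson correlation of their frequency vectors. This is a simplification and not the published TETRA procedure: it uses raw frequencies on one strand only, and the function names (tetra_frequencies, pearson, tetra_similarity) are hypothetical names chosen for this illustration.

```python
from itertools import product
from math import sqrt

ALPHABET = "ACGT"
# All possible tetranucleotides: 4**4 = 256 words.
TETRAMERS = ["".join(p) for p in product(ALPHABET, repeat=4)]


def tetra_frequencies(seq):
    """Relative frequency of each tetranucleotide, using overlapping windows."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    total = 0
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:  # windows with N or other ambiguity codes are skipped
            counts[word] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no unambiguous tetranucleotide")
    return [counts[w] / total for w in TETRAMERS]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def tetra_similarity(seq_a, seq_b):
    """Alignment-free similarity: correlation of tetranucleotide usage."""
    return pearson(tetra_frequencies(seq_a), tetra_frequencies(seq_b))
```

With real data the inputs would be whole genomes or long contigs; short fragments give sparse, noisy frequency vectors, so a high correlation is only suggestive that two contigs come from the same or a similar organism.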