3- NON-RIBOSOMAL GENE RECONSTRUCTION
 Core / auxiliary / strain specific genes
 Housekeeping genes and accordance with global reconstruction
 MLSA
 Alignment (amino acid / nucleotide  depends on the level of resolution)
 Filtering alignments
 Number of genes for a stable topology
 Horizontal gene transfer
 Tetranucleotide signatures
Housekeeping genes  alternative phylogenies
 Core genes: with phylogenetic signal
 Auxiliary genes: not present in all populations, with low phylogenetic signal
 Strain-specific genes: specific to a single strain, without phylogenetic signal
Lan and Reeves. 2000 TRENDS Microbiol 8: 396-401
Housekeeping genes  not all give the same resolution
Characteristics of a molecule used as a molecular clock
 Universally present
 only 34 orthologous universal genes (Huynen & Bork, PNAS, 1998. 95:5849-5856)
 group-specific (e.g. phylum, family, genus…) genes with phylogenetic signal can be used
 functional constancy
 sufficient sequence conservation for reconstruction purposes
 sufficient sequence complexity for a good phylogenetic signal
Markers supporting global phylogenies
Markers that do not support global phylogenies
 16S rRNA
 23S rRNA
 EF-Tu (some phyla are paraphyletic, e.g. Actinobacteria and Streptomyces)
 ATPases
 DNA gyrases
 Hsp70
 RecA
 RNA polymerase rpoB (some phyla are paraphyletic, e.g. Epsilonproteobacteria and the rest of the Proteobacteria)
 Heat shock Hsp60 (Bacteria: GroEL, Archaea: Tf-55; some may be paraphyletic)
 Aminoacyl-tRNA synthetases
Ludwig and Schleifer. 2005 Microbial phylogeny and evolution (Sapp) 70-98. (Oxford University Press)
PHYLOGENY BASED ON NON-RIBOSOMAL GENES: MLSA
MLSA (multilocus sequence analysis):
 5-10 full/partial sequences
 housekeeping genes
 primer design difficulties
 biases in the selection of genes
 time consuming
 low number of genes for a stable topology
Workflow (diagram: strains Str. 1 to Str. 4, genes genA to genF):
 Amplify and sequence 5-10 housekeeping genes for each strain
 Concatenate gene sequences
 Reconstruct the phylogeny
Stackebrandt et al., 2002, Int J Syst Evol Microbiol. 52:846-849
Gevers et al., 2005, Nature Rev. Microbiol. 3:733-739
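The concatenation step of the workflow can be sketched in a few lines of Python. This is only a minimal illustration, assuming each gene has already been aligned separately and that every strain is present in every gene alignment; the function name `concatenate_alignments` and the toy sequences are hypothetical and not taken from the cited papers.

```python
from typing import Dict, List


def concatenate_alignments(gene_alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-gene alignments end to end, giving one concatenated row per strain."""
    strains = sorted(gene_alignments[0])
    concatenated = {strain: "" for strain in strains}
    for index, alignment in enumerate(gene_alignments):
        # Every gene must cover exactly the same set of strains.
        if sorted(alignment) != strains:
            raise ValueError(f"gene {index} does not cover the same strains")
        # Within one gene, all rows must have the same aligned length.
        if len({len(seq) for seq in alignment.values()}) != 1:
            raise ValueError(f"gene {index} is not aligned (unequal lengths)")
        for strain in strains:
            concatenated[strain] += alignment[strain]
    return concatenated


# Toy data: hypothetical aligned fragments of two genes for three strains.
gen_a = {"Str1": "ATGGCA", "Str2": "ATGGCT", "Str3": "ATG-CA"}
gen_b = {"Str1": "TTGACC", "Str2": "TTGACC", "Str3": "TTAACC"}

supermatrix = concatenate_alignments([gen_a, gen_b])
for strain, seq in supermatrix.items():
    print(strain, seq)
```

The concatenated rows would then be passed to whatever tree-building program is in use; that step is left out here on purpose.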
Alignments of protein-coding genes are less clear-cut than rRNA alignments
Protein sequences vs. rRNA sequences
 Coding DNA carries information in triplets (codons)
 The degenerate genetic code allows silent mutations (little evolutionary constraint)
 For deep phylogenies, amino acid alignments give better resolution.
 DNA phylogenies should only be built with closely related sequences
 Generally shorter sequences (300-1000 residues) than rRNA
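The effect of code degeneracy can be illustrated with a small Python sketch: two coding sequences that differ at several third codon positions still translate to the same protein. The codon table below is deliberately partial (only the codons of the standard genetic code that the toy sequences use), and the sequences themselves are invented for the example.

```python
# Partial standard genetic code: only the codons needed for this toy example.
CODON_TABLE = {
    "ATG": "M",
    "GCA": "A", "GCC": "A", "GCG": "A", "GCT": "A",
    "AAA": "K", "AAG": "K",
    "GAA": "E", "GAG": "E",
    "TGG": "W",
}


def translate(dna: str) -> str:
    """Translate a coding sequence codon by codon (trailing partial codon ignored)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def count_differences(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical sequences differing only at third codon positions.
seq_1 = "ATGGCAAAAGAATGG"
seq_2 = "ATGGCCAAGGAGTGG"

nucleotide_diffs = count_differences(seq_1, seq_2)
protein_diffs = count_differences(translate(seq_1), translate(seq_2))
print("nucleotide differences:", nucleotide_diffs)
print("amino acid differences:", protein_diffs)
```

Among distant relatives such silent positions tend to saturate, which is one reason amino acid alignments are preferred for deep phylogenies.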
Removing hypervariable positions  reducing phylogenetic noise
http://molevol.cmima.csic.es/castresana/Gblocks.html
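The idea of filtering an alignment can be sketched as follows. This is not the Gblocks algorithm, only a much simpler stand-in that drops columns containing gaps and columns where the most common residue is too rare; the function name `filter_columns`, the 0.5 threshold and the toy alignment are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict


def filter_columns(alignment: Dict[str, str],
                   min_conservation: float = 0.5,
                   allow_gaps: bool = False) -> Dict[str, str]:
    """Keep only alignment columns that are gap-free (by default) and conserved enough."""
    names = list(alignment)
    rows = [alignment[name] for name in names]
    kept = []
    for column in zip(*rows):
        if not allow_gaps and "-" in column:
            continue  # discard columns with any gap
        residues = [c for c in column if c != "-"]
        if not residues:
            continue  # discard all-gap columns
        top_count = Counter(residues).most_common(1)[0][1]
        # Fraction of sequences sharing the most common residue in this column.
        if top_count / len(column) >= min_conservation:
            kept.append(column)
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


# Toy protein alignment: column 3 is hypervariable, column 6 has a gap.
toy = {
    "s1": "MKVLA-T",
    "s2": "MKILAGT",
    "s3": "MRALAGS",
    "s4": "MKQLAGT",
}
filtered = filter_columns(toy)
for name, seq in filtered.items():
    print(name, seq)
```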
Single genes may lead to different topologies
[Figure: single-gene trees for A. PyrG, B. GlyA and C. GroEL, with bootstrap values and a 0.1 scale bar. Taxa: Bacteroides fragilis, Bacteroides thetaiotaomicron, Prevotella intermedia, Porphyromonas gingivalis, Cytophaga hutchinsonii, Chlorobium chlorochromatii, Chlorobium tepidum, Geobacter sulfurreducens, Nitrosomonas europaea, Rhodopirellula baltica, Treponema denticola, Oceanobacillus iheyensis]
Of all 22 analyzed genes:
 57 % Bacteroidetes
 27 % Chlorobi
 18 % Chlorobi-Bacteroidetes
One cannot rely on single-gene reconstructions, which may produce inconsistent results
The number of genes in the concatenate influences the stability of the tree
Random selection among the 22 genes  checking branching robustness
 Below 8 genes one can obtain unstable topologies
 12 genes gave the threshold for reliability
 The bootstrap values improve as the number of genes in the analysis increases
 For taxonomic purposes, 16S rRNA gene sequence analysis is the most parsimonious approach
[Figure: bootstrap support (0-100) plotted against number of genes (4, 8, 12, 16)]
Soria-Carrasco et al., 2007, System Appl Microbiol. 30:171-179
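The random selection of genes can be sketched as a small resampling routine. This only draws the gene subsets; building a tree from each subset and comparing branch support is left to the phylogenetic software in use. The gene names are placeholders for the 22 genes, and the function name, the number of replicates and the seed are arbitrary assumptions.

```python
import random
from typing import List


def random_gene_subsets(genes: List[str], subset_size: int,
                        replicates: int, seed: int = 0) -> List[List[str]]:
    """Draw `replicates` random subsets of `subset_size` genes, without replacement."""
    rng = random.Random(seed)  # fixed seed so the draws can be reproduced
    return [sorted(rng.sample(genes, subset_size)) for _ in range(replicates)]


# Placeholder names standing in for the 22 analyzed genes.
genes = [f"gene{i:02d}" for i in range(1, 23)]

subsets_by_size = {
    size: random_gene_subsets(genes, size, replicates=5, seed=size)
    for size in (4, 8, 12, 16)
}
for size, subsets in subsets_by_size.items():
    print(size, "genes, first replicate:", subsets[0])
```

Each subset would be concatenated and analyzed separately, so that the spread of topologies and bootstrap values can be compared across subset sizes.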
MLSA: phylogenetic reconstructions
MULTIPLE SEQUENCE ALIGNMENTS
 sometimes have better resolution than the 16S rRNA gene
 16S rRNA gene can have very low resolution
Jiménez et al., 2013, System Appl Microbiol, 36:383-391
MONOPHYLY: phylogenetic reconstructions (MLSA)
MULTIPLE SEQUENCE ALIGNMENTS (LARGE SETS)
 r-MLST (ribosomal protein concatenates)
    http://pubmlst.org/rmlst/
    Jolley et al., 2012, Microbiology 158:1005-15
    53 ribosomal protein genes (rps genes)
 specI (single copy marker genes)
    http://vm-lux.embl.de/~mende/specI/
    Mende et al., Nat Methods, in revision
    based on 40 universal, single copy marker genes
    Optimized cutoffs (96.5% nucleotide identity)
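How an identity cutoff works can be shown with a toy calculation. The sketch below computes percent nucleotide identity between two aligned sequences and compares it with 96.5%; it is not the specI procedure itself, and the reference sequence, the `mutate` helper and the mutated positions are invented for the example.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def mutate(seq: str, positions) -> str:
    """Hypothetical helper: substitute the base at each given position."""
    swap = {"A": "C", "C": "A", "G": "T", "T": "G"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


CUTOFF = 96.5  # nucleotide identity cutoff quoted above

reference = "ATGC" * 50                            # 200 nt toy marker gene
close_strain = mutate(reference, [3, 50, 99, 150])      # 4 substitutions
distant_strain = mutate(reference, range(0, 200, 20))   # 10 substitutions

for name, seq in (("close", close_strain), ("distant", distant_strain)):
    identity = percent_identity(reference, seq)
    print(name, identity, "same cluster" if identity >= CUTOFF else "different cluster")
```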
Losing identity due to HGT
TWO SCHOOLS
Phylogenetic incongruences:
 HGT makes the assignment of identities fuzzy
 Massive HGT in the microbial world
 No tree of life is possible
Phylogenetic incongruences can be explained by:
► gene duplication (paralogy) and deletion (hidden paralogy)
► false orthology assignment
► alignment artifacts
 Orthology should be carefully checked
Soria-Carrasco & Castresana, 2008. Mol. Biol. Evol. 25: 2319-2329
Kurland. 2005. Bioessays 27:741-747
Kunin et al. 2005. Genome Res. 15:954-959
pyrE: sometimes no other explanation (either true or lack of information)
aroA: sometimes a loss of phylogenetic signal
Genome Signatures
 G+C content ► dinucleotide ► not very informative
 Codon usage ► trinucleotide ► more informative
 Tetranucleotides (penta-, hexa-…) ► more information but more computing effort
Tetranucleotide variation: 4^4 = 256
TETRA:
 Genomes have an oligonucleotide usage (not yet understood, related to codon usage)
 Similar genomes might have similar usage
 ALIGNMENT-FREE PARAMETER
 may be useful in deciding whether a group of strains deserves species status
Contigs can be ordered by means of their tetranucleotide similarity
 Contigs with similar signatures are probably fragments of the same organism
 A high regression value may indicate similar genetic coding of the genomes
Teeling et al., 2004 Environ Microbiol. 6:938-947
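A tetranucleotide comparison can be sketched in Python as follows. This is a simplification: it compares raw tetranucleotide frequencies with a Pearson correlation, whereas the published TETRA method works on z-scores derived from expected frequencies. The contig sequences are invented, and only the forward strand is counted.

```python
from itertools import product
from math import sqrt
from typing import List

# All 4**4 = 256 possible tetranucleotides, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_frequencies(seq: str) -> List[float]:
    """Relative frequency of each tetranucleotide (overlapping windows, forward strand)."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    total = 0
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:  # skips windows with N or other ambiguity codes
            counts[word] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence too short or without unambiguous tetramers")
    return [counts[w] / total for w in TETRAMERS]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation undefined for a constant vector")
    return cov / (sd_x * sd_y)


# Invented contigs: a and b share composition, c is AT-rich and different.
contig_a = "ATGCGATACGCTTGAGGCTA" * 20
contig_b = contig_a[7:] + contig_a[:7]
contig_c = "TTTTAAAATTTAAATTTTAA" * 20

freq_a, freq_b, freq_c = (tetra_frequencies(s) for s in (contig_a, contig_b, contig_c))
print("a vs b:", round(pearson(freq_a, freq_b), 3))
print("a vs c:", round(pearson(freq_a, freq_c), 3))
```

Contigs whose signatures correlate strongly with each other would be grouped together as candidate fragments of the same genome.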