* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download 2_16S_TREE_RECONSTRUCTION
Survey
Document related concepts
Transcriptional regulation wikipedia , lookup
Ancestral sequence reconstruction wikipedia , lookup
Genomic imprinting wikipedia , lookup
Gene expression wikipedia , lookup
List of types of proteins wikipedia , lookup
Promoter (genetics) wikipedia , lookup
Gene nomenclature wikipedia , lookup
Gene desert wikipedia , lookup
Gene expression profiling wikipedia , lookup
Genome evolution wikipedia , lookup
Endogenous retrovirus wikipedia , lookup
Gene regulatory network wikipedia , lookup
Silencer (genetics) wikipedia , lookup
Point mutation wikipedia , lookup
Molecular evolution wikipedia , lookup
Transcript
3- RIBOSOMAL RNA GENE RECONSTRUCITON Phenetics Vs. Cladistics Homology/Homoplasy/Orthology/Paralogy Evolution Vs. Phylogeny The relevance of the alignment The algorithms Bootstrap One tree is no tree Phylogenetic coherence (monophyly) phylogenetic coherence genomic coherence phenotypic coherence 50% 60% 70% 70-50% 70% 80% 100% RNAr 16S Functional genes (MLSA) Genomic analyses Reasociación DNA-DNA G+C, AFLP, MLSA Genomic comparisons (ANI; AAI) metabolism chemotaxonomy spectrometry (Maldi-Tof; ICR-FT/MS) Generally based on 16S rRNA gene analysis important to recognize the closest relatives by means of the Type Strain gene sequences Housekeeping genes (MLSA approach or single gene) may help in resolve phylogenies Future perspectives will be done with full-genome sequences Phenetics vs Cladistics Data can be treated as presence/absence/intensity to generate similarity matrices If data is analyzed by their similarity PHENETICS If data is analyzed in an evolutionary context (i.e. changes in homologous characters are mutations or evolutive steps) CLADISTICS Similarity matrix or alignment For evolutive purposes is necessary to recognize HOMOLOGY PHENETICS 80 85 90 OTU A 10100010010010010 OTU B 11010001010001010 OTU C 00010010011110101 OTU D 00111110010101010 OTU E 00010010111001101 … M8 M31 A1 M1 A7 P13 P18 PR1 C12 C16 E3 E11 C9 C4 C5 C25A E7 CLADISTICS HOMOLOGY ORTOLOGY PARALOGY HOMOPLASY Homology same ancestral origin Organism A Gene X Homoplasy false homology Organism B Gene X Orthology homologous genes in different organisms Organism A Gene X Gene X’ Gene X’’ Paralogy homologous genes in the same organism, gene duplications with identical or different function HOMOLOGY ORTOLOGY PARALOGY HOMOPLASY Homoplasy (false homology) Organism A Gene X Organism B Gene X Orthology homologous genes in different organisms Homology (same ancestral origin) Organism A Gene X Gene X’ Gene X’’ Paralogy homologous genes in the same organism, gene duplications with identical or different function Evolution vs. Phylogeny Evolution => mutations (morphometrics) + age (fossil record) Phylogeny = genealogy => we know only the tips of the tree, nothing is said about putative ancestors Evolution ≠ phylogeny PROKARYOTES => no fossil record => molecular clocks Molecular clocks (housekeeping genes): 16S rRNA; 23S rRNA; ATPases; TU-elongation factor; gyrases… The 16S rRNA: Universally represented Conserved No protein coding Base pairing (helix) Natural amplification Proper size Ludwig and Schleifer, 1994 FEMS Rev 15:155-173 The relevance of the alignment To perform cladistic analyses we should first align al sequences in order to recognize all homologous positions. Recognition by: Sequence similarities Base pairing due secondary structure (helixes for rRNA) Insertions & deletions Empirically (subjective) Minimize homoplasic influences There are many alignment programs, all look to common features that may indicate homologous sites: Clustal X MAFFT PileUp … The relevance of the alignment Most of the programs do not take into account secondary structure, just sequence motive similarities rRNA has a secondary structure with helixes that help in aligning sequences Functional gene or translated proteins cannot be improved by secondary structure analysis The relevance of the alignment www.arb-home.de www.arb-silva.de ARB does take into account features as helix pairing By increasing the numbers of sequences, the alignment improves The algorithms b c c a a b Like Maximum Parsimony but takes into account dendrograms alignment Distance transformation a => 0 a => 100 b => 40 0 b => 60 100 Jukes-Cantor c => 60 20 0 c => 40 80 100 Kimura a b c a b c De Soete Distance matrix Similarity matrix (pitfalls: does not take into account multiple mutations) Maximum Parsimony G C C A T => a G C A C T => b G C A C C => c a b 2 b – c => 1 mutation a c 3 b c b 1 3 2 2 5 5 3 a – b => 2 mutations a – c => 3 mutations c Maximum Likelihood (pitfalls: nature may not be parsimonious) a difficulties in mutation events (transitions vs. transversions) mutation position Slower transitions transversions Neighbor Joining: G C C A T => a G C A C T => b G C A C C => c T C A G Bootstrap Bootstrap indicates how stable is a branching order when a given dataset is submitted to multiple analysis Generally short internode branches will have low bootstrap values TERMINI 42,284 homologous positions PHYLOGENETIC FILTERS BACTERIA 1,532 homologous positions 30% 1,433 homologous positions 50% 1,288 homologous positions NJ_bac USE OF PHYLOGENETIC FILTERS Conservational filters are useful for deepbranching phylogenies complete sequences are useful for close relative organisms NJ_30% NJ_50% Size & information content complete sequences give complete information partial sequences lose phylogenetic signal short sequences lose resolution 1500 nuc 300 nuc 900 nuc One tree is no tree different algorithms different topologies try different datasets as well draw a consensus tree RaXML NJ PAR RECOMMENDATIONS FOR 16S rRNA TREE RECONSTRUCTION SEQUENCE almost complete is better than short partial sequences ALIGNMENT Better take into account secondary structures ALGORITHM Better maximum likelihood, but compare with other as neighbor joining and maximum parsimony DATASET Never just one dataset, try different sets of data (i.e. different number of sequences; different filters to find the best resolution) FINAL TREE Either you show all trees, or the best bootstrapped, or a multifurcation showing unresolved branching order. B E C A G H I F D A 95 50 25 B E C D F G H I 100 90 25 100 100 Tree with bootstrap Tree with multifurcation MLSA: phylogenetic reconstructions MULTIPLE SEQUENCE ALIGNMENTS sometimes have better resolution than the 16S rRNA gene 16S rRNA gene can have very low resolution Jiménez et al., 2013, System Appl Microbiol, 36: 383- 391