Current Approaches to Whole Genome Phylogenetic Analysis
Hongli Li

Content
Background
Genome Evolution
Phylogenetic Analysis
Performing Statistical Tests
Phylogenetic Networks
Conclusion

Phylogenetic Analysis Background
Early attempts were based on morphological characters. Comparing genes directly makes more sense.
Modern attempts use sequences from individual homologous genes. However, a gene's evolutionary history might not be the same as the evolutionary history of its organisms, and genes that are sufficiently conserved across all species of interest might not be identified.

Genome Evolution
Prokaryotes: relatively simple genomes, but prokaryote evolutionary history cannot properly be represented by a tree.
Eukaryotes: more complicated, with frequent inversions of small segments, gene duplication and loss, and polyploidy events.
Organellar genomes: the mitochondrial genome is smaller and simpler; plant species also have a chloroplast genome.

Genome Evolution (cont.)
Model of genome evolution: the Nadeau–Taylor model.
Figure: example rearrangements of the gene order 1,2,3,4,5,6,7,8,9,10 by inversion, chromosome fission and deletion (negative numbers mark genes in reversed orientation). Resulting gene orders include 1,2,3,4,-8,-7,-6,-5,9,10; 1,2,-6,-5,-4,-3,7,8,9,10; 1,2,-6,4,5,-3,7,8,9,10; and 1,2,-6,-5,-4,8,9,10.

Phylogenetic Analysis – Binary Character Encoding
Encodings for the presence or absence of particular genes or protein families are obvious, whereas encodings of gene order are not; many different approaches exist.
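The presence/absence encoding, and one common way of turning gene order into binary characters (one character per gene adjacency), can be sketched in Python. This is a minimal sketch, not the presenter's method: it assumes each genome is a single linear chromosome written as signed gene ids with no duplicated genes, it ignores chromosome ends, and the function names are hypothetical. Adjacency encoding is only one of the many possible gene-order encodings.

```python
from typing import Dict, List, Set, Tuple

Genome = List[int]           # signed gene order: a negative id marks a gene in reversed orientation
Adjacency = Tuple[int, int]


def canonical(x: int, y: int) -> Adjacency:
    """The adjacency 'x then y' read on the other strand is '-y then -x'; keep one representative."""
    return max((x, y), (-y, -x))


def adjacencies(genome: Genome) -> Set[Adjacency]:
    """All canonical adjacencies of a single linear chromosome (chromosome ends are ignored)."""
    return {canonical(a, b) for a, b in zip(genome, genome[1:])}


def content_characters(genomes: Dict[str, Genome]) -> Tuple[List[int], Dict[str, List[int]]]:
    """One presence/absence character per gene observed in any genome."""
    genes = sorted({abs(g) for genome in genomes.values() for g in genome})
    matrix = {}
    for name, genome in genomes.items():
        present = {abs(g) for g in genome}
        matrix[name] = [1 if gene in present else 0 for gene in genes]
    return genes, matrix


def adjacency_characters(genomes: Dict[str, Genome]) -> Tuple[List[Adjacency], Dict[str, List[int]]]:
    """One binary character per adjacency observed in any genome (1 = present, 0 = absent)."""
    per_genome = {name: adjacencies(g) for name, g in genomes.items()}
    characters = sorted(set().union(*per_genome.values()))
    matrix = {name: [1 if adj in adj_set else 0 for adj in characters]
              for name, adj_set in per_genome.items()}
    return characters, matrix


if __name__ == "__main__":
    genomes = {"A": [1, 2, 3, 4, 5], "B": [1, 2, -4, -3, 5], "C": [1, 2, 3, 5]}
    print(content_characters(genomes)[1]["C"])  # [1, 1, 1, 0, 1]
    characters, matrix = adjacency_characters(genomes)
    print(characters)  # [(-3, 5), (1, 2), (2, 3), (3, 4), (3, 5), (4, -2), (4, 5)]
    print(matrix["B"])  # [1, 1, 0, 1, 0, 1, 0]
```

In this example, genome B differs from A by a single inversion of genes 3 and 4: it loses the adjacencies (2,3) and (4,5), gains (4,-2) and (-3,5), and keeps (3,4), because -4,-3 is the same adjacency read on the other strand. Genome C lacks gene 4, which shows up in both the content and the adjacency characters.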
Natural restrictions: a gene cannot be adjacent to more than two others, and an evolutionary event will create two adjacencies and break two.

Phylogenetic Analysis – Distance Methods
Distance methods use the smallest number of evolutionary events between two genomes.
Breakpoint distance counts the adjacencies present in one genome but absent from the other.
Defining the distance between two genomes with unequal gene content is a problem.
Several software packages are available for distance analysis.

Phylogenetic Analysis – Maximum Parsimony
Finding the minimum (most parsimonious) tree is NP-hard. Several attempts have been made:
Find the "breakpoint phylogeny" (the maximum parsimony tree under breakpoint distance): easier to find, but still NP-hard.
Try to find the true maximum parsimony tree with improved algorithms and more computing power.
Parsimony methods have more advantages than distance methods, but it is difficult to measure the accuracy of their solutions.

Phylogenetic Analysis – Other Methods
Maximum likelihood: computationally prohibitive.
Method of invariants: relies on having good estimates for the invariant functions, which requires a large dataset.
Bayesian analysis: the probability distributions involved can become extremely complicated.

Performing Statistical Tests
Performing statistical tests for phylogenetic features is not straightforward in any situation.
Resampling methods should preserve gene order and should be used with caution, since they might introduce new errors.

Phylogenetic Networks
When dealing with whole genomes, and prokaryotic genomes in particular, we need phylogenetic networks, such as split graphs and reticulograms.
Such networks can express uncertainty in a tree or a lack of faith in the tree model of evolution, but they are not suitable for representing phenomena such as horizontal transfer or allopolyploid events.

Conclusion
Comparisons of gene content are becoming commonplace, but comparisons of gene order present a wider range of problems.
It is important to focus on the data we already have or will have.
Methods for whole genome phylogenetic analysis need to be robust against missing or inaccurate information.
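The rearrangement events in the Nadeau–Taylor figure and the breakpoint distance from the Distance Methods slide can be tied together in a short Python sketch. It is a minimal illustration under stated assumptions, not the presenter's method: a single linear chromosome, signed gene ids, no duplicated genes, chromosome ends ignored when counting breakpoints, and a simplified random-event model in which inversions, transpositions and inverted transpositions occur at uniformly random positions (roughly in the spirit of the generalized Nadeau–Taylor model). All function names and the probabilities `p_inv` and `p_trans` are hypothetical.

```python
import random
from typing import List, Optional, Set, Tuple

Genome = List[int]  # signed gene order; a negative id marks a gene in reversed orientation


def invert(genome: Genome, i: int, j: int) -> Genome:
    """Inversion: reverse genome[i..j] (0-based, inclusive) and flip every sign in it."""
    return genome[:i] + [-g for g in reversed(genome[i:j + 1])] + genome[j + 1:]


def delete(genome: Genome, i: int, j: int) -> Genome:
    """Deletion: remove genome[i..j] (0-based, inclusive)."""
    return genome[:i] + genome[j + 1:]


def transpose(genome: Genome, i: int, j: int, p: int, inverted: bool = False) -> Genome:
    """(Inverted) transposition: cut genome[i..j] and reinsert it at index p of the remaining genes."""
    segment = genome[i:j + 1]
    if inverted:
        segment = [-g for g in reversed(segment)]
    rest = genome[:i] + genome[j + 1:]
    return rest[:p] + segment + rest[p:]


def simulate(genome: Genome, n_events: int, p_inv: float = 0.6, p_trans: float = 0.2,
             seed: Optional[int] = None) -> Genome:
    """Apply n_events random rearrangements to a non-empty genome.

    Positions are drawn uniformly at random; p_inv and p_trans are hypothetical defaults,
    and the remaining probability mass goes to inverted transpositions.
    """
    rng = random.Random(seed)
    g = list(genome)
    for _ in range(n_events):
        i = rng.randrange(len(g))
        j = rng.randrange(i, len(g))
        r = rng.random()
        if r < p_inv:
            g = invert(g, i, j)
        else:
            p = rng.randrange(len(g) - (j - i + 1) + 1)
            g = transpose(g, i, j, p, inverted=r >= p_inv + p_trans)
    return g


def adjacencies(genome: Genome) -> Set[Tuple[int, int]]:
    # "x then y" equals "-y then -x" read on the other strand, so store one canonical form.
    return {max((a, b), (-b, -a)) for a, b in zip(genome, genome[1:])}


def breakpoint_distance(a: Genome, b: Genome) -> int:
    """Number of adjacencies of a that are missing from b (chromosome ends ignored)."""
    if sorted(abs(g) for g in a) != sorted(abs(g) for g in b):
        raise ValueError("this sketch requires equal gene content")
    return len(adjacencies(a) - adjacencies(b))


if __name__ == "__main__":
    identity = list(range(1, 11))
    inverted = invert(identity, 4, 7)
    print(inverted)  # [1, 2, 3, 4, -8, -7, -6, -5, 9, 10]
    print(breakpoint_distance(identity, inverted))  # 2
    print(delete(invert(identity, 2, 5), 5, 6))  # [1, 2, -6, -5, -4, 8, 9, 10]
    evolved = simulate(identity, 5, seed=1)
    print(evolved, breakpoint_distance(identity, evolved))
```

Running it reproduces gene orders from the figure and reports a breakpoint distance of 2 for a single interior inversion, which breaks two adjacencies. The `ValueError` for genomes with different gene content mirrors the slide's point that distances between genomes with unequal content are a problem.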