Rescuing hidden traces of evolution in the genomics era

The evolutionary history of organisms is encrypted in their DNA. By comparing the DNA sequences of today's organisms and tracing the changes that each organism has inherited from common ancestors, phylogeneticists are able to reconstruct the evolutionary path of each organism throughout history, depicted as an evolutionary tree. The genomic era in which we live today has provided us with a deluge of DNA sequence data thanks to the rapid development of massive sequencing technologies. With billions of bytes of new data every month, scientists set out to solve the last remaining uncertainties in the evolutionary tree of life.

However, it soon emerged that large amounts of data would not necessarily solve all evolutionary uncertainties. Due to the inherent complexity of the evolutionary process, certain patterns can be misinterpreted even when using our best methods, and even when analyzing massive amounts of data. In some cases, a large dataset was found to be more misleading than a smaller subset, simply because of limitations in our best methods of inferring past evolutionary history.

One such case arises when the organisms being compared display highly different rates of DNA evolution. Although this pattern arises naturally, it poses a challenge to our current methods for inferring evolutionary history. The reason lies in multiple nucleotide changes per DNA position and in the misinterpretation of convergent characters as being inherited from a common ancestor. In genomic-era datasets, this misinterpretation is pervasive and thus, despite large amounts of new sequence data, the evolutionary history of some species in the tree of life remains obscured. In the laboratory of Dr. Juan I. Montoya-Burgos of the Department of Genetics and Evolution and Institute of Genetics and Genomics in Geneva (iGE3), researchers invented a method and developed an algorithm to tackle this problem.
The method, especially tailored for the large sequence datasets of the genomic era, uses an objective criterion to measure how different the evolutionary rates among the species are in each gene of a multi-gene dataset. With this information, a subset of species evolving at a homogeneous rate can be identified for each gene, and a large-scale dataset can be built from which misleading data have been removed. The new algorithm, named Locus Specific Species Subsampling (LS³), was validated on simulated DNA sequence data, a context in which the successful inference of the correct evolutionary path can be measured.

To demonstrate the usefulness of the new LS³ method on biological data, it was also applied to well-known DNA and protein sequence datasets in which heterogeneous evolutionary rates among species had misled the inference, resulting in incorrect evolutionary trees. In all cases, the LS³ algorithm succeeded in identifying problematic sequence data, and removing the sequences containing misleading information resulted in the recovery of the correct multi-gene evolutionary tree.

Developing such algorithms is a crucial step towards the full understanding of evolutionary history in the midst of the genomic data deluge, filtering the useful information from the noise. The LS³ algorithm makes it possible to explore the information contained in large sequence datasets by acknowledging the limitations of our methods and working around them.

Carlos Rivera-Rivera and Juan Montoya-Burgos
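The per-gene subsampling idea described above can be illustrated with a small Python sketch. It is not the LS³ implementation. It assumes that a rate estimate per species and per gene is already available (the numbers below are invented for illustration), it uses the coefficient of variation as a stand-in for the objective criterion mentioned in the text, and it assumes that the fastest-evolving species are the ones to drop first. The function names, the `max_cv` threshold and the `min_species` limit are all hypothetical.

```python
from statistics import mean, pstdev


def rate_heterogeneity(rates):
    """Coefficient of variation of the per-species rates for one gene.

    Stand-in criterion for this sketch only; the article does not
    specify the criterion that LS3 actually uses.
    """
    values = list(rates.values())
    avg = mean(values)
    return pstdev(values) / avg if avg > 0 else 0.0


def subsample_species(rates, max_cv=0.25, min_species=4):
    """Drop the fastest species until the rates look homogeneous.

    rates: dict mapping species name -> rate estimate for one gene.
    Returns (kept species, removed species).
    """
    kept = dict(rates)
    removed = []
    while len(kept) > min_species and rate_heterogeneity(kept) > max_cv:
        fastest = max(kept, key=kept.get)  # assumption: remove fastest first
        removed.append(fastest)
        del kept[fastest]
    return sorted(kept), removed


def subsample_dataset(dataset, max_cv=0.25, min_species=4):
    """Apply the per-gene subsampling to every gene of a multi-gene dataset."""
    return {
        gene: subsample_species(rates, max_cv, min_species)
        for gene, rates in dataset.items()
    }


# Invented example: species "sp5" evolves much faster than the rest in geneA only.
example = {
    "geneA": {"sp1": 0.10, "sp2": 0.12, "sp3": 0.11, "sp4": 0.09, "sp5": 0.60},
    "geneB": {"sp1": 0.20, "sp2": 0.22, "sp3": 0.21, "sp4": 0.19, "sp5": 0.20},
}

for gene, (kept, removed) in subsample_dataset(example).items():
    print(gene, "kept:", kept, "removed:", removed)
```

In this toy input, the fast-evolving species is removed from geneA only and stays in geneB, which is the locus-specific aspect of the approach: a species is excluded gene by gene rather than from the whole dataset.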