Rescuing hidden traces of evolution in the genomics era
The evolutionary history of organisms is encoded in their DNA. By comparing the DNA
sequences of today's organisms and tracing the changes that each organism has inherited from
common ancestors, phylogeneticists can reconstruct the evolutionary path of each
organism through time, depicted as an evolutionary tree.
The genomic era in which we live today has provided us with a deluge of DNA sequence data
thanks to the rapid development of high-throughput sequencing technologies. With billions of bytes of
new data generated every month, scientists set out to resolve the last remaining uncertainties in the
evolutionary tree of life.
However, it soon became clear that large amounts of data would not necessarily resolve every
evolutionary uncertainty. Because the evolutionary process is inherently complex, certain
patterns can be misinterpreted even by our best methods and even when massive amounts of
data are analyzed. In some cases, a large dataset turned out to be more misleading than a smaller
subset of it, simply because of limitations in our best methods for inferring past
evolutionary history.
One such case arises when the organisms being compared display very different rates
of DNA evolution. Although this rate variation has natural causes, it challenges our
current methods for inferring evolutionary history. The problem has two sources: in a fast-evolving
lineage, the same DNA position can change several times, overwriting earlier changes; and identical
characters that arose independently (convergent characters) can be misinterpreted as having been
inherited from a common ancestor. In genomic-era datasets this misinterpretation is pervasive, so despite
large amounts of new sequence data, the evolutionary history of some species in the tree of life
remains obscured.
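Why several changes at one position erase information can be illustrated with the Jukes-Cantor (JC69) substitution model, a standard textbook model used here purely for illustration; it is not necessarily the model used by the LS³ authors. Under JC69, the expected fraction p of differing sites between two sequences separated by d substitutions per site is p = 3/4 (1 - e^(-4d/3)), which levels off at 0.75 no matter how fast a lineage evolves. The minimal Python sketch below computes this saturation curve and its inverse; the function names are hypothetical.

```python
import math


def expected_p_distance(d: float) -> float:
    """Expected fraction of differing sites after d substitutions per site (JC69)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc69_distance(p: float) -> float:
    """Invert JC69: estimate substitutions per site from an observed fraction p."""
    if p >= 0.75:
        # Beyond 0.75 the sequences look random with respect to each other.
        raise ValueError("p >= 0.75: sequences are saturated, distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Compare the true number of changes with what a naive site-by-site count would see.
for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    p = expected_p_distance(d)
    print(f"true d={d:4.1f}  observed p={p:.3f}  JC69 estimate={jc69_distance(p):.3f}")
```

For small d the observed fraction is close to the true number of changes, but for large d it barely moves, so a fast-evolving lineage retains little usable signal and chance matches at saturated positions can be mistaken for shared ancestry.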
In the laboratory of Dr. Juan I. Montoya-Burgos, in the Department of Genetics and Evolution
and the Institute of Genetics and Genomics in Geneva (iGE3), researchers devised a method and
developed an algorithm to tackle this problem. The method, tailored to the large
sequence datasets of the genomic era, uses an objective criterion to measure how much the
evolutionary rates differ among species in each gene of a multi-gene dataset. With this
information, a subset of species evolving at a homogeneous rate can be identified for each gene,
and a large-scale dataset can be assembled from which the misleading data have been removed.
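The per-gene subsampling idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the LS³ implementation: the criterion LS³ actually uses is not described here, so the sketch substitutes a hypothetical one, the coefficient of variation (standard deviation divided by mean) of each species' root-to-tip distance in a gene tree, and repeatedly removes the fastest-evolving species until that value falls below a chosen threshold. The threshold, the minimum number of species, the function names, and the example distances are all made up for illustration.

```python
from statistics import mean, pstdev


def rate_cv(rates: dict[str, float]) -> float:
    """Hypothetical heterogeneity measure (not the published LS3 criterion):
    coefficient of variation of per-species root-to-tip distances in one gene."""
    values = list(rates.values())
    return pstdev(values) / mean(values)  # assumes all distances are positive


def homogeneous_subset(rates: dict[str, float], max_cv: float = 0.2,
                       min_species: int = 3) -> dict[str, float]:
    """Drop the fastest-evolving species of one gene until rates look homogeneous."""
    kept = dict(rates)
    # Stop at min_species even if rates are still uneven;
    # a real pipeline might discard such a gene instead.
    while len(kept) > min_species and rate_cv(kept) > max_cv:
        fastest = max(kept, key=kept.get)  # longest root-to-tip distance
        del kept[fastest]
    return kept


def subsampling_plan(genes: dict[str, dict[str, float]], **kwargs) -> dict[str, list[str]]:
    """For each gene, list the species retained for the combined multi-gene dataset."""
    return {gene: sorted(homogeneous_subset(rates, **kwargs))
            for gene, rates in genes.items()}


# Made-up root-to-tip distances (substitutions per site) for five taxa in two genes.
example_genes = {
    "geneA": {"A": 0.10, "B": 0.12, "C": 0.11, "D": 0.13, "E": 0.60},
    "geneB": {"A": 0.20, "B": 0.22, "C": 0.21, "D": 0.19, "E": 0.23},
}

plan = subsampling_plan(example_genes)
for gene, species in plan.items():
    print(gene, species)
```

In this made-up example, taxon E evolves much faster than the others in geneA and is dropped from that gene only, while it is kept in geneB, where all rates are similar. Each gene thus contributes its own rate-homogeneous subset of species to the combined dataset, rather than a species being removed from every gene.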
The new algorithm, named Locus Specific Species Subsampling (LS³), was validated on
simulated DNA sequence data, a setting in which the true tree is known and the success of
inferring the correct evolutionary path can therefore be measured. To demonstrate the usefulness of
LS³ on biological data, it was also applied to well-known DNA and protein sequence datasets in which
heterogeneous evolutionary rates among species had misled the inference, resulting in incorrect
evolutionary trees. In all cases, the LS³ algorithm identified the problematic sequence
data, and removing these misleading sequences led to the recovery
of the correct multi-gene evolutionary tree.
Developing such algorithms is a crucial step towards fully understanding evolutionary
history amid the deluge of genomic data, because they filter the useful information from the noise.
The LS³ algorithm makes it possible to exploit the information contained in large
sequence datasets by acknowledging the limitations of our methods and working around them.
Carlos Rivera-Rivera and Juan Montoya-Burgos