Molecular Signatures of the Past
Anurag Sethi, Elijah Roberts, Jonathan Montoya, Evan Rosenfeld, Carl R. Woese, Zaida Luthey-Schulten
February 12, 2007
Abstract
Evolutionary analysis of the translational apparatus indicates that it was highly developed at the emergence of
the modern cell and that, before the bacterial Darwinian transition, its evolution was dominated by
extensive horizontal gene transfer. The root of the universal phylogenetic tree represents an evolutionary phase
transition that led to the dominance of vertical gene transfer over horizontal gene transfer as the main mode of
evolution of the translational apparatus. Despite sharing a common ancestry, the ribosomal RNA (rRNA) and
ribosomal proteins (r-proteins) of the three domains of life are distinguished by significant signatures that are
remnants of this transition. In this study, we provide evidence that the signatures in the rRNA and the r-proteins,
which are often in close proximity, have coevolved with each other, albeit at different rates. An example is the
insertion in helix H16 on the bacterial 16S rRNA that interacts and coevolves with a bacterial insertion in the
r-protein S4. The rRNA signature region in helix H16 is more structurally conserved than its r-protein counterpart,
which may be an indication that the increased conformational flexibility available to proteins allows their evolution
to fine-tune the ribosome for different environments. Our analysis indicates that the archaeal/eukaryal ribosomal
protein L18e was likely present in the universal communal state and has coevolved with the archaeal/eukaryal
signature H34a on the 23S rRNA, providing evidence that domain specific proteins should not be considered late
inventions. In addition, the structure and sequence-based phylogenies of the 23S rRNA give the typical canonical
groupings seen in 16S rRNA phylogeny, even when the structural signature regions are excluded, showing that the
development of the ribosomal signatures was not just a peripheral event, but induced changes in the structural core.
From these studies we propose that most of the signatures developed after the Darwinian transition that occurred
as the cell lineages began to coalesce, and are remnants of it.
1 Introduction
A now huge and exponentially increasing database regarding the molecular makeup of cells has accumulated over the last several
decades. Biologists today routinely ask questions of these data that are far more detailed than was previously possible. What is not
generally appreciated, however, is that large data sets of this type tend to bring into question the very conceptual framework within
which the questions themselves are posed. An especially informative example is our understanding of the cellular translation
mechanism. In the past the mechanism was conceptualized and probed in a reductionist, “particle” framework; but understanding
today comes increasingly from questions concerning energy states, modal analyses, resonance considerations, and the like. The
questions and answers bespeak a highly integrated mechanism, whose essence does not lie in a particle-based perspective. The older
questions, motivated by that perspective, tend now to yield at best superficial explanations. The real understanding of translation would
seem to lie in discovering the mechanism’s delocalized, collective properties.
This perceptual change obviously applies not only to the translational process, but embraces all biological organization, all things
biological. Ultimate explanations in biology will come largely in terms of processes, not material substance. A process perspective
unavoidably leads back to the dynamics of evolution, the process that in the end gives rise to all the subordinate biological processes
that constitute what we take to be biology today. The process of evolution is a fortiori non-uniform. While this sporadic nature of
evolution can be glimpsed throughout the fabric of the cell, perhaps its clearest markings can be seen in the sequence and structural
signatures of the translation apparatus, i.e., the ribosome and the amino acid charging enzymes.
Evidence today strongly suggests that a highly developed translation system was a necessary condition for the emergence of cells
as we know them [Woese, 2002]. In a universal phylogenetic tree (UPT) format this maturation of the translation system seems to be
represented by the tree’s basal branchings, where first the bacterial and then the archaeal and eukaryotic lineages appear individually to
emerge. What lies beneath this “root” locus, the evolution leading up to it, cannot be captured in familiar tree representation. It would
seem to be some distributed universal ancestral state from which the (three) primary organismal lineages emerge via one or a brief
series of major evolutionary saltations (called “Darwinian transition(s)”), in which the state of the evolving cellular organization
undergoes dramatic change and with it the accompanying evolutionary dynamic. The aboriginal evolutionary dynamic may have been
“Lamarckian” in that it appears likely to have involved massive, pervasive horizontal transfer/acquisition of genes (information,
memory, what-have-you), which goes by the acronym HGT. The kind and frequency of the HGT envisioned
would make evolution early on effectively communal. This communal evolutionary dynamic comes to an end, transforming relatively
suddenly and largely into the familiar genealogical dynamic, when various organismal elements in the community reach critical stages
wherein their organizations drastically change (undergo Darwinian transitions) [Woese, 2002, Vetsigian et al., 2006]. Certain
signatures in the ribosome, i.e., idiosyncrasies in its RNA (rRNA) and/or proteins (r-proteins) characteristic of the individual domains
of life [Woese, 1987, Gutell and Woese, 1990, Winker and Woese, 1991, Cannone et al., 2002], seem to be telling reflections of such
phase transitions.
There appear to be two general kinds of these signatures: the small scale, in which the overall structure of a given locale does not
appreciably change, but particular (constant and characteristic) sequence variations occur within the given structural element; and the
large scale, in which one can (by definition) see significant secondary and tertiary structure variations that are characteristic of a
particular domain of life.
The cellular translation machinery is the quintessential example of biological process frozen in time. As such it provides a powerful
system that can be teased, through comparative analysis, into revealing the processes that give rise to the evolution of the cell.
Herein we begin a multi-dimensional comparative dissection of this most interesting of biology’s “frozen processes”.
Because the evolutionary relationships between the 16S rRNAs of the small ribosomal subunit (SSU) of different organisms were
initially used to infer the UPT [Woese et al., 1990], the 16S rRNA has become the traditional molecular standard in classifying life.
This seems to be the main reason why the SSU, which houses the 16S rRNA, has come in for so much more
study than its counterpart, the large subunit (LSU), containing the larger 23S rRNA [Ludwig and Schleifer, 1994, Trust et al., 1994].
With the publication of the atomic structures of the LSU from an archaeon and several bacterial organisms [Klein et al., 2004,
Schuwirth et al., 2005, Korostelev et al., 2006, Selmer et al., 2006], it is now possible to perform comparative studies of both the
sequences and structures of the 23S rRNA, leading to the identification and characterization of signatures on the 23S rRNA as well.
1.1 Coevolution of r-proteins and rRNA
In vitro kinetic studies have shown that the assembly and stability of the SSU and LSU are facilitated by the binding of r-proteins to
the rRNA [Talkington et al., 2005, Rohl and Nierhaus, 1982]. Given physical interactions between the proteins and RNA, there should
be signs that changes in the one may be compensated by changes in the other. Many of the signature regions of both the SSU and
LSU rRNA are associated with distinct r-proteins, distributed either universally or in only a specific domain of life. The
covariation analyses presented below attempt to identify correlating signatures for the organismal domains that appear in both the
rRNA and the corresponding proteins.
Comparative analysis of the available sequence and structural data allows us to infer what features existed in the gene pool at the
time the various Darwinian transitions (to the vertical, or genealogical, mode of evolution) occurred in the three primary organismal
lineages. The universally distributed r-proteins exhibit what is called the full canonical pattern [Woese et al., 2000], wherein the
various taxa group into three distinct clusters (bacteria, archaea, eukaryotes), with the latter two showing the most structure and
sequence similarity. While the canonical pattern provides evidence that the universal r-proteins were present at the so-called base of
the UPT, the situation is less clear with regard to the proteins specific for the individual domains (ds-proteins). Due to the lack of a
systematic phylogenetic analysis, the question remains as to whether these proteins are relatively recent inventions, as their
localization in a particular domain might suggest, or whether they were already in existence at the time of the universal communal state.
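The canonical pattern described above can be stated as a simple test on a rooted tree. The sketch below is only an illustration under stated assumptions: the tree is given as nested tuples with the root already placed, the taxon labels (bac1, arc1, euk1, ...) are hypothetical placeholders rather than organisms from this study, and the function names are introduced here for illustration. It checks that each domain forms its own clade and that the archaeal and eukaryal clades are sisters.

```python
def leaves(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    found = set()
    for child in tree:
        found |= leaves(child)
    return found


def clades(tree):
    """Yield the leaf set of every subtree, including the whole tree."""
    yield frozenset(leaves(tree))
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree, taxa):
    """True if the given taxa are exactly the leaves of some subtree."""
    return frozenset(taxa) in set(clades(tree))


def shows_canonical_pattern(tree, domain_of):
    """Check three monophyletic domains with archaea and eukarya as sisters.

    domain_of maps each (hypothetical) taxon label to 'bacteria',
    'archaea', or 'eukarya'. The tree is assumed to be rooted.
    """
    groups = {}
    for taxon in leaves(tree):
        groups.setdefault(domain_of[taxon], set()).add(taxon)
    if set(groups) != {"bacteria", "archaea", "eukarya"}:
        return False
    each_domain_is_clade = all(
        is_monophyletic(tree, members) for members in groups.values()
    )
    archaea_eukarya_sisters = is_monophyletic(
        tree, groups["archaea"] | groups["eukarya"]
    )
    return each_domain_is_clade and archaea_eukarya_sisters


# Hypothetical labels, for illustration only.
domain_of = {
    "bac1": "bacteria", "bac2": "bacteria",
    "arc1": "archaea", "arc2": "archaea",
    "euk1": "eukarya", "euk2": "eukarya",
}
canonical_tree = (("bac1", "bac2"), (("arc1", "arc2"), ("euk1", "euk2")))
print(shows_canonical_pattern(canonical_tree, domain_of))
```

A tree in which a bacterial and an archaeal taxon group together fails the check, which is the kind of departure from the canonical pattern that would be expected after a horizontal transfer between domains.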
Using a variety of techniques, this work investigates one particular collective property of the ribosome: the evolution of the
molecular signatures. With our recently developed evolutionary analysis tools for comparing sequences and structures of
both proteins and nucleic acids [Roberts et al., 2006], we look at coevolution of a particular universal r-protein with its corresponding
locale in the 16S rRNA, compare the sequence and structure based phylogenetic trees for bacterial and archaeal 23S rRNA, and
reconstruct the evolutionary history of a domain specific r-protein. While the appearance of signatures of the organismal domains is to
be expected in both the rRNAs and the r-proteins, it is not clear whether signatures of the bacterial Darwinian transition also exist
elsewhere. Using the available genomic context data and analysis tools, the operonal organization is examined for further signs of
domain specific proteins. This work and its future extension will have far reaching implications concerning the evolution of the cell in
general.
2 Results
2.1 Coevolution of 16S rRNA and universal r-proteins
The large amount of sequence and structure data available for the ribosome can be used to create evolutionary profiles [Sethi et al.,
2005] of both the rRNA and the r-proteins. It is also possible to unambiguously identify the binding sites of the r-proteins on the rRNA
using the high resolution crystal structures of the ribosome. The combination of evolutionary profiles and binding contacts makes
possible a comprehensive analysis of the coevolution between the rRNA and the r-proteins. Some residues at the interface of the
rRNA and the r-proteins are evolutionarily conserved. Others vary, however, and covariation between certain of them can reveal their
functional inter-relationships. In this study, a mutual information analysis (see Methods) was performed to find possible sets of
coevolving residues. For example, this analysis provides evidence that one of the strongest molecular signatures of the 16S rRNA
distinguishing archaea and bacteria, helix H16, coevolves with the r-protein, S4, that interacts with it.
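As an illustration of the idea behind a mutual information analysis, the sketch below computes the standard Shannon mutual information between one rRNA alignment column and one r-protein alignment column, where position k in both columns comes from the same organism. This is a generic textbook formulation, not necessarily the normalization or weighting described in the Methods, and the example columns are invented for illustration.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Shannon mutual information (bits) between two alignment columns.

    col_a and col_b list the residue seen in each organism, in the same
    organism order. Perfectly conserved columns give 0 bits.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must cover the same organisms")
    n = len(col_a)
    if n == 0:
        raise ValueError("columns must not be empty")
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi


# Invented columns: the nucleotide and the amino acid change together.
rna_column = "GGGAAA"
protein_column = "KKKEEE"
print(mutual_information(rna_column, protein_column))
```

Pairs of positions with high mutual information are candidates for coevolution. A conserved position paired with a variable one yields zero, which is why conserved interface residues do not appear in such an analysis.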
Ribosomal protein S4 is a large (~200 residue) two domain universal r-protein that is a primary binder in the 30S SSU assembly
map [Mizushima and Nomura, 1970, Held et al., 1974] and one of the assembly-initiator proteins that seed the formation of active 30S
subunits in vitro [Nowotny and Nierhaus, 1988]. S4 binds to helices H3, H4, H16, H17, H18, and H20 of the 16S rRNA and interacts
with the 530 pseudoknot (Figures ?? and S1). Mutations in S4 are known to reduce the accuracy of translation [Carter et al., 2000].
Our mutual information analysis of S4 and the 16S rRNA identified the N-terminal domain of the protein as coevolving with helix
H16 of the 16S rRNA. The N-terminal domain of bacterial S4 contains an insertion of about 10-12 residues relative to the archaeal S4
protein, and the corresponding bacterial H16 in the rRNA contains a recognized bacterial signature consisting of a 5 base pair
extension to the helix and a bulge loop. The secondary structure of this region is conserved throughout the bacteria. Only the
bacteria-specific insertion in S4 makes contact with the extended rRNA helix, presumably stabilizing it (Figure ??). The presence of
both rRNA and protein insertions in all bacterial groups and their absence in all archaeal lineages indicate that the protein and rRNA
coevolved in the bacterial lineage subsequent to its Darwinian transition.
Although the bacterial insertion is too small to reliably determine a detailed phylogenetic history, useful information can be
discerned from simply looking at the grouping of the insertion’s sequences. Figure ?? shows the sequence and an approximate
phylogeny of the insertion for the main classes of bacteria. The insertion clearly divides into two main types: a zinc binding motif (type
I) containing four cysteines (this region can be identified as zinc binding from the Thermus thermophilus structures) and a non-zinc
binding version (type II). These two types of insertions can be further subdivided into a number of subtypes. An important feature of
this tree is that the subtypes correspond closely with bacterial classes even though the relationships between these subtypes do not
correspond to the canonical bacterial phylogeny, while the C-terminal region of S4 shows the bacterial canonical pattern. This implies
that the protein insertion was still in flux and subject to intense innovation sharing after the bacterial Darwinian transition.
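The split of the insertion into two types can be illustrated with a very small classifier. The sketch below assumes, as a simplification, that an insertion is called type I whenever it contains at least four cysteines and type II otherwise. The text identifies the zinc binding region from the Thermus thermophilus structures, so a cysteine count is only a proxy, and the sequences shown are made-up strings, not real S4 insertions.

```python
def classify_s4_insertion(insertion_seq, min_cysteines=4):
    """Label an insertion as 'I' (putative zinc binding) or 'II'.

    Assumption for illustration: four or more cysteines marks type I.
    Spacing of the cysteines is not checked.
    """
    cysteines = insertion_seq.upper().count("C")
    return "I" if cysteines >= min_cysteines else "II"


# Made-up example sequences, for illustration only.
examples = {
    "hypothetical_A": "RCPKCGAELCPRCG",
    "hypothetical_B": "RLPKSGAELAPRSG",
}
for name, seq in examples.items():
    print(name, classify_s4_insertion(seq))
```

Grouping the classified sequences by bacterial class would then show whether subtypes follow the classes, as described above.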
2.2 Evolution of Signatures in the 23S rRNA
While phylogenetic analyses of 16S rRNA sequences have been widely used as a molecular taxonomic measure, evolutionary studies of
the 23S rRNA have been confined to a small number of taxa. Yet the 23S rRNA should prove at least as informative of the
evolution of the ribosome and translation in general as is its smaller counterpart.
Structures of the 23S rRNA from four organisms were aligned and a measure of structural similarity, QH, was used to construct a
structure-based phylogenetic tree. QH was originally developed to investigate the congruence of structure and sequence-based
phylogenies of the aminoacyl-tRNA synthetases (AARSs) [O’Donoghue and Luthey-Schulten, 2003, O’Donoghue and Luthey-Schulten, 2005]. It has been adapted
here for use with nucleic acids. It includes a term for the core aligned regions and a gap penalty that takes into account the perturbation
of the core due to insertions. The phylogenetic tree of the structures in the commonly resolved regions (see Methods), shown in Figure
??, shows a deep divergence between the archaeal 23S rRNA structure of Haloarcula marismortui and the bacterial structures of
Escherichia coli, Thermus thermophilus, and Deinococcus radiodurans, as is seen between the two domains of life in both the 16S
rRNA tree and the sequence-based phylogeny of the 23S rRNA. In addition, the structure-based phylogenetic tree confirms the specific
relationship between T. thermophilus and D. radiodurans. In other words, the evolutionary relationship captured in the structures of
23S rRNA is congruent with the corresponding sequence-based phylogeny.
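To give a feel for what a Q-type structural similarity measures, the sketch below computes a simplified score over aligned positions only: for every pair of aligned positions it compares the intramolecular distance in one structure with the corresponding distance in the other, and averages a Gaussian of the difference. This is not the QH used in this study. The gap penalty term is omitted entirely, and the minimum sequence separation and the exponent controlling the Gaussian width are assumptions chosen for illustration. The coordinates are toy values.

```python
import math


def aligned_q(coords_a, coords_b, min_sep=2, exponent=0.15):
    """Simplified Q-like similarity in [0, 1] for two aligned structures.

    coords_a[k] and coords_b[k] are 3D coordinates of the k-th aligned
    position. Only the aligned-core term is sketched; insertions are
    ignored. min_sep and exponent are illustrative assumptions.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same aligned length")
    n = len(coords_a)
    total = 0.0
    pairs = 0
    for i in range(n):
        for j in range(i + min_sep, n):
            dist_a = math.dist(coords_a[i], coords_a[j])
            dist_b = math.dist(coords_b[i], coords_b[j])
            # Tolerance grows slowly with separation along the chain.
            variance = (j - i) ** exponent
            total += math.exp(-((dist_a - dist_b) ** 2) / (2.0 * variance))
            pairs += 1
    if pairs == 0:
        raise ValueError("not enough aligned positions")
    return total / pairs


# Toy coordinates, for illustration only.
structure_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
structure_2 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
similarity = aligned_q(structure_1, structure_2)
print("structural distance:", 1.0 - similarity)
```

Because the score depends only on internal distances, it does not change under rigid translation or rotation of either structure. A matrix of 1 - Q values between all pairs of structures could then be passed to any distance-based tree-building method.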
Noise can enter the structural phylogenetic analysis through a combination of effects related to crystallization, such as different
conformational states, differences in crystallization conditions, and/or lower resolution structures. As a result, sequence-based
phylogenetic methods yield a more accurate phylogenetic branching than structure-based methods for the higher order groupings
between organisms in the same genus or same phylum. However, structure-based alignments can be used to improve the
sequence-based alignments of distant organisms in variable regions or signature regions, as the sequence homology in some of these regions is
very low while the structure remains highly conserved.
A striking feature of this study is that the vast majority of the 23S rRNA structure remains conserved across the bacterial and
archaeal domains. While the largest contribution to the structural divergence between the archaeal and bacterial 23S rRNAs is from
insertions or deletions that are archaeal/bacterial specific, the evolutionary analysis of the four structures in the absence of the
corresponding structural signatures indicates that the same groupings seen in Figure ?? remain. The structural signatures
corresponding to changes in sequence and secondary structure of the 23S rRNA have been identified and marked on the 23S rRNA
secondary structure (Figures S3 to S6). While the core of 16S rRNA and 23S rRNA has been
identified using comparative sequence analysis [Mears et al., 2002], the 3-dimensional mapping of the core of the ribosome will only
be complete once well-resolved structures of representative mitochondrial and cytoplasmic LSUs of eukaryotes are available. While
some of these signatures are inserted at the periphery of the LSU structure, their conservation in their respective domains of life
indicates that they may play a role in the stability and, possibly, the allostery involved in translation. There are additional signatures in
the 23S rRNA close to the 5S rRNA and the E-site tRNA binding regions indicating that these changes might have an effect on
translation across the domains of life. An analysis of the signatures indicates that most of them occur close to regions interacting with
ds-proteins.
2.3 Domain Specific r-Proteins of the LSU
While approximately half of the r-proteins are known to be universally distributed (universal r-proteins), the other half are specific to
one or two domains of life [Lecompte et al., 2002]. In bacteria, the LSU and SSU contain 14 and eight ds-proteins respectively. In the
archaea, the comparable numbers are 21 (LSU) and 13 (SSU) [Lecompte et al., 2002].
The structural overlap of the LSUs from T. thermophilus and H. marismortui based on the alignment of their 23S rRNA (PDB IDs
2J01 and 1S72, respectively; see Figure ??) allows us to equate ds-proteins [Klein et al., 2004, Mushegian, 2005] (occupying similar regions of the
rRNA) that have little or no sequence homology. We refer to such proteins as spatial analogues (Table S1). In spite of their lack of
sequence or structural homology, one can often see the two forming similar contacts with the RNA, suggestive of a similar functional
role. The existence of such spatial analogues seems to imply that specific protein-RNA contacts and the electrostatic environment are
more important in defining function than are the globular features of the r-protein.
Not all ds-proteins have spatial analogues, and we classify most of the remaining ds-proteins (Table S1) into one of three
categories. Examples of these categories are given in the supplementary information. A question remains as to how many of these
ds-proteins were present in the universal communal state.
The preliminary phylogenetic analysis of the archaeal/eukaryal specific r-proteins shows the deep divergence between archaea and
eukarya [data not shown], and similar results have been observed by others for certain ds-proteins [Yang et al., 1999]. The recurrence
of signatures of the canonical pattern alone is not sufficient evidence to claim that a ds-protein was present in the universal communal
state. In addition, one must find a remote homologue in order to identify a gene duplication event, as shown below for L18e, which
interacts with an rRNA signature.
2.4 Evolution of domain specific r-proteins
Protein L18e is archaeal/eukaryal specific, interacts with the helix H34a in H. marismortui [Klein et al., 2004], and is an
archaeal/eukaryal structural signature. It has been assumed that the localization of ds-proteins in a particular domain of life implies that
they are relatively recent innovations. This need not be so. In order to judge whether L18e is a recent innovation or was instead
present in the universal ancestral state, we used an approach similar to that used to document the evolutionary history of cysteine coding
[O’Donoghue et al., 2005]. This technique requires the determination of remote homologs of L18e in order to ascertain when the gene
duplication event that led to its formation took place.
A sequence-based profile-to-profile database search (see Methods) with the evolutionary profile of L18e
[Sethi et al., 2005] identified the universal protein L15 as its closest homolog. The structural alignment of L15 and L18e, shown in
Figure ??(a), reveals that the globular domains of L15 and L18e are similar both in structure and in sequence (QH = 0.6 and sequence
identity of 20% over the globular domain), confirming that these proteins have a common evolutionary origin. In addition to the
globular domain, L15 has a large tail of more than 60 residues. A number of tertiary interactions between different loops belonging to
domains 1, 2, and 5 of the 23S rRNA are formed close to this tail implying that L15 could be important for the stability of these
tertiary contacts (see Figures S10, S11 and S12) [Klein et al., 2004]. Even though L15 does not appear to interact with 5S rRNA, it has
been shown that 5S rRNA does not bind to 23S rRNA unless L15 is already bound to 23S rRNA [Rohl and Nierhaus, 1982]. In L18e,
this N-terminus tail is replaced by three α-helices which interact with the 23S rRNA, and one of these helices forms specific contacts
with the archaeal/eukaryal specific signature (helix H34a in H. marismortui). Hence, the structural elements specific to L15 and L18e
lead to specificity in their binding sites on the LSU.
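Sequence identity figures such as the 20% quoted above depend on how the aligned sequences are compared. The sketch below shows one common convention, assumed here for illustration: identity is the fraction of identical residues among alignment columns where neither sequence has a gap. The Methods may use a different denominator, and the aligned strings shown are invented.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where both sequences have a residue.

    aligned_a and aligned_b are rows of the same pairwise alignment.
    Columns containing a gap in either row are skipped (an assumption;
    other conventions divide by alignment or sequence length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Invented aligned fragments, for illustration only.
print(percent_identity("ACDE-F", "ACGEHF"))
```

When a structural alignment is used to seed the sequence alignment, as is done for distant homologs here, the same calculation applies to the structure-derived columns.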
Due to the low sequence identity between L15 and L18e, the structural alignment between them was used to seed the alignment of
their sequences. Sequence alignment-based methods were used to align all L15 and L18e proteins (Figure ??c and d) within their respective
groups. The phylogenetic tree shown in Figure ??b is a map of the evolutionary history of L15 and L18e obtained from the above
alignment. As expected, L15 displays the full canonical phylogenetic pattern. As few contacts are made between the 23S rRNA and the
globular domain of L15, there are few evolutionary constraints on the globular domain of L15, which results in low sequence identity
of L15 proteins representing different domains of life (27% identity on average between archaeal and bacterial L15 in the
evolutionary profile). The fast pace of evolution combined with the small size of the alignment leads to long branch attraction and low
bootstrap values in certain branches. However, the deep divide separating the bacterial and the archaeal/eukaryal version of the
molecule is clearly visible
and, in addition, the eukaryal version of the molecule is clearly distinguishable from the archaeal version of the molecule. As all the
L15 sequences form a monophyletic group, the node denoting the root of the L15 tree can clearly be identified between the bacterial
and the archaeal versions of the molecule, similar to the canonical pattern in the UPT. In addition, the phylogenetic tree of L15
(including the tail) shows the same major groupings as the phylogenetic tree of L15 based on its globular domain, indicating that the
small size of the alignment does not lead to spurious groupings.
In spite of long branch attraction, the phylogenetic analysis of L18e clearly shows an archaeal/eukaryal divide similar to the one in
the canonical 16S rRNA tree. The alignment of distantly related sequences using structures helps in inferring distant
evolutionary events. L18e diverges from L15 before the node denoted universal communal state on the L15 phylogenetic tree. This
branching along with the recurrence of signatures of the canonical phylogenetic pattern in L18e indicates that L18e was present in the
universal ancestral state.
2.5 Signatures in Operonal Organization
WHAT IS THE POINT OF THIS SECTION? IT WANDERS THROUGH A LITANY OF FACTS, AND SEEMS TO COME TO
NO DEFINITE CONCLUSION. HOW DO THESE FINDINGS TIE INTO THE REST OF THE PAPER? WHY ISN’T THIS
SECTION A PART OF A SEPARATE PAPER? THE READER IS GOING TO WONDER WHY IT IS INCLUDED UNLESS A
FAR BETTER CASE IS MADE FOR INCLUDING IT.
A comparative analysis of the gene context of L18e indicates that its gene is
clustered with those of ribosomal proteins L13 and S9, and with the gene for subunit N of the archaeal/eukaryal specific
DNA-dependent RNA polymerase (the S9 operon [Kromer and Arndt, 1991]) in all the archaeal genomes analyzed. In most archaeal
genomes, the S9 operon occurs in close proximity to the so-called α-operon (made up of the r-proteins S13, S4, S11, and the
transcription protein RNA polymerase subunit D (RpoD), a homolog of bacterial RNA polymerase α-subunit (RpoA)) (see Figures ??e
and S15). The conservation of the higher order operonal organization of the S9 and the α-operons in archaea indicates that this
organization was present in archaea prior to the divide between the euryarchaea and the crenarchaea. In other words, this organization
occurred before the Darwinian transition that led to the archaeal phyla, and therefore, L18e must have been present before this
transition. Horizontal gene transfer in modern organisms takes place in regions, also called genomic islands, close to tRNA genes
[Rocap et al., 2003]. The presence of tRNA genes at the 5’ and 3’ ends of the two operons in the Pyrococcales and Halobacteriales
indicates that these genes could be remnants of horizontal gene transfer of the different ribosomal operons as a means of innovation
sharing in the communal state, before the Darwinian transition had occurred.
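The gene context comparison described above amounts to asking whether a set of genes stays together in the gene order of every genome examined. The sketch below is a minimal illustration under stated assumptions: each genome is reduced to an ordered list of gene labels, each gene of interest is assumed to be single copy, and a cluster counts as conserved if its genes fit within a window only slightly larger than the cluster itself. The genome names and gene orders are hypothetical, and this is not the analysis tool used in this work.

```python
def cluster_span(gene_order, cluster):
    """Length of the shortest stretch of gene_order holding the cluster.

    Returns None if any gene of the cluster is missing. Assumes each
    cluster gene occurs at most once in the genome.
    """
    positions = [i for i, gene in enumerate(gene_order) if gene in cluster]
    if {gene_order[i] for i in positions} != set(cluster):
        return None
    return max(positions) - min(positions) + 1


def is_conserved_cluster(genomes, cluster, max_extra=1):
    """True if every genome keeps the cluster within a tight window.

    max_extra is the number of unrelated intervening genes tolerated
    (an illustrative threshold, not taken from the paper).
    """
    limit = len(cluster) + max_extra
    for gene_order in genomes.values():
        span = cluster_span(gene_order, cluster)
        if span is None or span > limit:
            return False
    return True


# Hypothetical gene orders, for illustration only.
s9_operon = {"L18e", "L13", "S9", "rpoN"}
genomes = {
    "genome_A": ["tRNA", "L18e", "L13", "S9", "rpoN", "geneX"],
    "genome_B": ["geneY", "L13", "L18e", "geneZ", "S9", "rpoN"],
}
print(is_conserved_cluster(genomes, s9_operon))
```

Gene order within the window is deliberately ignored, which matches the observation that a conserved operon may appear in a slightly different order in different lineages.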
A similar comparative gene context map of four operons of a super operonal organization (see Supporting text) shows that this
arrangement is maintained in representative members of most bacterial phyla, indicating that these four operons came together to form
this highly conserved higher order operonal arrangement before the Darwinian transition in bacteria. The conservation of the
α-operon in both bacteria and archaea (albeit in a
slightly different order) also indicates that the α-operon may have been formed before the Darwinian transition that led to the split
between the archaeal and bacterial communities. The formation of the α-operon before the first Darwinian transition indicates that there
must have been some amount of coupling between translation and transcription even at this stage, which needs to be investigated further.
Another interesting aspect of the super operon is that it consists mostly of universal r-proteins and translation factors, while only two
bacteria specific r-proteins are found in it. The conservation of the super operonal organization in bacteria implies that the two bacteria
specific r-proteins (L36 and L17), which are present in this cluster, were present and incorporated into this super operon before the
Darwinian transition in the bacterial domain of life had occurred.
3 Discussion
The multidimensional comparative analysis of the ribosomal proteins and RNA together with the operonal organization of r-proteins
provides a deeper understanding of the molecular signatures of translation. The relatively small number of large structural signatures
distinguishing the archaeal and bacterial domains of life indicates that the rate of variation in the RNA was reduced in comparison to
that of the proteins following the bacterial Darwinian transition.
Our coevolution study of the bacterial insertions in the N-terminus region of S4 and the 16S rRNA helix
H16 suggests that these signatures appeared in the bacterial community following the Darwinian transition and were propagated
universally to the other developing bacterial lineages. The divergence of the S4 insertion into two types could have arisen to fine-tune
the function of H16 in response to different environments. Given the importance placed on the functional role of helix H16 in the
above scenario, it is natural to ask what exactly that functional role is. One possibility is that, given its position near the 530
pseudoknot region of helix H18, it plays some structural role in the decoding process in bacteria, helping to stabilize the region through
contacts with S4. Another possibility is that the bacterial insertions evolved precisely because a signature was needed to protect
against antibiotics that target the decoding region.
The discrepancy between the groupings of the S4 insertion phylogeny and the UPT as shown in Figure
?? suggests that early bacteria shared innovations of less than a full protein domain. However, the short size of this region does not
allow other explanations, such as convergent evolution, to be excluded. If the mechanism for sharing these short innovations existed,
its precise nature remains unknown.
A phylogenetic analysis of the 23S rRNA suggests that its evolutionary history is contained both in its structures and in its
sequences. As anticipated, the groupings in the sequence-based phylogenetic tree of the 23S rRNA are congruent with those in the 16S
rRNA. In addition, the remarkable similarity between the structures of the bacterial 23S rRNA and the archaeal 23S rRNA, in spite of
their sharing only 50% of the r-proteins, indicates that the core of the ribosome had evolved before the first Darwinian transition. Our study
cannot tell whether the RNA signatures were added or deleted at the Darwinian transition, only that the major structural changes
between the archaea and bacteria must have occurred during the transition, and that the majority of the RNA signatures interact with
domain specific proteins.
All the universal r-proteins display the canonical phylogenetic pattern in which the molecules from the three domains of life are
clearly distinguishable from each other and the root is placed between the bacterial version of the molecule and the archaeal/eukaryotic
versions of the molecule. However, the higher order groupings such as phyla that are classified based on phylogeny of 16S rRNA are
not always observed in r-protein phylogeny indicating that r-proteins continued to be horizontally transferred after the first Darwinian
transition.
Our phylogenetic and comparative gene context analyses of L18e show that it is possible for domain specific proteins to be
present at the first Darwinian transition. L18e interacts with and stabilizes the helical insertion H34a in the 23S rRNA. In modern
organisms, the ds-protein L18e and the H34a helix are either present together, as in archaea and eukarya, or both absent, as
in the bacterial LSU, indicating the coevolutionary nature of L18e and helix H34a. In addition to L18e, our preliminary analysis
indicates that other ds-proteins may have been present at the universal ancestral state [unpublished results]. A comprehensive study of
the evolution of all the remaining ds-proteins needs to be performed to understand the nature of the translational machine at the
universal communal state and at various Darwinian transitions within the individual lineages of each domain of life. Such an extensive
study would require the identification of remote homologs using a combination of sequence and structure-based information.
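The co-occurrence pattern described above for L18e and helix H34a can be written as a simple presence/absence check. The following is a minimal sketch and not part of the original analysis: the domain-level profile only restates the pattern given in the text, and the function name co_occur is a hypothetical helper introduced for illustration.

```python
# Presence/absence per domain of life, restating the pattern in the text:
# L18e and helix H34a occur together in archaea and eukarya and are
# both absent from the bacterial large subunit (LSU).
profiles = {
    "archaea":  {"L18e": True,  "H34a": True},
    "eukarya":  {"L18e": True,  "H34a": True},
    "bacteria": {"L18e": False, "H34a": False},
}


def co_occur(profiles, a, b):
    """Return True if features a and b are present together or absent
    together in every lineage of the profile table."""
    return all(lineage[a] == lineage[b] for lineage in profiles.values())


cooccurring = co_occur(profiles, "L18e", "H34a")
print(cooccurring)
```

A matching presence/absence profile is consistent with coevolution but does not by itself establish it; the phylogenetic and gene context analyses carry that argument.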
The higher order operonal organization of the r-proteins has been the most conserved of all functional operons [Wolf et al., 2001].
We have shown here that a comparative analysis of the gene context of these operons can be used to analyze their origin. In addition to
cotranscription of these operons, we propose that this superoperonal arrangement might have evolved for efficient HGT of any new
innovation in the universal communal state. As this was a period when HGT dominated the evolutionary process and no single lineage
for the protocells could be identified, it would make sense for functionally related genes, such as genes of the translational apparatus,
to be present in the same segment of the chromosome or genome
[Olsen and Woese, 1996, Lawrence, 1999]. Though a comparative analysis of all the r-protein operons needs to be performed, the fact
that 27 of the 35 universal r-proteins are conserved across the superoperon in bacteria indicates that such a scenario is
possible.
After translation had evolved into a sophisticated molecular machine, there is evidence that the extent of
HGT was reduced as vertical gene transfer began to dominate the evolutionary process. Nevertheless, metagenomic studies conducted on
many modern bacterial species, including E. coli, indicate that HGT is an active mode of innovation sharing in modern cells. A
comparison of the genes in the genomes of the different strains, including pathogenic ones, indicated that only 30% of the genes,
including all the genes of the translational apparatus, were shared among all the strains; this shared set has been termed the core
genome. In addition, most of the genes in the core genome were found to be very similar in sequence, indicating that in spite of
millions of years of evolution, the translational apparatus has undergone few changes. While these metagenomic studies will drastically influence our
definition of a species [Goldenfeld and Woese, 2007], it is evident that the translational apparatus plays a central role in the modern
cell and a study of the core of the translational machinery will be needed to understand its evolution and the evolution of the cellular
entity as a whole.
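The notion of a core genome used above, the genes shared by all strains, amounts to a set intersection. The following is a minimal sketch under stated assumptions: each strain is represented as a set of gene identifiers, the strain names and the identifiers geneW to geneZ are made-up placeholders, and rpsD and rplO appear only as illustrative r-protein gene names. None of this is data from the studies discussed. The text does not specify the denominator behind the 30% figure; the sketch assumes the union of genes over all strains.

```python
def core_genome(strain_genes):
    """Genes present in every strain (intersection of the gene sets)."""
    gene_sets = list(strain_genes.values())
    return set.intersection(*gene_sets) if gene_sets else set()


def core_fraction(strain_genes):
    """Fraction of all genes seen in any strain (the union) that belong
    to the core genome. The choice of denominator is an assumption."""
    pan = set().union(*strain_genes.values())
    return len(core_genome(strain_genes)) / len(pan) if pan else 0.0


# Hypothetical toy data: three strains sharing two r-protein genes.
strains = {
    "strain_A": {"rpsD", "rplO", "geneX", "geneY"},
    "strain_B": {"rpsD", "rplO", "geneZ"},
    "strain_C": {"rpsD", "rplO", "geneX", "geneW"},
}

print(sorted(core_genome(strains)))
print(round(core_fraction(strains), 2))
```

In this toy example the translational genes make up the entire core, which mirrors the observation that all genes of the translational apparatus fall within the core genome.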
References
[Cannone et al., 2002] Cannone, J. J., Subramanian, S., Schnare, M. N., Collett, J. R., D’Souza, L. M.,
Du, Y., Feng, B., Lin, N., Madabusi, L. V., Muller, K. M., Pande, N., Shang, Z., Yu, N., and Gutell,
R. R. (2002). The comparative RNA web (CRW) site: an online database of comparative sequence and structure information for
ribosomal, intron, and other RNAs. BMC Bioinformatics, 3:2.
[Carter et al., 2000] Carter, A. P., Clemons, W. M., Brodersen, D. E., Morgan-Warren, R. J., Wimberly,
B. T., and Ramakrishnan, V. (2000). Functional insights from the structure of the 30S ribosomal subunit and its interactions with
antibiotics. Nature, 407(6802):340–348.
[Goldenfeld and Woese, 2007] Goldenfeld, N. and Woese, C. (2007). Biology’s next revolution. Nature,
445(7126):369.
[Gutell and Woese, 1990] Gutell, R. R. and Woese, C. R. (1990). Higher order structural elements in ribosomal RNAs: pseudo-knots and the use of noncanonical pairs. Proc Natl Acad Sci U S A, 87(2):663–667.
[Held et al., 1974] Held, W. A., Ballou, B., Mizushima, S., and Nomura, M. (1974). Assembly mapping of
30 S ribosomal proteins from Escherichia coli. Further studies. J Biol Chem, 249(10):3103–3111.
[Klein et al., 2004] Klein, D. J., Moore, P. B., and Steitz, T. A. (2004). The roles of ribosomal proteins in
the structure, assembly, and evolution of the large ribosomal subunit. J Mol Biol, 340(1):141–177.
[Korostelev et al., 2006] Korostelev, A., Trakhanov, S., Laurberg, M., and Noller, H. F. (2006). Crystal
structure of a 70S ribosome-tRNA complex reveals functional interactions and rearrangements. Cell, 126(6):1065–1077.
[Kromer and Arndt, 1991] Kromer, W. J. and Arndt, E. (1991). Halobacterial S9 operon. Three ribosomal
protein genes are cotranscribed with genes encoding a tRNA(Leu), the enolase, and a putative membrane protein in the
archaebacterium Haloarcula (Halobacterium) marismortui. J Biol Chem, 266(36):24573–24579.
[Lawrence, 1999] Lawrence, J. (1999). Selfish operons: the evolutionary impact of gene clustering in prokaryotes and eukaryotes. Curr Opin Genet Dev, 9(6):642–648.
[Lecompte et al., 2002] Lecompte, O., Ripp, R., Thierry, J.-C., Moras, D., and Poch, O. (2002). Comparative
analysis of ribosomal proteins in complete genomes: an example of reductive evolution at the domain scale. Nucleic Acids Res,
30(24):5382–5390.
[Ludwig and Schleifer, 1994] Ludwig, W. and Schleifer, K. H. (1994). Bacterial phylogeny based on 16S and
23S rRNA sequence analysis. FEMS Microbiol Rev, 15(2-3):155–173.
[Mears et al., 2002] Mears, J. A., Cannone, J. J., Stagg, S. M., Gutell, R. R., Agrawal, R. K., and Harvey, S. C. (2002). Modeling a minimal ribosome based on comparative sequence analysis. J Mol Biol, 321(2):215–234.
[Mizushima and Nomura, 1970] Mizushima, S. and Nomura, M. (1970). Assembly mapping of 30S ribosomal
proteins from E. coli. Nature, 226(5252):1214.
[Mushegian, 2005] Mushegian, A. (2005). Protein content of minimal and ancestral ribosome. RNA,
11(9):1400–1406.
[Nowotny and Nierhaus, 1988] Nowotny, V. and Nierhaus, K. H. (1988). Assembly of the 30S subunit from
Escherichia coli ribosomes occurs via two assembly domains which are initiated by S4 and S7. Biochemistry, 27(18):7051–7055.
[O’Donoghue and Luthey-Schulten, 2003] O’Donoghue, P. and Luthey-Schulten, Z. (2003). On the evolution
of structure in the aminoacyl-tRNA synthetases. Microbiol Mol Biol Rev, 67:550–573.
[O’Donoghue and Luthey-Schulten, 2005] O’Donoghue, P. and Luthey-Schulten, Z. (2005). Evolutionary
profiles derived from the QR factorization of multiple structural alignments gives an economy of information. J Mol Biol,
346(3):875–894.
[O’Donoghue et al., 2005] O’Donoghue, P., Sethi, A., Woese, C. R., and Luthey-Schulten, Z. A. (2005). The
evolutionary history of Cys-tRNACys formation. Proc Natl Acad Sci U S A, 102(52):19003–19008.
[Olsen and Woese, 1996] Olsen, G. J. and Woese, C. R. (1996). Lessons from an Archaeal genome: what
are we learning from Methanococcus jannaschii? Trends Genet, 12(10):377–379.
[Roberts et al., 2006] Roberts, E., Eargle, J., Wright, D., and Luthey-Schulten, Z. (2006). MultiSeq: Unifying sequence and structure data for evolutionary analysis. BMC Bioinformatics, 7:382.
[Rocap et al., 2003] Rocap, G., Larimer, F. W., Lamerdin, J., Malfatti, S., Chain, P., Ahlgren, N. A.,
Arellano, A., Coleman, M., Hauser, L., Hess, W. R., Johnson, Z. I., Land, M., Lindell, D., Post, A. F.,
Regala, W., Shah, M., Shaw, S. L., Steglich, C., Sullivan, M. B., Ting, C. S., Tolonen, A., Webb, E. A.,
Zinser, E. R., and Chisholm, S. W. (2003). Genome divergence in two Prochlorococcus ecotypes reflects oceanic niche
differentiation. Nature, 424(6952):1042–1047.
[Rohl and Nierhaus, 1982] Rohl, R. and Nierhaus, K. H. (1982). Assembly map of the large subunit (50S)
of Escherichia coli ribosomes. Proc Natl Acad Sci U S A, 79(3):729–733.
[Schuwirth et al., 2005] Schuwirth, B. S., Borovinskaya, M. A., Hau, C. W., Zhang, W., Vila-Sanjurjo, A.,
Holton, J. M., and Cate, J. H. D. (2005). Structures of the bacterial ribosome at 3.5 Å resolution. Science, 310(5749):827–834.
[Selmer et al., 2006] Selmer, M., Dunham, C. M., Murphy, F. V. t., Weixlbaumer, A., Petry, S., Kelley,
A. C., Weir, J. R., and Ramakrishnan, V. (2006). Structure of the 70S ribosome complexed with mRNA and tRNA. Science,
313(5795):1935–1942.
[Sethi et al., 2005] Sethi, A., O’Donoghue, P., and Luthey-Schulten, Z. (2005). Evolutionary profiles from
the QR factorization of multiple sequence alignments. Proc Natl Acad Sci U S A, 102(11):4045–4050.
[Talkington et al., 2005] Talkington, M. W. T., Siuzdak, G., and Williamson, J. R. (2005). An assembly
landscape for the 30S ribosomal subunit. Nature, 438(7068):628–632.
[Trust et al., 1994] Trust, T. J., Logan, S. M., Gustafson, C. E., Romaniuk, P. J., Kim, N. W., Chan, V. L.,
Ragan, M. A., Guerry, P., and Gutell, R. R. (1994). Phylogenetic and molecular characterization of a 23S rRNA gene positions the
genus Campylobacter in the epsilon subdivision of the Proteobacteria and shows that the presence of transcribed spacers is common
in Campylobacter spp. J Bacteriol, 176(15):4597–4609.
[Vetsigian et al., 2006] Vetsigian, K., Woese, C., and Goldenfeld, N. (2006). Collective evolution and the
genetic code. Proc Natl Acad Sci U S A, 103(28):10696–10701.
[Winker and Woese, 1991] Winker, S. and Woese, C. R. (1991). A definition of the domains Archaea, Bacteria and Eucarya in terms of small subunit ribosomal RNA characteristics. Syst Appl Microbiol, 14(4):305–310.
[Woese, 1987] Woese, C. R. (1987). Bacterial evolution. Microbiol Rev, 51(2):221–271.
[Woese, 2002] Woese, C. R. (2002). On the evolution of cells. Proc Natl Acad Sci U S A, 99(13):8742–8747.
[Woese et al., 1990] Woese, C. R., Kandler, O., and Wheelis, M. L. (1990). Towards a natural system
of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc Natl Acad Sci U S A, 87(12):4576–4579.
[Woese et al., 2000] Woese, C. R., Olsen, G. J., Ibba, M., and Soll, D. (2000). Aminoacyl-tRNA synthetases,
the genetic code, and the evolutionary process. Microbiol Mol Biol Rev, 64(1):202–236.
[Wolf et al., 2001] Wolf, Y. I., Rogozin, I. B., Kondrashov, A. S., and Koonin, E. V. (2001). Genome
alignment, evolution of prokaryotic genome organization, and prediction of gene function using genomic context. Genome Res,
11(3):356–372.
[Yang et al., 1999] Yang, D., Kusser, I., Kopke, A. K., Koop, B. F., and Matheson, A. T. (1999). The structure and evolution of the ribosomal proteins encoded in the spc operon of the archaeon (Crenarchaeota) Sulfolobus acidocaldarius.
Mol Phylogenet Evol, 12(2):177–185.
Figure Captions
Figure 1. Illustration of ribosomal protein S4 bound to the 16S rRNA in T. thermophilus. (left) The conserved core of S4 (orange)
interacts with conserved rRNA (white) while the bacterial insertion in S4 (yellow) interacts with a bacterial signature region in helix
H16 of the 16S (green). Red spheres show the location of mRNA bound to the ribosome. (right) A secondary structure diagram of the
16S rRNA near the S4 binding region in T. thermophilus. Indicated on the diagrams are the bacterial (green) 16S signature region of
H16, the bases of the 16S that make contact with the S4 bacterial insertion (yellow), and the bases that make contact with the S4
conserved core (orange). Residue numbers are given in both T. thermophilus and E. coli numbering (in parentheses).
Figure 2. Phylogenetic properties of the N-terminal region of S4 in bacteria. On the left is a simplified
Bayesian phylogenetic tree of the insertion (see Methods); branch numbers show the posterior probabilities for each major group. The
type of each major group is shown on the right along with the bacterial classes that contain insertions of that type. Also shown is a
representative N-terminal sequence for the given type. Residues are colored according to the following: (blue) residues that are
conserved 95% across all bacteria, (yellow) residues that are conserved 95% across their subtype.
Figure 3. Evolution of 23S rRNA. (a) Evolution of the structure of 23S rRNA measured by the structural homology measure Q_H. (b) Sequence tree
of the 23S rRNA showing the separation of archaea and bacteria, in agreement with the structure tree. Unrooted tree made using Unrooted.
Archaeal organisms are colored in blue text while bacterial organisms are colored in red.
Figure 4. Overlap of representative domain specific ribosomal proteins from a 23S rRNA structural alignment of T. thermophilus and
H. marismortui. Bacterial proteins are shown in blue and archaeal proteins in orange. The A-site tRNA (green), P-site tRNA (red), and
mRNA (yellow) are shown for orientation.
Figure 5. Evolution of L18e and L15P: (a) Overlap of L15P and L18e colored by a measure of structural conservation, Q_res. (b) The
phylogenetic tree of the homologous proteins L15P and L18e: archaea in blue, eucarya in green, and bacteria in red. The full name of
each organism is given in the Supplementary Information. (c) L18e colored by sequence conservation measured in an evolutionary
profile. L18e interacts with the A/E-specific rRNA insertion shown. The bases that interact with L18e are colored with a green
background and conserved bases are explicitly shown on the secondary structure with H. marismortui numbering. (d) L15P colored by
sequence conservation measured in an evolutionary profile. (e) Genomic context of L18e in the crenarchaeon S. solfataricus and in the
euryarchaeon Halobacterium sp.