Download Consensus temporal order of amino acids and evolution

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Gene wikipedia , lookup

Metalloprotein wikipedia , lookup

Ribosomally synthesized and post-translationally modified peptides wikipedia , lookup

Citric acid cycle wikipedia , lookup

Fatty acid synthesis wikipedia , lookup

Nucleic acid analogue wikipedia , lookup

Protein wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Point mutation wikipedia , lookup

Fatty acid metabolism wikipedia , lookup

Peptide synthesis wikipedia , lookup

Proteolysis wikipedia , lookup

Protein structure prediction wikipedia , lookup

Metabolism wikipedia , lookup

Calciseptine wikipedia , lookup

Amino acid wikipedia , lookup

Amino acid synthesis wikipedia , lookup

Biochemistry wikipedia , lookup

Biosynthesis wikipedia , lookup

Genetic code wikipedia , lookup

Transcript
Gene 261 (2000) 139±151
www.elsevier.com/locate/gene
Consensus temporal order of amino acids and evolution of the triplet code
E.N. Trifonov*
Department of Structural Biology, The Weizmann Institute of Science, Rehovot 76100, Israel
Received 30 May 2000; received in revised form 28 July 2000; accepted 25 September 2000
Received by G. Bernardi
Abstract
Forty different single-factor criteria and multi-factor hypotheses about chronological order of appearance of amino acids in the early
evolution are summarized in consensus ranking. All available knowledge and thoughts about origin and evolution of the genetic code are thus
combined in a single list where the amino acids are ranked chronologically. Due to consensus nature of the chronology it has several
important properties not visible in individual rankings by any of the initial criteria. Nine amino acids of the Miller's imitation of primordial
environment are all ranked as topmost (G, A, V, D, E, P, S, L, T). This result does not change even after several criteria related to Miller's
data are excluded from calculations. The consensus order of appearance of the 20 amino acids on the evolutionary scene also reveals a unique
and strikingly simple chronological organization of 64 codons, that could not be ®gured out from individual criteria: New codons appear in
descending order of their thermostability, as complementary pairs, with the complements recruited sequentially from the codon repertoires of
the earlier or simultaneously appearing amino acids. These three rules (Thermostability, Complementarity and Processivity) hold strictly as
well as leading position of the earliest amino acids according to Miller. The consensus chronology of amino acids, G/A, V/D, P, S, E/L, T, R,
N, K, Q, I, C, H, F, M, Y, W, and the derived temporal order for codons may serve, thus, as a justi®ed working model of choice for further
studies on the origin and evolution of the genetic code. q 2000 Elsevier Science B.V. All rights reserved.
Keywords: Abiotic; Chronology of amino acids; Chronology of codons; Origin of life; Origin of code
1. Introduction
Three years ago Thomas Bettecken and myself (Trifonov
and Bettecken, 1997) developed an idea that, perhaps, some
repeating sequences frequently expanding in disease, in
particular, GCT triplets, may have enjoyed their expandability as an advantage in the very early days of evolution of
nucleic acids. The very ®rst codons, therefore, could have
been the GCU triplet and its point change derivatives. The
list of the earliest amino acids could be reconstructed from
the yields of amino acids in the imitation experiments of
Miller (1953), giving priority to chemically simplest ones,
and those which are served today by more ancient tRNA
synthetases of class II. Spectacularly, resulting six amino
acids, ala, asp, gly, pro, ser and thr, are, indeed, encoded
today by the GCU derivative triplets. The success of the
fusion of these two independent `predictions', on the earliest
codons and the earliest amino acids (Trifonov and
Abbreviations: A, C, D, standard abbreviations for amino acids Ala, Cys,
Asp; R, Y, N or X, standard abbreviations for purines, pyrimidines and any
base, respectively
* Tel./Fax: 1972-8-934-2653.
E-mail address: [email protected] (E.N. Trifonov).
Bettecken, 1997), suggests that, perhaps, one would be
also able to make a more detailed reconstruction of the
chronology of amino acids and of respective codons. For
that purpose one could recruit many more criteria of the
evolutionary age of amino acids, along with three criteria
mentioned above. In this paper 40 different estimates are
analyzed both collected from literature and suggested anew.
Results of the reconstruction, unexpectedly informative, are
presented below.
2. Results
2.1. Criteria
The list of collected criteria is shown in Table 1, in no
special order. A brief comment and reference to each of the
criteria follow. The numbers for the criteria below correspond to the listing in the table.
2.1.1. Various multi-factor theories
0378-1119/00/$ - see front matter q 2000 Elsevier Science B.V. All rights reserved.
PII: S 0378-111 9(00)00476-5
N9. According to Jukes (1973) initially only ten
amino acids were present in the code, each one invol-
140
E.N. Trifonov / Gene 261 (2000) 139±151
Table 1
Forty criteria for amino-acid chronology
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
32.
33.
34.
35.
36.
37.
38.
39.
40.
Simplicity (number of non-hydrogen atoms)
Involvement with more ancient synthetases of class II
Yield in the Miller's experiments
Amino-acid composition of extant proteins
Chemical inertness
Stability of codon-anticodon interactions
Molecular clock sequence analysis of synthetases
Stability of (`older') assignments in the table of the code
Jukes' theory of the origin of the code
Co-evolution theory of Wong
GCU-based theory of Trifonov and Bettecken
RRY hypothesis of Crick
RNY hypothesis, Eigen and Schuster
Hypothesis of Hartman
Hypothesis of Ferreira
Prebiotic physicochemical code of Altshtein-E®mov
Early copolymerization code of Nelsestuen
Composition of proteinoids of Fox
Co-evolution theory of Dillon
Yield in imitation experiments of Fox and Windsor
Yield in experiments of Harada and Fox, high temperatures.
Yield in shock wave experiments of Bar-Nun
Co-evolution theory of WaÈchtershaÈuser
Remnants of primordial code in tRNA (MoÈller and Janssen)
Evolutionary distances between isoacceptor tRNAs
Hypothesis of O. Ivanov
Match scores of BLOSUM matrix
A/U start, Jimenez-Sanchez
N-®xing amino acids ®rst, Davis
GNN codons ®rst, Taylor and Coates
Algebraic model of Hornos and Hornos
Composition of translated Urgen
Murchison meteorite
Minimal graph complexity, amino acids
Minimal graph complexity, amino-acid residues
Hypothesis of Jimenez-Montano
`Size/complexity' score, Dufton
Minimal alphabet for folding
DNA stability
RNA duplex stability
ving four to eight codons. Later the excessive codons
had been reassigned to additional amino acids.
N10. Co-evolution theory of Wong (1981, 1988) is
based on the observation that biosynthetically related
similar amino acids share similar codons rather than
similar physicochemical properties. The amino acids
of the Miller's mix are considered the earliest.
N11. GCU-based theory (Trifonov and Bettecken,
1997).
N14. According to Hartman (1975a, 1975b, 1978,
1995) the code started from a singlet code and eventually developed to a doublet and triplet code, starting
with G and C.
N15. This speculation is based on gradual transition of
enzymatic functions from RNA-like oligomers to
randomly synthesized peptides (Ferreira and
Coutinho, 1993).
N16. The code could start as selective interactions of
amino acids and nucleotides in progenes ± mixed
anhydrides of amino acids and trinucleotides (Altshtein and E®mov, 1988).
N17. This is based on the hypothesis on copolymerization of amino acids and bases, and hydrogen bonding between respective aminoacyl nucleotides
(Nelsestuen, 1978).
N19. Biochemical co-evolution model based on metabolism of simple sulfur bacteria in the genus Thioploca (Dillon, 1978).
N23. Possible early synthesis of amino acids in sulfuriron surface systems in volcanic vents (WaÈchtershaÈuser, 1988).
N28. Analysis of the metabolism of pyrimidine
biosynthesis leads to an idea that the code began in
RNA world with the letters A and U (JimenezSanchez, 1995).
N29. This suggested chronology is based on the N®xing amino acids (D, E, N, Q) as an initial set. These
amino acids are thought to have been ambiguously
translated from polyA in mineral surface reactions.
Further additions to the code are assumed to follow
path lengths of amino acid biosynthesis, starting with
reductive citrate cycle (Davis, 1999).
N30. The order is based on the key positions in the
synthetic pathways of the amino acids, on the
presence in the Miller's mixture, and on the preference to amino acids encoded by GNN triplets (Taylor
and Coates, 1989).
N12. RRY theory of Crick et al. (1976).
N31. Order suggested by group-theoretical analysis of
the symmetries in the genetic code (Hornos and
Hornos, 1993).
N13. RNY theory of Eigen, Schuster and WinklerOswatitsch (Eigen and Schuster, 1978; Eigen and
Winkler-Oswatitsch, 1981a,b; Eigen et al., 1981).
N36. Model based on the group theory, thermodynamics of codon-anticodon interactions and on
complexity of amino acids (Jimenez-Montano, 1999).
E.N. Trifonov / Gene 261 (2000) 139±151
2.1.2. Yields in experiments imitating primordial conditions
It is assumed that in the primordial conditions those abiotic amino acids which had higher concentrations in the
environment were also ®rst to be incorporated in the early
code. The very ®rst experiment of this kind indicated detectable amounts of glycine (LoÈb, 1913; see also Yockey et al.,
1997).
N3. Experiment of Miller with electric discharge in
imitated conditions of early Earth atmosphere (Miller,
1953; Weber and Miller, 1981).
N18. Composition of proteinoids in the experiments
by Fox and Waehneldt (1968).
N20. Experiments of Fox and Windsor (1970) at
moderate temperatures.
N21. Experiments of Harada and Fox (1964) at high
temperatures.
N22. Shock wave experiments (Bar-Nun et al., 1970).
2.1.3. Criteria based on complexity of the amino acids
One would expect that more complex amino acids would
appear later in the evolution of the code. There are several
rather different algorithms to evaluate complexity of amino
acids and their side residues.
N1. Complexity, estimated by simple counts of nonhydrogen atoms in the side residues (Trifonov and
Bettecken, 1997).
N34. Minimal graph complexity of amino acids and
N35. Minimal graph complexity of amino-acid residues (Papentin, 1982). In these criteria traditional
chemical presentations of the amino acids are transformed in linear graphs, and complexity of the graphs
then calculated.
N37. `Size/complexity' score (Dufton, 1997) is calculated that takes into account both complexity of
chemical structure and bulk physico-chemical characteristics of the residues (size, weight, charge, etc.).
2.1.4. Criteria based on amino-acid composition of various
ensembles of proteins
N4. Present-day proteins. Their composition is taken
from (Arques and Michel, 1996), for prokaryotes and
eukaryotes, and rank averaged. One may expect that
the latest amino acids would be largely underrepre-
141
sented in the composition of modern proteins. Thus
the ranking by composition would also re¯ect aminoacid chronology.
N26. Functionally most ancient proteins may contain
more of earliest amino acids (Ivanov, 1989).
N32. The composition of the protein encoded by the
earliest mRNA (master tRNA or Urgen of Eigen and
Winkler-Oswatitsch, 1981a,b) should be dominated
by the earliest amino acids.
2.1.5. Criteria based on thermostability of early nucleic
acids
N6. Codon-anticodon interactions. More stable interactions would be more favorable in the early, presumably higher temperature conditions, and in early
simple complexes with no additional stabilizing interactions. The melting enthalpies of the dinucleotide
stacks (Xia et al., 1998) corresponding to the ®rst
and second codon positions are taken as measure of
their thermostability.
N39. Stability of early DNA as carrier of genetic
memory may be of importance. The amino acids
with higher stability of the respective complementary
pairs of triplets in DNA would be expected then to be
earlier ones. The stability of the triplet pairs is calculated as sum of respective melting enthalpies for the
dinucleotide stack components (Gotoh, 1983).
N40. Stability of the early duplex RNA genes.
Respective enthalpies for the triplet pairs are calculated as above, by using values for RNA dinucleotides
(Xia et al., 1998). See Table 2.
2.1.6. Criteria that involve amino-acyl-tRNA synthetases
N2. Involvement with more ancient synthetases of
class II (Eriani et al., 1990). Since the class II synthetases are believed to be more ancient, than class I
enzymes (Hartman, 1995), respective ten amino
acids served by the class II enzymes would be
expected to have been available earlier than others.
N7. Two amino-acyl-tRNA synthetases, for tyr and
trp, are found to be sequence-wise youngest of all,
keeping inter-species similarities higher than intraspecies similarities between the synthetases (Ribas
de Pouplana et al., 1996).
142
E.N. Trifonov / Gene 261 (2000) 139±151
prebiotic conditions. The extraterrestrial origin of life
may be assumed (Kvenvolden et al., 1971).
Table 2
Thermostability of the codons (complementary pairs, kcal/M)
A
C
D
E
F
G
H
I
GCC
GCG
GCU
GCA
UGC
UGU
GAC
GAU
GAG
GAA
UUC
UUU
GGC
GGG
GGA
GGU
CAC
CAU
AUC
AUA
AUU
28.3
25.5
25.4
25.3
25.3
21.8
23.8
21.8
22.9
19.3
19.3
13.6
28.3
26.8
25.8
24.8
21.8
19.8
21.8
17.1
16.3
K
L
L
M
N
P
Q
R
AAG
AAA
CUC
CUG
CUA
CUU
UUG
UUA
AUG
AAC
AAU
CCC
CCG
CCU
CCA
CAG
CAA
CGC
CGG
CGA
CGU
17.3
13.6
22.9
20.9
18.2
17.3
17.3
14.5
19.8
18.2
16.3
26.8
24.0
23.9
23.8
20.9
17.3
25.5
24.0
23.1
22.0
R
S
S
T
V
W
Y
AGG
AGA
UCC
UCG
UCU
UCA
AGC
AGU
ACC
ACG
ACU
ACA
GUC
GUG
GUA
GUU
UGG
UAC
UAU
23.9
22.9
25.8
23.1
22.9
22.9
25.4
21.9
24.8
22.0
21.9
21.8
23.8
21.8
19.1
18.2
23.8
19.1
17.1
2.1.7. Other single-factor criteria
N38. Minimal and, presumably, early alphabet suf®cient for rapid folding of small b-sheet protein (Riddle
et al., 1997).
2.2. Consensus chronology of amino acids
2.2.1. Averaging raw data
In Fig. 1 the rankings of the amino acids by their order of
appearance in evolution of the triplet code are indicated for
all 40 above criteria of their chronology. Many of them do
not provide detailed ranking in which case the amino acids
are grouped under the same rank for all group members. For
example, by the criterion 8 the amino acids A, D, E, F, G, H
and P are all equally earliest (seven amino acids of the same
average rank 4), then follows V (rank 8), I (rank 9), K, N, S,
Y (four amino acids of the same rank 11.5), and so on. For
every amino acid the corresponding 40 rank values are averaged. The lowest average rank value would correspond to
the amino acid that is most frequently on the left ± the
earliest one. In Table 3 in the column `Raw data' the aver-
N5. Chemical inertness would be important in the
early stages of life when the amino acids were not
yet protected by sophisticated cellular homeostases.
The amino acids are divided in three groups ± inert,
moderately reactive and highly reactive (Trifonov,
1999).
N8. Many of assignments in the table of the triplet
code are unstable, that is in some species a given
codon may serve a different amino acid. Such instabilities may indicate which of the amino acids are
acquired more recently. The ranking of the instabilities is evaluated from data in (Osawa et al., 1992).
N24. Remnants of primitive code at positions 3 to 5 in
tRNA sequences (MoÈller and Janssen, 1990).
N25. The order may be estimated from evolutionary
distances between isoacceptor tRNAs (Chaley et al.,
1999).
N27. Currently used matrices for amino acid substitution, e.g. BLOSUM62 (Henikoff and Henikoff, 1992)
may re¯ect not only similarity of properties of the
interchangeable amino acids, but the time of appearance of the amino acid as well. In particular, C, H and
W may have high diagonal values in the matrices
rather because these are young amino acids that had
less time for the replacements.
N33. Amino acids delivered by meteorites may re¯ect
Fig. 1. Amino-acid chronology ranking by individual criteria. Criteria N1*,
N3*, N6* and N11* are combined criteria (see Section 2.2.2.1). Respective
excluded criteria are enclosed in parentheses. When the amino acids are
grouped together under the same rank, dashes are placed in unoccupied
rank positions, to assist reading the ranks from the ®gure.
E.N. Trifonov / Gene 261 (2000) 139±151
Table 3
Consensus chronology of amino acids
Raw data
G
A
V
D
S
E
P
L
T
I
N
R
K
Q
C
F
H
M
Y
W
4.4 ^ 0.7
4.9 ^ 0.8
6.9 ^ 0.6
7.2 ^ 0.7
7.9 ^ 0.7
8.2 ^ 0.7
8.3 ^ 0.7
9.4 ^ 0.7
10.1 ^ 0.6
11.2 ^ 0.7
11.8 ^ 0.7
12.0 ^ 0.7
12.0 ^ 0.7
12.4 ^ 0.7
12.4 ^ 0.7
13.0 ^ 0.7
13.3 ^ 0.6
14.0 ^ 0.6
14.7 ^ 0.5
15.8 ^ 0.6
Filtered data
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
G
A
V
D
E
P
S
L
T
R
N
K
Q
I
C
H
F
M
Y
W
2.9 ^ 0.3
2.9 ^ 0.3
6.6 ^ 0.6
7.0 ^ 0.7
7.2 ^ 0.6
7.5 ^ 0.6
7.7 ^ 0.7
9.5 ^ 0.7
9.8 ^ 0.6
11.5 ^ 0.7
12.2 ^ 0.7
12.3 ^ 0.5
13.0 ^ 0.4
13.0 ^ 0.5
14.3 ^ 0.6
14.9 ^ 0.5
15.1 ^ 0.4
15.4 ^ 0.4
15.6 ^ 0.4
16.7 ^ 0.5
Miller
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
1
2
G
A
V
D
E
P
S
L
T
A
G
V
D
E
P
S
L
T
I
I
age ranking values are shown together with calculated errors
for the averages. The amino acids are sorted here according
to the rank averages. According to the raw data analysis gly
and ala are the ®rst amino acids to appear followed by val
and asp. Tyr and trp appear last. Although the raw data
result needs some re®nement (®ltering), most of the features
of the raw chronology stay unchanged after the treatment.
2.2.2. Filtering
Two steps of ®ltering are applied: combining of related
criteria, and elimination of the noise background.
2.2.2.1. Related criteria. Some criteria are related by the
concept they are based on, but are suf®ciently different.
While other criteria may be unrelated by the underlying
idea, and yet the respective rankings are very similar. A
quantitative estimate of the relatedness of various criteria
is given by correlation coef®cients for pair-wise
comparisons between the rankings. The Spearman
correlation coef®cients are calculated, based on
normalized total mis®ts between compared rankings.
Table 4 displays the 39 £ 40 triangular matrix for all pairwise correlations between the 40 rankings. Positive
correlations signi®cantly exceed the negative ones which
is an encouraging result indicating that large proportion of
the criteria are relevant, indeed, and the respective rankings,
apparently, convey some common truth. One could discard
those criteria that show overall negative correlation with
others. They may, however, give correct order if not ranks
for some amino acids. Besides, the non-orthodox views are
especially valuable for the analysis, since other, `positive'
143
criteria may be biased in favor of dominating speculations
or results. This is why all criteria are taken into the
calculation. On the other hand, there could be some
criteria that correlate strongly due to their relatedness a
priori, being based on similar ideas. That is, essentially,
they offer largely the same rankings. Such data should be
combined (averaged), in order to avoid the obvious bias. Of
22 strongly correlated (correlation coef®cient .0.5) pairs 16
pairs are found to be apparently independent. For example,
rankings N24 and N30 (based on tRNA sequences, and on
the domination of GNN codons, respectively) show highest
correlation (0.81). Six pairs, on the other hand, are clearly
related. They make four groups: N1 and N35 ± these are
criteria of complexity of amino acids; N3, N10, N20 and
N21 ± all based on or closely related to the Miller's
imitation experiments; N6 and N40 ± criteria of
thermostability; and N11 and N13 ± related theories based
on GCU and RNY triplets, respectively. These criteria are,
thus, combined in four averaged rankings (N1*, N3*, N6*
and N11* in the Fig. 1). This reduction, from 40 to 34
rankings, makes the ®rst step of ®ltering of the raw data.
2.2.2.2. Reduction of the noise level. The rank estimates for
each of 20 amino acids provided by the remaining 34
rankings, if all irrelevant, would be scattered randomly
over a whole range from rank 1 to rank 20, with the
density 1.74 per rank unit (note that there are fractional
ranks, but not smaller than 1.0). In reality the suggested
estimates for individual amino acids rather form clusters
(data not shown). For example, 20 of 34 rank estimates
for alanine concentrate between the ranks 1 and 3. Most
of remaining estimates, far away from the average, can be
considered as noise. To eliminate this noise component the
low density (,1.0 per rank unit) ¯anks of rank distributions
for individual amino acids are discarded. New, corrected
averages are calculated. These are presented in the
column `Filtered data' of the Table 3.
2.2.2.3. Robustness of the derived chronology of amino
acids. In Table 5 the raw data chronology, ®ltered
chronology (two steps separately) as well as earlier
estimates, on smaller sets of the criteria, are presented.
The boldface amino acids correspond to those with
estimated ranks identical or one rank off the ®nal (last
column) chronology. Already by only three criteria
(Trifonov and Bettecken, 1997) ten ranks of 20 are
practically the same as by extensive 40 criteria analysis
with two-step ®ltering. The earlier chronologies become
closer to the ®nal one with the increase in the number of
criteria through 7 (Trifonov, 1999a), 25 (Trifonov, 1999b),
28 (Trifonov, 2000) and 40 criteria (this work). Remarkably,
the last column shows almost the same order as in the raw
data for the 40 criteria, with only three amino acids having
ranks more than one unit off (arg, ile, and ser). Data for one
and two steps of ®ltering are within accuracy of one rank
unit almost identical. These comparisons, thus, demonstrate
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
1
0.09
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
2
0.30
0.05
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
3
0.24
2 0.02
0.46
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
4
0.14
0.09
0.21
0.16
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
5
0.17
0.06
0.27
0.18
2 0.05
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
6
0.22
0.04
0.07
0.12
0.09
0
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
7
2 0.17
0.14
0.12
2 0.06
0.06
2 0.06
2 0.06
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
8
2 0.08
0
2 0.04
0.11
2 0.07
0.14
0.04
2 0.10
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
9
Table 4
Correlation coef®cients for all pairwise comparisons of the rankings suggested by 40 criteria
0.29
0.04
0.47
0.42
0.07
0.33
0.07
0.13
0.17
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
10
0.31
0.30
0.43
0.28
0.14
0.51
0
0.18
0.25
0.46
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
11
0.04
0.16
0.04
2 0.03
2 0.14
0.10
0.16
2 0.12
2 0.04
0.13
0.20
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
12
0.35
0.36
0.32
0.28
0.21
0.18
0
0.01
0.10
0.18
0.58
0.34
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
13
2 0.04
0.38
0
0.05
2 0.04
0.21
0.16
0
0.16
2 0.12
0.18
0
0.04
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
14
2 0.29
0.04
2 0.33
2 0.19
0.05
2 0.14
0.16
2 0.21
2 0.18
2 0.31
2 0.22
0.12
2 0.28
2 0.02
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
15
0
0.18
0.34
0.33
2 0.16
0.32
0.01
0
0.19
0.25
0.56
0.10
0.21
0.21
2 0.26
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
16
2 0.33
2 0.08
0.09
2 0.08
2 0.26
2 0.35
0.02
2 0.08
2 0.16
2 0.07
2 0.15
2 0.20
2 0.12
2 0.44
2 0.36
2 0.02
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
17
2 0.09
2 0.39
2 0.01
2 0.04
2 0.16
2 0.33
2 0.04
2 0.26
2 0.29
2 0.11
2 0.11
2 0.24
2 0.17
2 0.40
2 0.38
2 0.07
2 0.20
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
18
0.10
2 0.03
0.31
0.22
0.03
0.30
0.18
0.09
0.29
0.18
0.21
2 0.02
0.07
0.25
2 0.22
0.25
2 0.39
2 0.09
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
19
0.15
0.20
0.40
0.23
0.34
0.29
0.03
0.28
2 0.08
0.41
0.38
0.06
0.28
0.03
2 0.16
0.14
2 0.06
2 0.21
0.09
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
20
144
E.N. Trifonov / Gene 261 (2000) 139±151
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
0.18
0.09
0.67
0.36
0.28
0.30
2 0.03
0.29
0.10
0.64
0.36
0.02
0.26
2 0.02
2 0.27
0.17
0.05
2 0.06
0.27
0.51
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
21
0.18
0.03
0.30
0.27
0.25
0.16
0.16
0
2 0.04
0.13
0.25
0.40
0.29
2 0.03
0.09
0.10
2 0.20
0.23
0.25
0.13
0.27
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
22
Table 4 (continued)
0.28
0.02
0.13
2 0.06
2 0.08
0.17
0.12
2 0.14
0.10
0.20
0.21
0.16
0.10
0.04
2 0.30
0.03
2 0.40
2 0.18
0.13
0.02
0.14
2 0.16
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
23
0.14
0.12
0.32
0.14
0.08
0.28
0.16
0.17
2 0.04
0.26
0.40
0.60
0.40
0.04
2 0.06
0.25
0.20
0.11
0.32
0.14
0.25
0.79
0.04
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
24
2 0.26
2 0.38
2 0.36
2 0.33
2 0.27
0.16
2 0.12
2 0.39
2 0.05
2 0.32
2 0.15
2 0.13
2 0.43
2 0.14
2 0.07
2 0.32
2 0.54
2 0.57
2 0.15
2 0.16
2 0.26
2 0.10
2 0.18
2 0.24
±
±
±
±
±
±
±
±
±
±
±
±
±
±
±
25
0.17
0.12
0.44
0.38
0.15
0.52
0.04
2 0.01
0.48
0.40
0.59
0.02
0.28
0.20
2 0.16
0.62
2 0.08
2 0.23
0.29
0.40
0.46
0.16
0.18
0.16
0.01
±
±
±
±
±
±
±
±
±
±
±
±
±
±
26
2 0.02
2 0.38
0.17
0.45
0.10
2 0.19
0.17
2 0.52
2 0.15
0
2 0.04
2 0.26
2 0.03
2 0.15
2 0.45
0.08
0.01
0.01
2 0.01
2 0.05
0
0.12
2 0.25
2 0.05
2 0.46
0.05
±
±
±
±
±
±
±
±
±
±
±
±
±
27
2 0.63
2 0.43
2 0.57
2 0.49
2 0.25
2 0.74
2 0.11
2 0.52
2 0.57
2 0.68
2 0.67
2 0.02
2 0.35
2 0.67
0.13
2 0.54
0.13
2 0.22
2 0.72
2 0.46
2 0.57
2 0.02
2 0.67
2 0.09
2 0.54
2 0.65
2 0.22
±
±
±
±
±
±
±
±
±
±
±
±
28
0.16
2 0.14
0.13
2 0.05
2 0.09
0.10
0.11
2 0.17
0.14
0.22
0.18
0.15
0.08
2 0.07
2 0.29
0
2 0.43
2 0.26
0.24
2 0.01
0.12
2 0.13
0.64
0.07
2 0.20
0.14
2 0.15
2 0.65
±
±
±
±
±
±
±
±
±
±
±
29
0.04
0.03
0.36
0.20
0
0.29
0.09
0.26
0.01
0.38
0.45
0.42
0.34
0.01
2 0.16
0.36
2 0.17
0.21
0.38
0.21
0.35
0.62
0.13
0.81
2 0.30
0.25
2 0.05
2 0.33
0.19
±
±
±
±
±
±
±
±
±
±
30
2 0.36
2 0.31
2 0.18
2 0.23
2 0.42
2 0.21
2 0.07
2 0.22
2 0.22
2 0.04
2 0.13
2 0.06
2 0.38
2 0.39
2 0.16
2 0.02
0.28
2 0.30
2 0.27
2 0.16
2 0.14
2 0.16
2 0.42
2 0.06
2 0.22
2 0.03
2 0.24
2 0.23
2 0.37
2 0.06
±
±
±
±
±
±
±
±
±
31
0.21
0.06
0.17
0.21
0.38
0.36
0.06
2 0.13
0.26
0.12
0.34
2 0.07
0.18
0.37
2 0.05
0.23
2 0.47
2 0.10
0.32
0.15
0.17
0.19
0.14
0.15
2 0.15
0.47
0.03
2 0.65
0.10
0.19
2 0.37
±
±
±
±
±
±
±
±
32
0.16
0.16
0.39
0.17
0.09
0.42
0.04
0.34
0.08
0.36
0.63
0.28
0.36
0.14
2 0.10
0.32
2 0.30
0.10
0.42
0.37
0.43
0.51
0.12
0.64
2 0.20
0.36
2 0.16
2 0.47
0.18
0.77
2 0.17
0.27
±
±
±
±
±
±
±
33
0.47
0.19
0.26
0.12
2 0.02
0.28
0.18
2 0.01
0.02
0.22
0.36
0.20
0.22
0.24
2 0.08
0.11
2 0.47
2 0.09
0.27
0.20
0.17
0.07
0.45
0.22
2 0.19
0.15
2 0.16
2 0.67
0.63
0.24
2 0.22
0.19
0.34
±
±
±
±
±
±
34
0.79
0.09
0.26
0.23
0.05
0.10
0.18
2 0.20
2 0.23
0.20
0.23
2 0.04
0.16
2 0.03
2 0.21
0.05
2 0.26
2 0.04
0.01
0.11
0.12
0.18
0.11
0.11
2 0.28
0.09
0.06
2 0.56
0.04
0.01
2 0.27
0.04
0.14
0.45
±
±
±
±
±
35
0.22
2 0.03
0.42
0.24
0.22
0.28
0.13
2 0.15
0.06
0.28
0.22
0.01
0.13
0.13
2 0.11
0.17
2 0.39
2 0.10
0.31
0.09
0.31
0.29
0.05
0.37
2 0.13
0.36
0.03
2 0.61
0.08
0.24
2 0.13
0.29
0.26
0.27
0.22
±
±
±
±
36
0.36
0.16
0.45
0.63
0.38
0.04
0.12
2 0.13
0.06
0.35
0.26
2 0.03
0.39
0.07
2 0.11
0.14
2 0.12
2 0.02
0.18
0.27
0.35
0.35
2 0.06
0.20
2 0.37
0.23
0.37
2 0.39
2 0.04
0.12
2 0.33
0.20
0.15
0.20
0.32
0.30
±
±
±
37
2 0.10
0.03
0.12
0.19
0
0.12
0.09
0.09
2 0.29
0.10
0.11
0.24
0.19
0.01
2 0.03
0.20
2 0.17
0.20
0.02
0.23
0.12
0.43
2 0.07
0.43
2 0.39
2 0.05
0.04
0.01
2 0.04
0.60
2 0.26
2 0.01
0.43
0
2 0.01
2 0.06
0.08
±
±
38
0.20
0.11
0.02
0.06
2 0.23
0.41
2 0.03
2 0.12
0
0.17
0.36
0.07
0.22
0.16
2 0.23
0.28
2 0.50
2 0.43
0.01
0.05
0.05
0.08
0.02
0.17
2 0.06
0.31
2 0.28
2 0.82
2 0.13
0.12
2 0.14
0.13
0.18
0.15
0.10
0.07
0.03
2 0.10
±
39
0.12
0.03
2 0.01
2 0.14
2 0.27
0.53
2 0.06
2 0.27
2 0.15
2 0.03
0.25
2 0.01
0.02
0.24
2 0.21
0.12
2 0.58
2 0.54
2 0.01
0.09
0
0.01
0.05
0.06
0.21
0.25
2 0.41
2 0.83
2 0.12
0.02
2 0.33
0.10
0.18
0.22
0.10
0.10
2 0.16
2 0.10
0.48
40
E.N. Trifonov / Gene 261 (2000) 139±151
145
146
E.N. Trifonov / Gene 261 (2000) 139±151
2.3. Chronology of codons
Table 5
Stability of the ranking
Number of criteria (simple
averaging)
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
Filtered
3
7
25
28
40
One step
Two steps
G
A
S
D
P
T
V
L
I
K
N
E
C
M
H
F
Q
R
Y
W
A
G
S
P
V
T
L
D
I
E
N
F
K
R
Q
C
H
M
W
Y
G
A
D
V
P
S
E
L
T
I
N
F
H
K
R
Q
C
M
Y
W
G
A
V
D
S
P
E
L
T
I
N
R
F
K
Q
H
C
M
Y
W
G
A
V
D
S
E
P
L
T
I
N
R
K
Q
C
F
H
M
Y
W
G
A
V
D
S
E
P
L
T
N
R
K
I
Q
H
C
F
M
Y
W
G
A
V
D
E
P
S
L
T
R
N
K
Q
I
C
H
F
M
Y
W
that the derived chronology is robust, and neither addition of
new criteria, nor possible additional ®ltering steps would
result in any appreciable changes.
2.2.2.4. Relation to experiments by S. Miller. The most
important reading from the derived consensus chronology
of amino acids is that the consensus places at least nine of
ten amino acids detected in the initial experiment of Miller
(1953) to the top of the list. This can not be explained by
possible domination of criteria related to Miller's data, since
this is taken care of by combining the Miller-related criteria
in just one, No. 3* (to replace criteria No. 3, 10, 20 and 21).
Even if the combined ranking No. 3* is excluded from
calculations, the result stays the same, with only little
change of order within the top nine (see Table 3). Since
uncertainties in calculation of the averaged ranks are
rather high, the isoleucine and arginine are within
respective error bars indistinguishable and, thus, T in fact
may well be immediately followed by I which would place
all ten amino acids of Miller to the top.
This observation, ®rst, demonstrates that the signi®cance
of the Miller's results goes well beyond mere demonstration
of possibility of abiotic synthesis of amino acids. The amino
acids of the Miller's mix appear also to be the very ®rst ones
to be accommodated in the evolving triplet code. On the
other hand, this observation makes the derived consensus
chronology highly relevant apparently re¯ecting, indeed,
the order of events in early evolution of the code. This
becomes even more evident when the reconstruction of
the chronology of codons is attempted.
2.3.1. Initial reconstruction, 20 codon pairs
As originally suggested by Eigen and Schuster (1978),
glycine and alanine could have been the very ®rst amino
acids, being the highest yield components of the imitated
primordial mixture of Miller (1953). They also speculated
that, perhaps, the ®rst codons for glycine and alanine had
been GGC and GCC, respectively, making a very stable
complementary combination. This is, indeed, the highest
stability pair of the whole list of 32 combinations. Table 2
lists all the codons and stabilities of the respective complementary triplet pairs, expressed in melting enthalpies. The
initial enthalpy values for individual base-pair stacks used
for the derivation of the data in the Table are taken from the
latest estimates for double-stranded RNA (Xia et al., 1998).
Encouragingly, the next two amino acids in the chronological list are valine and aspartic acid for which, similarly, a
complementary pair of codons can be chosen from the
known repertoire of the codons for these amino acids:
GUC and GAC. Both are the most stable triplets in the
repertoires. (Here and below stability of triplets means thermostability of respective complementary pairs). It, thus,
appears that the original suggestions by Eigen and Schuster,
on the primacy of the thermostability, and on acquisition of
new codons in form of complementary pairs of the codons,
could have been rules applicable for the development of the
entire code. One of corollaries of the codon complementarity rule, as also speculated in the cited work, would be
possible complementarity of respective ancient tRNAs.
This, indeed, has been con®rmed by sequence analysis of
a large ensemble of tRNA sequences (Rodin et al., 1993,
1996).
To check whether the rules of thermostability and
complementarity are valid also for the rest of the consensus
chronology, in Fig. 2 the most stable codons (bold face) for
all 20 amino acids are displayed in one diagram with their
respective amino acids. There is uncertainty in assignment
of temporal order for amino acids encoded by two sets of
triplets ± serine, leucine and arginine. One of each pair
could appear after the ®rst one at any later moment in the
chronology. For an initial guess they are put here together,
as contemporaries. As Fig. 2 demonstrates, sequential
acquisition of all amino acids according to the consensus
chronology perfectly obeys (with only two negotiable
exceptions) the thermostability and complementarity rules.
For every strong triplet on the diagonal there is complementary one that belongs to the codon repertoire of one of the
amino acids acquired earlier. This is re¯ected in the
conspicuous above-diagonal positioning of almost all the
complementary wobble triplets. First four amino acids
discussed above are followed by the glutamic acid that is
one of the mentioned apparent exceptions. Its stable triplet
GAG has a complement CUC, the most stable triplet for
leucine, that appears nominally only after proline and
serine. However, considering the accuracy of the ranking
E.N. Trifonov / Gene 261 (2000) 139±151
147
Fig. 2. Reconstructed chronology of 20 codon pairs. The triplets corresponding to the most stable ones in the respective repertoires are shown boldface.
(see Table 3) the order E after P and S is equally acceptable,
in which case E and L become next to each other, thus,
putting the GAG and CUC triplets together, as the two
rules require. The stablest codon CCC for proline is complementary to GGG for glycine, that came earlier. That is, the
wobble version GGG of the glycine codon GGC appears
simultaneously with its complementary counterpart CCC.
Next in the chronology is serine which is encoded by two
different sets of codons, UCX and AGY. The triplet UCC of
the ®rst set is stablest and it is complementary to GGA of
glycine that is already present. The AGC triplet of the
second set obeys the rules as well, being complementary
to GCU, for alanine that is also already present in the developing chronology. As to the leucine encoded by UUG (this
is second apparent exception), to satisfy the rules it only has
to be put somewhere after glutamine, to make the complementary triplet CAA readily available. All subsequent stable
codons appear simultaneously with complementary codons
derived by wobble mutations from the above-diagonal
triplets for the amino acids acquired at earlier steps of the
chronology. Thus, suggestion of Eigen and Schuster is
solidly con®rmed being fully applicable not only to ala
and gly, but to all amino acids of the consensus chronology.
The third strict rule, the rule of processivity, can be formulated as well: new codons appear as derivatives of chronologically earlier codons. Majority of them are wobble
mutations of the earlier codons, and their complements.
Only in one case (from G/A to V/D) the new codons are
generated, apparently, by transitions in the second codon
positions or one transition (any of two possible ones) and
one complementary copying step. Importantly, the processivity is due to the very special order of amino acids offered
by the consensus chronology of amino acids. For example,
putting W or R few ranks before P, F before E, or K before
L, etc. ± would destroy the triangular pattern.
Within the emerging picture of development of the code
the starters GGC and GCC play unique role. Their origin is
beyond the scheme and is more pertinent to the problem of
the origin of life.
2.3.2. All 32 codon pairs
The thermostability rule holds that in every repertoire of
codons for a given amino acid the most stable triplet is
engaged ®rst. Although there is also general decline of the
thermostability towards the end of the codon chronology it
is not strictly monotonous. For example, codon ACC for
threonine is more stable than GUC for valine though valine
comes earlier. There are, obviously, additional selection
pressures other than just thermostability. This particular
case of non-monotony has a simple explanation. All 64
triplets can be divided in two families, one having G or C
in the central position, and another ± A or U in the center.
There is no way to derive A/U central triplets from G/C
central triplets by a wobble mutation. The GUC/GAC
(Val/Asp) pair is the most stable in the A/U central family.
It is, apparently, formed by transition mutation(s) from
GGC/GCC (Gly/Ala) pair. These both pairs occupy corner
position, thus, supplying wobble and complementary
versions corresponding to all remaining amino acids.
Considering an a priori importance of the thermostability
one would also expect that within any given codon repertoire not only the ®rst codon is chosen according to its
stability, but second and third codons as well. Fig. 2 does
not show it, perhaps, because the locations for second codon
sets of serine, leucine and arginine are arbitrarily chosen
here as immediately next to the ®rst sets. They should be
placed at some later position, like it is already suggested by
apparent misplacement of the UUG triplet for leucine to
begin with. Indeed, in the succession GCC, GCU, GCG,
GCA for alanine (Fig. 2) the GCU codon complementary
Fig. 3. Reconstructed chronology of 32 codon pairs.
148
E.N. Trifonov / Gene 261 (2000) 139±151
E.N. Trifonov / Gene 261 (2000) 139±151
to AGC triplet for serine is less stable than GCG codon. By
placing Sagy after Rcgx in the amino acid chronology one gets
the monotonously descending order in stability of sequentially acquired alanine codons: GCC, GCG, GCU, GCA (see
Table 2). Thus, by adjustment of the ¯exible positions for
the double sets of serine, leucine and arginine one may
attempt to organize the codon chronology to follow the
extended thermostability rule: not only the ®rst stablest,
but second and third as well, consecutively. An additional
¯exibility is permitted by uncertainties of the average ranking values (Table 3). For example, D, E, P and S are indistinguishable in this respect, as well as the amino acids in the
groups (N, K), (Q, I), and (H, F, M, Y). In the complete
reconstruction of the codon chronology shown in Fig. 3 only
one such allowed change in the amino-acid order is made, as
already mentioned above: the (E, P, S) sequence is changed
to (P, S, E). As a result, the derived complete codon chronology now strictly follows the rules of complementarity and
processivity, and with only two violations ± the extended
rule of thermostability.
The reader is invited to trace in detail the order of engagement of the codons in every repertoire (line) of Fig. 3 and
compare it with the thermostability orders given in Table 2.
The exceptions are the GUU codon for valine and CUU
codon for leucine. According to their thermostability they
should be at the end of the lines, after GUA and CUA,
respectively. One possible explanation of the exceptions is
that, probably, initial codon sets for valine and leucine were
doublets GUY and CUY, rather than quartets, and the additional doublets, GUR and CUR were acquired at some later
stage. In the present-day codon table the sets in form of
ABY doublets are as common as quartets: UUY for phe,
UAY for tyr, UGY for cys, CAY for his, AAY for arg, AGY
for ser and GAY for asp.
With this comment the chronology of the codons
displayed in Fig. 3 becomes fully consistent with the rules
of extended thermostability, complementarity and processivity, while the amino-acid chronology (on the left) stays
consistent with the calculated order in Table 3, within
respective error bars (for E, P and S).
There are still many uncertainties in ®ne details of the
chronologies that may or may not be clari®ed by further
studies. Few additional criteria may emerge. However,
merely due to large number of criteria contributing to the
consensus picture, the new data could only change the rank
averages for amino acids by fractions of unit, keeping the
temporal orders essentially unchanged.
3. Discussion
The unique merit of the derived chronologies is that they
are not built on any a priori assumptions but rather combine
in as objective as possible way all ideas and suggestions
regarding the order of acquisition of the amino acids during
the evolution of the code. Four major and in many ways
149
most natural discoveries are made:
1. The ®rst amino acids to have been incorporated in early
code were of abiotic origin, namely those which were
obtained in classical imitation experiments by S. Miller.
2. In the development of the triplet code a major role was
played by thermostability of codon-anticodon interactions.
3. New codons appeared in complementary pairs.
4. New codons were simple derivatives of chronologically
earlier ones.
None of 40 individual criteria of chronology of amino
acids alone would suggest these four fundamental conclusions. Since both amino-acid and codon chronologies
derived as above reveal in the previously unexplored way
the new basic knowledge, they, apparently, better represent
true chronology of events than any earlier suggestions. It
will be hard to challenge as fundamental and as obvious
features as the four above, which are quite likely, thus, to
stay.
The author has to admit that the spectacular order that
came out of the disorder of very different, noisy and
frequently rather questionable criteria is a great surprise.
One could only hope, that most of the major factors that
in¯uenced the development of the code, are already present
in the list of 40 criteria, that the noise level in the rankings
by the individual criteria is, in average, not too high, and
that the evolution of the code was not geared to a single
dominating factor. Apparently, these expectations are met,
indeed.
The robustness of the derived consensus ranking of amino
acids is additionally demonstrated by the fact that even if the
chronology of amino acids based on raw data (no ®ltering) is
considered, the four rules strictly hold as well (data not
shown), despite some differences in the succession of the
amino acids (see Table 3). This also says that some uncertainty in the details of the chronologies still remains, even
within the rather strict limits of the four rules. Further re®nement of the orders will be, perhaps, possible after analysis of
reconstructed ancient sequences of proteins and respective
mRNAs.
An attempt to build the chronological orders on the basis
of the four rules only, without taking into consideration all
the numerous criteria, allows to outline the limits of the
remaining uncertainties. It turns out that the ®rst nine
amino acids make a rather unique order: G/
A . P . S . T, and G/A . V/D . E/L (sign . means
here `earlier'), with no other uncertainties. The remaining
amino acids can be arranged in many possible combinations, all obeying the rules. The restrictions may be
expressed in the following inequalities: Rcgx . Sagy . C,
N . H . Y, Ragr . W, K . Q . Luur, and H . M. These
inequalities are valid for many alternative temporal orders.
For example, to keep with the order GCC .
GCG . GCU . GCA, Sagy may be placed anywhere after
150
E.N. Trifonov / Gene 261 (2000) 139±151
R(CGC) but before C(UGC), see Fig. 3. For the time being
such uncertain parts of the chronologies may be arranged by
satisfying best the overall descend of the thermostability, as
it is done in Fig. 3. Accordingly, the best guess on the amino
acid chronology, consistent with the consensus and adjusted
as allowed to ®t the four rules is:G/A, V/D, P, S, E/L, T, R,
N, K, Q, I, C, H, F, M, Y, W.
The thermostability rule is readily interpretable. By using
the enthalpy values for deoxyribonucleotides instead of
ribonucleotides one gets consistent picture as well (data
not shown), due to largely the same order of descending
thermostabilities of the DNA triplets. The question then is
stability of what was so crucial in the evolution of the code.
It could be stability of protein-coding DNA or RNA
duplexes, or stability of codon-anticodon interactions in
the early translation apparatus. One could imagine also
codon-anticodon-like interactions between respective
tRNAs, perhaps in prototype amino-acyl-tRNA synthetase
complexes. RNA triplet interactions of the latter two
suggestions seem to be better choice, being more dependent
on the complementary triplet pair stabilities, then the
duplexes. During the development of the code each time
when the new codon is offered by the wobble mutation,
other, next in the row mutation(s) of the same repertoire
is, probably, present as well. The one that makes the most
stable complementary pair is, apparently, selected.
Interestingly, both initiation codons and stop codons
appear at the end of the derived codon chronology. It may
correspond to the point of transition between progenote
stage and ®rst branchings in the basic tree of life all
branches of which possess the initiation and termination
codons. These codons could also appear at earlier stages.
For example, initiation codon may have been introduced to
exclude second strand of the early `mRNA duplex' from
translation. The termination could have been introduced to
prevent further extensions of the protein chains by end-toend fusion of their respective genes.
Intermediate stages of the development of the codon table
leave many triplets unassigned. Perhaps, every appearance
of such unassigned triplets would be lethal at these stages (if
not at the end of the message). On the other hand, there
could have been some precursor amino acids to be served
by such triplets, e. g., in accordance with Wong's co-evolution theory (Wong, 1981, 1988). The reconstruction
described in this work did not require such assumption.
This does not exclude, however, some important undisclosed detours in the codon chronology that may be discovered in future studies.
Acknowledgements
Discussions with Drs I. Berezovsky, T. Bettecken, A.
Bolshoy, S. Botti, M. Di Giulio, A. Evdokimov, A. Girshovich, H. Hartman, N. Lahav, A. Litovchik, V. Ivanov, G.
Malenkov, S. Ohno, S. Rodin, M. Safro, N. Sueoka, E.
Sverdlov and E. Zuckerkandl are highly appreciated.
References
Altshtein, A.D., E®mov, A.V., 1988. Physicochemical basis of origin of the
genetic code: stereochemical analysis of interaction of amino acids and
nucleotides on the basis of the progene hypothesis. Mol. Biol. 22, 1133±
1149.
Arques, D.G., Michel, C.J., 1996. A complementary circular code in the
protein coding genes. J. Theor. biol. 182, 45±58.
Bar-Nun, A., Bar-Nun, N., Bauer, S.H., Sagan, C., 1970. Shock synthesis of
amino acids in simulated primitive environments. Science 168, 470±
473.
Chaley, M.B., Korotkov, E.V., Phoenix, D.A., 1999. Relationships among
isoacceptor tRNAs seem to support the co-evolution theory of the origin
of the genetic code. J. Mol. Evol. 48, 168±177.
Crick, F.H.C., Brenner, S., Klug, A., Pieczenik, C., 1976. A speculation on
the origin of protein synthesis. Origins Life 7, 389±397.
Davis, B.K., 1999. Evolution of the genetic code. Prog. Biophys. Mol. Biol.
72, 157±243.
Dillon, L.S., 1978. The Genetic Mechanism and the Origin of Life. Plenum
Press, New York.
Dufton, M.J., 1997. Genetic code synonym quotas and amino acid
complexity: cutting the cost of proteins? J. Theor. Biol. 187, 165±173.
Eigen, M., Schuster, P., 1978. The hypercycle. A principle of natural selforganization. Part C: The realistic hypercycle. Naturwissenschaften 65,
341±369.
Eigen, M., Winkler-Oswatitsch, R., 1981a. Transfer-RNA: The early adaptor. Naturwissenschaften 68, 217±228.
Eigen, M., Winkler-Oswatitsch, R., 1981b. Transfer-RNA, an early gene?
Naturwissenschaften 68, 282±292.
Eigen, M., Gardiner, W., Schuster, P., Winkler-Oswatitsch, R., 1981. The
origin of genetic information. Sci. Am. 244, 88±118.
Eriani, G., Delarue, M., Poch, O., Gangloff, J., Moras, D., 1990. Partition of
tRNA synthetases into two classes based on mutually exclusive sets of
sequence motifs. Nature 347, 203±206.
Ferreira, R., Coutinho, K.R., 1993. Simulation studies of self-replicating
oligoribotides, with a proposal for the transition to a peptide-assisted
stage. J. Theor. Biol. 164, 291±305.
Fox, S.W., Waehneldt, T.V., 1968. The thermal synthesis of neutral and
basic proteinoids. Bioch. Bioph. Acta 160, 246±249.
Fox, S.W., Windsor, C.R., 1970. Synthesis of amino acids by heating of
formaldehyde and ammonia. Science 170, 984±986.
Gotoh, O., 1983. Prediction of melting pro®les and local helix stability for
sequenced DNA. Adv. Biophys. 16, 1±52.
Harada, K., Fox, S.W., 1964. Thermal synthesis of natural amino acids
from a postulated primitive terrestrial atmosphere. Nature 201, 335±
336.
Hartman, H., 1975a. Speculations on the evolution of the genetic code.
Origins of Life 6, 423±427.
Hartman, H., 1975b. Speculations on the origin and evolution of metabolism. J. Mol. Evol. 4, 359±370.
Hartman, H., 1978. Speculations on the evolution of the genetic code II.
Origins of Life 9, 133±136.
Hartman, H., 1995. Speculations on the origin of the genetic code. J. Mol.
Evol. 40, 541±544.
Henikoff, S., Henikoff, J.G., 1992. Amino acid substitution matrices for
protein blocks. Proc. Natl. Acad. Sci. 89, 10915±10919.
Hornos, J.E.M., Hornos, Y.M.M., 1993. Algebraic model for the evolution
of the genetic code. Phys. Rev. Lett. 71, 4401±4404.
Ivanov, O.C., 1989. On the possible evolution of genetic code: the composition of primitive proteins. Stud. Biophys. 134, 201±214.
Jimenez-Montano, M.A., 1999. Protein evolution drives the evolution of
the genetic code and vice versa. BioSystems 54, 47±64.
E.N. Trifonov / Gene 261 (2000) 139±151
Jimenez-Sanchez, A., 1995. On the origin and evolution of the genetic
code. J. Mol. Evol. 41, 712±716.
Jukes, T.H., 1973. Possibilities for the evolution of the genetic code from a
preceding form. Nature 246, 22±26.
Kvenvolden, K.A., Lawless, J.G., Ponnamperuma, C., 1971. Nonprotein
amino acids in the Murchison meteorite. Proc. Natl. Acad. Sci. USA
68, 486±490.
È ber das Verhalten des Formamids unter der Wirkung der
LoÈb, W., 1913. U
stillen Entladung: Ein Betrag zur Frage der Stickstoff-Assimilation.
Ber. Dtsch. Chem. Ges. 46, 684±697.
Miller, S.L., 1953. Production of amino acids under possible primitive earth
conditions. Science 117, 528±529.
MoÈller, W., Janssen, G.M.C., 1990. Transfer RNAs for primordial amino
acids contain remnants of a primitive code at position 3 to 5. Biochimie
72, 361±368.
Nelsestuen, G.L., 1978. Amino-acid directed nucleic acid synthesis. A
possible mechanism in the origin of life. J. Mol. Evol. 11, 109±120.
Osawa, S., Jukes, T.S., Watanabe, K., Muto, A., 1992. Recent evidence for
evolution of the genetic code. Microbiol. Rev. 56, 229±264.
Papentin, F., 1982. On order and complexity II. Application to chemical
and biochemical structures. J. Theor. Biol. 95, 225±245.
Ribas de Pouplana, L., Frugier, M., Quinn, C.L., Schimmel, P., 1996.
Evidence that two present-day components needed for the genetic
code appeared after nucleated cells separated from eubacteria. Proc.
Natl. Acad. Sci. USA 93, 166±170.
Riddle, D.S., Santiago, J.V., Bray-Hall, S.T., Doshi, N., Grantcharova,
V.P., Yi, Q., Baker, D., 1997. Functional rapidly folding proteins
from simpli®ed amino acid sequences. Nature Struct. Biol. 4, 805±
809.
Rodin, S., Ohno, S., Rodin, A., 1993. Transfer RNAs with complementary
151
anticodons: could they re¯ect early evolution of discriminative genetic
code adapters? Proc. Natl. Acad. Sci. USA 90, 4723±4727.
Rodin, S., Rodin, A., Ohno, S., 1996. The presence of codon-anticodon
pairs in the acceptor stem of tRNAs. Proc. Natl. Acad. Sci. USA 93,
4537±4542.
Taylor, F.J.R., Coates, D., 1989. The code within the codons. BioSystems
22, 177±187.
Trifonov, E.N., 1999a. Elucidating sequence codes: three codes for evolution. Ann. NY Acad. Sci. 870, 330±338.
Trifonov, E.N., 1999b. Glycine clock: Eubacteria ®rst Archaea next, Protoctista, Fungi, Planta and Animalia at last. Gene Therapy Mol. Biol. 4,
313±322.
Trifonov, E.N., 2000. Leap into life's beginnings tracking the chronology
of amino acids. Sci. Spectra 20, 62±71.
Trifonov, E.N., Bettecken, T., 1997. Sequence fossils, triplet expansion,
and reconstruction of earliest codons. Gene 205, 1±6.
Weber, A.L., Miller, S.L., 1981. Reasons for the occurrence of the twenty
coded protein amino acids. J. Mol. Evol. 17, 273±284.
Wong, J.T.-F., 1981. Co-evolution of genetic code and amino acid
biosynthesis. Trends Bioch. Sci. 6, 33±36.
Wong, J.T.-F., 1988. Evolution of the genetic code. Microbiol. Sci. 5, 174±
181.
WaÈchtershaÈuser, G., 1988. Before enzymes and templates: theory of surface
metabolism. Microbiol. Rev. 52, 452±484.
Xia, T., SantaLucia, J., Burkard, M.E., Kierzek, R., Schroeder, S.J., Jiao,
X., Cox, C., Turner, D.H., 1998. Thermodinamical parameters for an
expanded nearest-neighbor model for formation of RNA duplexes with
Watson-Crick base pairs. Biochemistry 37, 14719±14735.
Yockey, H.P., LoÈb, Walther, Stanley, L., 1997. Miller and prebiotic `building blocks' in the silent electrical discharge. Persp. Biol. Med. 41, 125±
131.