MCB 142
MAJOR ADVANCES IN UNDERSTANDING EVOLUTION AND HEREDITY
FALL 2015
WEEK 9: NOVEMBER 3 AND 5
NOVEMBER 3: READING THE GENETIC CODE
Acridine mutagenesis. Fixed reading frame and frame shifts. Nonsense mutations.
NOVEMBER 5: CRACKING THE CODE
Assigning nucleotide triplets to amino acids. Structure of the code. Evolution of the code.
Reading to be Discussed Tuesday November 3

F.H.C. Crick, Leslie Barnett, S. Brenner and R.J. Watts-Tobin (1961) General nature of the
genetic code for proteins. Nature 192: 1227-1232.
Read the article.
Readings to be Discussed Thursday November 5

Marshall Nirenberg and Philip Leder (1964) RNA codewords and protein synthesis. Science
145: 1399-1407.
Read the article.

F.H.C. Crick (1968) The origin of the genetic code. Journal of Molecular Biology 38: 367-379.
Read the article.
Study Questions
Please hand in Tuesday November 3
1. How did Crick et al. (1961) obtain the mutation FC0? How were the other mutations on the
top line of Figure 2 obtained? How were the mutations other than FC1 in the second line of
Figure 2 obtained?
2. In the part of the rIIB region shown in Figure 2, phages with any combination of a (+)
mutation on the right and a (-) mutation on the left make plaques on E. coli K. In contrast,
the only pairs of mutations with a (+) mutation on the left of a (-) mutation that give phages
able to plate on E. coli K span only the relatively short segment of the map that lies between
FC0 and the cluster at FC18 and FC21. How can these observations be explained? What
phenotype would you expect the double mutant FC7 FC58 to have when plated on E. coli K?
What about the double mutant FC72 FC87?
3. Growth in 5-bromouracil (5BU) increases the frequency of transition mutations, AT to GC
and also GC to AT. Suppose that the double mutant FC0 FC87 is constructed (by
appropriate crosses) and that two stocks of it are prepared, one by growth on E. coli B in the
presence of 5BU and one grown on B without 5BU. No plaques are found when large
numbers of phages from the non-5BU stock are plated on K. In contrast, several plaques
appear when large numbers of phages from the 5BU stock are plated on K. Why were no
plaques found in the stock grown without 5BU? Explain the production of phages able to
plate on K in the stock grown with 5BU.
4. What is the evidence that the FC region is dispensable for rIIB function?
5. What do Crick et al. conclude regarding the coding ratio? What observation do they make
that supports their conclusion?
6. What do Crick et al. conclude regarding the redundancy of the code? What argument do
they give for this conclusion?
7. What are the principal components of the triplet assay system of Nirenberg and Leder? What
are the components of the complex that is retained on the cellulose acetate (Millipore) filter?
What particular member of the complex binds directly to the filter?
8. What evidence is there that the binding of a polyribonucleotide (or a triplet) to the anti-codon
of an aminoacyl tRNA is not sufficient by itself to account for the stability of the specific
polyribonucleotide-tRNA-ribosome complex? What other interactions might contribute to
the binding?
9. In the standard code, there are 61 codons that code for amino acids and three that code for
STOP, but most organisms have fewer than 45 different kinds of tRNA. What does this
imply about the interaction between tRNAs and the nucleotide triplets they recognize in
mRNA?
Some Background
General Nature of the Code
The 1961 paper by Crick, Barnett, Brenner and Watts-Tobin General Nature of the Genetic
Code for Proteins is arguably second only to Mendel's Experiments in Plant Hybridization in
elegance and fundamental genetic importance. By means of purely genetic studies of a
proflavin-induced mutation in a portion of the rIIB cistron of bacteriophage T4 and its
spontaneously-occurring suppressors (and their suppressors and suppressors of these), Crick et
al. showed that the genetic code is composed of nucleotide triplets (or, less likely, sextuplets)
read sequentially from a fixed starting point and also showed that the code is degenerate. (Recall
that a suppressor mutation is one that reverses or partly reverses the mutant phenotype of another
mutation. Thus, a suppressor of an rIIB mutation of phage T4 makes the rIIB mutant able to
grow on E. coli strain K.)
Starting that same year, 1961, biochemical experiments began to assign particular nucleotide
sequences and then particular nucleotide triplets to particular amino acids and, within little more
than five years, the code was entirely deciphered and seen to be the same in a wide variety of
species.
Even before the structure of DNA was discovered, with its self-evident capacity for
carrying information for amino acid sequence in the form of its nucleotide sequence,
the idea was proposed by Alexander Dounce in 1952 (Enzymologia 15: 251-258)
that the linear sequence of nucleotides in DNA specifies the linear sequence of amino
acids in proteins and that the two sequences are co-linear. This is what Crick called
the "sequence hypothesis". An earlier hypothesis was that proteins were assembled by
the joining of small peptides into chains under the guidance of hypothesized "guide"
enzymes. But since enzymes themselves are chains of amino acids, this idea,
although entertained by some, was obviously flawed, as it offered no solution to the
problem of what specified the guide enzymes!
Alexander Dounce
1909-1997
Quite aside from questions of molecular mechanism, the template concept raises the abstract
problem of coding: How is a linear message written in four different characters (the DNA
nucleotides, ATGC) decoded into a linear message written in twenty different characters (the
standard amino acids)? The general problem of decoding raises a number of questions:
a) Is the DNA sequence co-linear with the sequence of amino acids in the protein for which it
codes?
b) If the DNA sequence is co-linear with the amino acid sequence, is there a fixed coding ratio
(the number of nucleotides that specify an amino acid)? If so, what is it?
c) How are "words" for amino acids in the DNA code recognized? Are they separated by some
form of punctuation (“commas”) that serves the role played by spaces in written English? Or is
the code “comma-free”, such that only codons that specify amino acids are recognized by the
reading machinery? An example from a "comma-free" message with a coding ratio of 3 is
...CATCATCATCATCATCAT... Here only the word CAT is recognized (by us) because we do
not recognize the other two triplets, ATC and TCA as meaningful. Or is the code read from a
fixed start point in groups of some fixed number of nucleotides? And if so, how many
nucleotides are in a group? Remarkably, this last possibility appears not to have been considered
until it was discovered by Crick et al. (1961).
d) What is the code? Which particular nucleotide words correspond to which particular amino
acids? What are the signals for "start reading" and for "stop reading"?
e) If the code is degenerate, what determines which code words are used for a given amino acid
in different situations, different cells, different genes or different species?
f) Is the code universal or do some organisms utilize codes that are different from the code in
others?
g) What features might the code have that provide stability against mutations and reading errors?
h) How did the code evolve? What, if anything, can the code tell us about the origin(s) of life?
i) Are there lineages in which the code continues to evolve?
Co-linearity
Despite some pessimistic speculation by Max Delbrück in the 1950s that the genetic map might
not be co-linear with the DNA sequence, non-co-linear codes were never seriously considered.
Even before there was experimental proof that the genetic map is co-linear with the amino acid
sequence (Sarabhai et al. 1964 Nature 201: 13-17; Yanofsky et al. 1964 PNAS 51: 266-272), it
was generally believed that the nucleotide and amino acid sequences are co-linear. The expected
final proof of the co-linearity of the nucleotide sequence and the corresponding amino acid
sequence came only later, with the advent of protein and DNA sequencing.
Coding ratio
It was assumed that there must be a fixed number of nucleotides to specify an amino acid and
that the number had to be greater than two. A coding ratio of two would allow for only 16
different amino acids, not enough to separately code for each of the 20 different amino acids of
general occurrence in proteins. Starting with Dounce (1952), it was commonly thought likely
that the coding ratio is 3, allowing for 64 different nucleotide triplets and implying that the code
is redundant (several different triplets corresponding to a given amino acid, as is in fact the case)
or that some triplets simply do not occur or play no role (which is not the case). The group of
nucleotides needed to specify an amino acid was named a “codon” by Crick. The 1961 paper of
Crick et al. showed that the coding ratio is three or, improbably, an integral multiple of three.
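The counting argument behind the coding ratio can be checked in a few lines. The following Python sketch is only an illustration; the function name codons_available is ours and does not come from the readings.

```python
# Count the distinct "words" of a given length over the 4-letter nucleotide alphabet.
NUCLEOTIDES = "ATGC"
AMINO_ACIDS = 20  # the standard amino acids of general occurrence in proteins

def codons_available(coding_ratio):
    """Return how many distinct codons exist for a given coding ratio."""
    return len(NUCLEOTIDES) ** coding_ratio

for n in (1, 2, 3):
    count = codons_available(n)
    verdict = "enough" if count >= AMINO_ACIDS else "too few"
    print(f"coding ratio {n}: {count} codons ({verdict} for {AMINO_ACIDS} amino acids)")
```

A ratio of 2 gives 16 words, which falls short of 20, while a ratio of 3 gives 64, which leaves room for redundancy.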
Codon recognition
How are individual codons recognized within the polynucleotide message? There are several
ways that might be considered for accomplishing this.
Overlapping codes. One way is to avoid the problem of codon recognition altogether. This can
be done with a fully overlapping code. In an overlapping triplet code, for example, every
trinucleotide would be a codon. Thus, the DNA sequence ATCG would contain two codons,
ATC and TCG. It is seen that in such an overlapping code, the first two nucleotides in a codon
must be the same as the last two nucleotides in the preceding codon, thereby imposing
restrictions on which codons may be adjacent to which other codons and therefore imposing
corresponding restrictions on which amino acids may be adjacent to which other amino acids.
For example, with 20 different amino acids there would be 400 (20x20) possible dipeptides
along a protein chain if all dipeptides are possible--as could be the case in a non-overlapping
code. But in a fully overlapping triplet code a dipeptide is specified by a tetranucleotide, of
which there are only 256 (4x4x4x4). Finding more than 256 different dipeptides would therefore
rule out a fully overlapping triplet code. Examining the several protein amino acid sequences
known in 1957, derived from several different species, and making the assumption that the code
is universal, Brenner (PNAS 43: 687-694) showed by such arguments that no overlapping triplet
code is possible.
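The restriction imposed by a fully overlapping triplet code can be made concrete with a short Python sketch. It is a toy illustration of the counting argument only, not a reconstruction of Brenner's analysis; the function name overlapping_codons is hypothetical.

```python
from itertools import product

NUCLEOTIDES = "ATGC"

def overlapping_codons(dna, size=3):
    """In a fully overlapping code every window of `size` nucleotides is a codon."""
    return [dna[i:i + size] for i in range(len(dna) - size + 1)]

# The example from the text: ATCG contains the two codons ATC and TCG.
print(overlapping_codons("ATCG"))

# Two adjacent overlapping codons together span one tetranucleotide,
# so the number of tetranucleotides caps the number of possible dipeptides.
tetranucleotides = ["".join(p) for p in product(NUCLEOTIDES, repeat=4)]
max_dipeptides_overlapping = len(tetranucleotides)
max_dipeptides_unrestricted = 20 * 20

print(max_dipeptides_overlapping, "possible dipeptides with a fully overlapping triplet code")
print(max_dipeptides_unrestricted, "possible dipeptides if any amino acid may follow any other")
```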
Another sort of evidence that the code is not overlapping began to appear, as we have read, with
the finding of Ingram, also in 1957 (Nature 180: 326-328), that a single mutation changes only a
single amino acid residue in a peptide chain, as, for example, replacing the glutamic acid at
position six of the beta chain of hemoglobin A with valine in hemoglobin S or with lysine in
hemoglobin C (a rare Hb variant found in humans). Also, by 1960, many mutants of tobacco
mosaic virus were found to result in single amino acid substitutions in the capsid protein of the
virus.
With overlapping codes eliminated, three other possibilities for recognizing codons remained to
be considered. This was the situation in 1961, when the experiments reported in Crick et al. 1961
were begun. (As mentioned above, the third of these appears not to have been thought of until it
was implied by the experimental results.)
Codes with commas. Codons could be set off by commas. For example, one might imagine that
G is used for such punctuation, leaving only A, T, and C to make code words. Then, taking a
triplet code, there could be 27 (3x3x3) different codons, more than adequate to specify each of
the 20 standard amino acids.
Comma-free codes. The code could be comma-free if only certain triplets are recognizable and
all the triplets that overlap two such "sense" triplets are "nonsense" triplets, not recognizable by
the decoding apparatus. An example would be the message CATCATCATCAT. If, as in written
English, CAT but not ATC or TCA were recognizable words (words in the dictionary), the
message would unambiguously convey information for the sequence CAT CAT CAT. Not being
of the overlapping type, such a code would allow any amino acid to be adjacent to any other
amino acid. It was shown by Crick, Griffith and Orgel (PNAS 43: 416-421, 1957) that the
maximum number of "sense" triplets that could be allowed in such a comma-free triplet code is
20, just the number of different generally occurring amino acids! Then it could be imagined that
the information for initiation and termination of the peptide chain could be specified by special
nucleotide sequences longer than triplets. But the equality between the maximum number of
"sense" codons in a comma-free triplet code and the number of amino acids was only coincidence,
one of Nature's deceptions.
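The comma-free condition, and the reason the upper bound comes out at exactly 20, can be illustrated with a small Python sketch. The helper name is_comma_free is ours. The second part reproduces only the counting step of the argument (at most one sense triplet per set of cyclic rotations); it does not construct an actual 20-word comma-free code.

```python
from itertools import product

NUCLEOTIDES = "ATGC"

def is_comma_free(sense):
    """True if no out-of-frame triplet spanning two sense codons is itself a sense codon."""
    for a, b in product(sense, repeat=2):
        pair = a + b
        if pair[1:4] in sense or pair[2:5] in sense:
            return False
    return True

print(is_comma_free({"CAT"}))         # ATC and TCA are nonsense, so CATCAT is unambiguous
print(is_comma_free({"CAT", "ATC"}))  # ATC turns up out of frame inside CATCAT

# Counting step of the upper bound: AAA, TTT, GGG and CCC can never be sense
# (AAAAAA reads AAA in every frame), and of the three cyclic rotations of any
# other triplet (for example CAT, ATC, TCA) at most one can be sense.
rotation_classes = set()
for t in ("".join(p) for p in product(NUCLEOTIDES, repeat=3)):
    if len(set(t)) > 1:
        rotation_classes.add(frozenset({t, t[1:] + t[0], t[2:] + t[:2]}))

print(len(rotation_classes), "rotation classes, hence at most that many sense triplets")
```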
Codes with fixed reading frames. As marvelously shown to be the case by Crick et al. 1961, the
code is read from a fixed starting point, the existence of which establishes a "reading frame" for
the correct sequential reading of triplet (or, less likely, sextuplet, etc.) codons. The starting
point for their experiments was the assumption, supported by considerable evidence, that the
mutations arising during phage growth in the presence of acridines (planar heterocyclic
molecules of which proflavin is an example) correspond to additions and deletions of individual
DNA base pairs. (Note that additions could be expected to result during DNA replication when
the acridine slips between bases in the DNA template chain and deletions when it slips between
bases in the growing chain.)
Proflavin
The underlying logic of the experiments is that if the code is read from a fixed starting point, an
addition (or deletion) of a base pair will shift the "reading frame" by one base pair, shifting the
reading frame from there all the way to the end of the coding sequence. But if an addition is
followed by a deletion in the same gene, (or vice versa) the reading frame will be shifted only
between the two mutations but will be restored to the correct "phase" beyond the distal mutation.
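The logic of a shifted and then restored reading frame can be followed on a toy message. In the Python sketch below, the message CATCAT..., the inserted base G and the chosen positions are arbitrary illustrations, not the actual rIIB sequence or the actual FC mutations.

```python
def read_codons(message, size=3):
    """Read non-overlapping codons of the given size from a fixed start (index 0)."""
    return [message[i:i + size] for i in range(0, len(message) - size + 1, size)]

def insert_base(message, position, base):
    """A (+) mutation: add one base before the given position."""
    return message[:position] + base + message[position:]

def delete_base(message, position):
    """A (-) mutation: remove the base at the given position."""
    return message[:position] + message[position + 1:]

wild_type = "CAT" * 6
plus = insert_base(wild_type, 4, "G")   # single (+): frame shifted to the end
plus_minus = delete_base(plus, 11)      # (+) then (-): frame restored downstream

for name, seq in (("wild type", wild_type), ("(+)", plus), ("(+)(-)", plus_minus)):
    print(f"{name:10s}", " ".join(read_codons(seq)))
```

In the double mutant only the codons between the two mutations differ from wild type; the reading beyond the second mutation returns to CAT CAT.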
While such addition and deletion mutants can be abundantly produced by growth in the presence
of an acridine, as was the case for FC0, they also occur spontaneously. Except for the initial
mutation, FC0, all the frame-shift mutations cited in Crick et al. occurred spontaneously.
If the protein with the amino acid sequence specified within the shifted region is at least partly
functional, the double mutant will be at least pseudo wild-type. In that case, the two mutations
may be considered "suppressors" of each other. A mutation that suppresses an initial frame shift
mutation that, like FC0, is arbitrarily assumed to be an addition (+) would then be designated as
a deletion (-). By collecting suppressors of suppressors and even suppressors of suppressors of
suppressors one would expect that, in double mutants, any (+) would suppress any (-) and vice
versa. More correctly, this should be the case only so long as the shifted reading frame (the
region between the two mutations) specifies an amino acid sequence that allows at least partial
function of the protein. This condition was satisfied by working within only a short region of the
phage rIIB gene which apparently tolerates a variety of amino acid sequences without entirely
abolishing the rIIB function.
Note that a (+) followed by a (-) is not equivalent to a (-) followed by a (+), as the sequence of
codons in the region between the mutations is different in the two cases.
Experiments of this type showed that the code is indeed read from a fixed point that defines a
reading frame. This leaves the question of the "coding ratio" i.e. the size of the codons.
In order to address the question of word size, Crick et al. constructed triple mutants, all of the
same sign, either (---) or (+++). Whereas double mutants of the same sign are mutant, such triple
mutants were often pseudo-wild type. From this it could be concluded that the code is read in
multiples of three base pairs. Under the reasonable assumption that acridine and other
frame-shift mutagens (such as hydrazine) act by adding and deleting only single base pairs rather
than two or more at a time, it was concluded that the codon size is three.
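The triple-mutant argument can be sketched in the same toy setting. The Python example below assumes, as the text does, that each mutation adds exactly one base; the message, the inserted base and the positions are arbitrary, and the helper names are ours.

```python
def read_codons(message, size=3):
    """Read non-overlapping codons of the given size from a fixed start (index 0)."""
    return [message[i:i + size] for i in range(0, len(message) - size + 1, size)]

def insert_bases(message, positions, base="G"):
    """Insert one base before each listed position of the original message."""
    # Work from right to left so that earlier insertions do not move later positions.
    for p in sorted(positions, reverse=True):
        message = message[:p] + base + message[p:]
    return message

def frame_restored(mutant, word="CAT"):
    """True if the last two complete codons read as the wild-type word again."""
    return read_codons(mutant)[-2:] == [word, word]

wild_type = "CAT" * 8
for positions in ([4], [4, 8], [4, 8, 12]):
    mutant = insert_bases(wild_type, positions)
    label = "+" * len(positions)
    print(f"({label})", " ".join(read_codons(mutant)),
          "| frame restored:", frame_restored(mutant))
```

Only the (+++) combination brings the downstream reading back into phase, which is what a codon size of three (or a multiple of three) predicts.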
Cracking the Code
Heinrich Matthaei (l), 1929- ; Marshall Nirenberg (r), 1927-2010
Severo Ochoa, 1905-1993
Har Gobind Khorana, 1922-2011
Philip Leder, 1934-
The assignment of individual triplets to specific amino acids and to signals for peptide chain
initiation and termination began in 1961, the same year in which Crick, Barnett, Brenner and
Watts-Tobin established the general nature of the code. Whereas the latter was discovered by
purely genetic experiments, the actual cracking of the code was accomplished by biochemical
studies in cell-free systems. The first codon assignments were made by Marshall Nirenberg and
Heinrich Matthaei at the NIH, who discovered that specific polypeptides were produced when
certain synthetic polyribonucleotides were added to a protein-synthesizing system containing
ribosomes, a supernatant fraction of E. coli extract that contained tRNAs, activating enzymes,
amino acids (one of which is labeled with radioactive carbon), and an energy source. In the first
such experiments (Nirenberg and Matthaei 1961, PNAS 47: 1588-1602), the addition of
polyuridylic acid (poly U) caused the production of polyphenylalanine and the addition of
polycytidylic acid (poly C) caused the production of polyproline, indicating that UUU codes for
phenylalanine and CCC for proline. Later work (Matthaei et al. 1962) included the use of random
co-polymers of pairs of ribonucleotides. By adjusting the relative amounts of the two nucleotides
in the synthetic polynucleotide and observing the identity and relative proportions of the amino
acids incorporated into polypeptides, it was possible to deduce the composition (but not the
order) of RNA triplets coding for specific amino acids. This kind of work, with random
polyribonucleotides and, later, by Har Gobind Khorana and by Severo Ochoa with synthetic
polynucleotides having known repeating sequences (Khorana 1968 Nobel Lecture at
http://www.nobel.se/medicine/laureates/1968/; Ochoa 1959 Nobel Lecture at
http://www.nobelprize.org/nobel_prizes/medicine/laureates/1959/ochoa-lecture.html), led to the
identification of the amino acid specificity of approximately half of the codons. The repeating
polyribonucleotide CUCUCUCUCUCUCUC... , for example, gave rise to the repeating polypeptide
leu-ser-leu-ser-leu... A further advance was the discovery by Nirenberg and Philip Leder (1964)
that RNA polynucleotides, even trinucleotides, mediate specific binding of charged tRNA to
ribosomes. By these various means, all of the amino-acid specifying codons could be determined.
With the further identification of the three nonsense codons (primarily by genetic rather than
biochemical methods), the code was entirely cracked by 1967.
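The reasoning from synthetic polyribonucleotides to codon assignments can be sketched as follows. The Python example uses only the four assignments mentioned above (UUU, CCC, CUC, UCU); the table is deliberately partial, and the name translate is ours. It also shows why a repeating dinucleotide polymer is informative whatever frame the ribosome happens to start in.

```python
# Partial table: only the codon assignments mentioned in the text.
ASSIGNMENTS = {"UUU": "phe", "CCC": "pro", "CUC": "leu", "UCU": "ser"}

def translate(rna, frame=0):
    """Read triplets starting at `frame` and join the assigned amino acids with '-'."""
    codons = [rna[i:i + 3] for i in range(frame, len(rna) - 2, 3)]
    return "-".join(ASSIGNMENTS.get(codon, "?") for codon in codons)

print("poly U :", translate("U" * 12))
print("poly C :", translate("C" * 12))

# The repeating polymer CUCUCU... alternates CUC and UCU in every reading frame,
# so the product alternates leu and ser no matter where reading begins.
repeating = "CU" * 9
for frame in range(3):
    print(f"frame {frame}:", translate(repeating, frame))
```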
Universality
The Standard Code, shown in the above table, is almost universal. E. coli and vertebrates both
use it. The few departures that are known are mostly in mitochondria. AGA and AGG, for
example, code for arginine in the Standard Code, for serine in the mitochondria of invertebrate
metazoans and for STOP in the mitochondria of vertebrates. In Tetrahymena and other ciliates,
UAA and UAG code for glutamine, leaving only UGA as a stop codon.
Error minimization and evolution of the code
The structure of the code embodies features that provide protection against the effects of
mutations and translation errors. Translation errors are more frequent in reading the first position
than in reading the second position of codons. The effect of such errors is reduced by the fact
that most codons that differ only in the first position code for amino acids that are
chemically similar with respect to hydrophobicity. And, conspicuously, mutations affecting only
the third position of a codon may not change the amino acid at all, as is seen in the fact that all
four codons specify the same amino acid in 8 of the 16 boxes of the code table. The fact that most
third-position changes are "synonymous" has suggested that the first codes, although read in
triplets, were doublet codes and therefore not able to specify all 20 amino acids of modern proteins.
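The counts quoted for the Standard Code can be verified directly. The Python sketch below writes the code as a 64-letter string of one-letter amino acid symbols in U, C, A, G order, with '*' for STOP; the variable names are ours.

```python
from itertools import product

BASES = "UCAG"
# Standard Code, one letter per codon, ordered by first, second, then third
# position in U, C, A, G order; '*' marks the STOP codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}

sense = [codon for codon, aa in CODE.items() if aa != "*"]
stops = sorted(codon for codon, aa in CODE.items() if aa == "*")

# A "box" is the set of four codons that share their first two nucleotides.
fourfold_boxes = [
    first + second
    for first, second in product(BASES, repeat=2)
    if len({CODE[first + second + third] for third in BASES}) == 1
]

print(len(sense), "sense codons; STOP codons:", stops)
print(len(fourfold_boxes), "of 16 boxes specify a single amino acid:", fourfold_boxes)
```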
Origin and evolution of the code
How did the code come to be what it is? Initially, it was thought that code assignments could not
undergo evolutionary change. Clearly, any mutation that causes a frequently used codon to
specify a different amino acid would alter the sequence of all proteins in whose messenger RNA
that codon is present. Such a massive effect would be lethal, except perhaps in a very small
genome such as that of a mitochondrion, and therefore would preclude any possibility of
evolution of the code. On this view, the code is a frozen accident. But computer modeling
experiments indicate that of all possible codes the Standard Code is one of the better ones
(although by no means the best) in minimizing translation errors. In a study of about a million
randomly generated codes, only one was found to be better. But the universe of all possible
codes is vastly larger than a million and other codes would be even less subject to translational
error. Just how the code arose and evolved is currently a subject of active study. See Crick
(1968) The Origin of the Genetic Code J Mol Biol 38: 367-379 for an early view and Koonin &
Novozhilov (2009) Origin and Evolution of the Genetic Code: The Universal Enigma. IUBMB
Life 61: 99-111 for a more recent review.