S1 Genetics

The genetic code
The Genetic Code
• Each amino acid is specified by a
triplet of nucleotides, known as a
codon.
The Genetic Code
TTT Phe    TCT Ser    TAT Tyr    TGT Cys
TTC Phe    TCC Ser    TAC Tyr    TGC Cys
TTA Leu    TCA Ser    TAA Och    TGA Umb
TTG Leu    TCG Ser    TAG Amb    TGG Trp

CTT Leu    CCT Pro    CAT His    CGT Arg
CTC Leu    CCC Pro    CAC His    CGC Arg
CTA Leu    CCA Pro    CAA Gln    CGA Arg
CTG Leu    CCG Pro    CAG Gln    CGG Arg

ATT Ile    ACT Thr    AAT Asn    AGT Ser
ATC Ile    ACC Thr    AAC Asn    AGC Ser
ATA Ile    ACA Thr    AAA Lys    AGA Arg
ATG Met    ACG Thr    AAG Lys    AGG Arg

GTT Val    GCT Ala    GAT Asp    GGT Gly
GTC Val    GCC Ala    GAC Asp    GGC Gly
GTA Val    GCA Ala    GAA Glu    GGA Gly
GTG Val    GCG Ala    GAG Glu    GGG Gly
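As an illustration of how the table is used, the following minimal Python sketch builds the same 64-codon table and reads a DNA sequence three nucleotides at a time. It uses one-letter amino acid codes (F = Phe, L = Leu, M = Met and so on) with "*" standing for the three stop codons. The names BASES, AMINO, CODON_TABLE and translate, and the example sequence, are assumptions made for this sketch and are not part of the lecture.

```python
from itertools import product

BASES = "TCAG"
# One-letter amino acid codes for the 64 codons in TCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA coding sequence codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino = CODON_TABLE[dna[i:i + 3]]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)


# Hypothetical example sequence: ATG TTT CCC AAA TAA
print(translate("ATGTTTCCCAAATAA"))  # prints MFPK (Met-Phe-Pro-Lys, then stop)
```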
The Genetic Code
• A single methionine codon (ATG) acts as the initiator.
The Genetic Code
• Three nonsense codons (TAA Och, TAG Amb and TGA Umb) act as stop signals.
The Genetic Code
• Some amino acids (e.g. leucine) have up to six codons.
The Genetic Code
• Some amino acids (e.g. proline) have four codons.
The Genetic Code
• Some amino acids (e.g. glutamine) have two codons.
The Genetic Code
• Tryptophan and methionine have one codon each.
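The codon counts quoted on these slides (six for leucine, four for proline, two for glutamine, one each for methionine and tryptophan, and three stop codons) can be checked by counting entries in the standard table. The sketch below is only an illustration: it rebuilds the table with one-letter amino acid codes ("*" for stop), and the names used are assumptions of this example.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# One-letter amino acid codes for the 64 codons in TCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Number of codons that specify each amino acid (or stop).
codons_per_amino = Counter(CODON_TABLE.values())

for amino, label in [("L", "Leu"), ("P", "Pro"), ("Q", "Gln"),
                     ("M", "Met"), ("W", "Trp"), ("*", "stop")]:
    print(label, codons_per_amino[amino])
```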
The Genetic Code
• The last nucleotide in a codon is often irrelevant.
The Genetic Code
• When the last nucleotide does matter, it is usually only important whether it is a purine or a pyrimidine.
The Genetic Code
A nucleotide consists of a sugar (deoxyribose in DNA,
ribose in RNA) bonded to a phosphate group, with a
nitrogenous base attached to the sugar. The base is
either a pyrimidine (cytosine or thymine) or a purine
(adenine or guanine). In RNA, including mRNA, the base
uracil takes the place of thymine.
The Genetic Code
• With methionine and tryptophan, the exact base matters.
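The statements about the last nucleotide can be tested against the table by grouping the 64 codons into 16 "boxes" that share their first two nucleotides. The following sketch is a hedged illustration, with one-letter amino acid codes, "*" for stop, and function and variable names that are assumptions of this example. It sorts each box into one of three cases: the last nucleotide is irrelevant, only purine versus pyrimidine matters, or the exact base matters.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# One-letter amino acid codes for the 64 codons in TCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def third_base_role(prefix):
    """Classify the four codons that share the given first two nucleotides."""
    t, c, a, g = (CODON_TABLE[prefix + base] for base in "TCAG")
    if t == c == a == g:
        return "irrelevant"
    if t == c and a == g:  # T and C are pyrimidines; A and G are purines
        return "purine or pyrimidine"
    return "exact base matters"


roles = {"".join(p): third_base_role("".join(p)) for p in product(BASES, repeat=2)}
summary = Counter(roles.values())
exact = [prefix for prefix, role in roles.items() if role == "exact base matters"]

print(summary)
print(exact)  # the boxes containing ATG (Met) and TGG (Trp)
```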
The Genetic Code
Recommended supplementary reading
Chatty, readable account of how Crick and Brenner
solved the mystery of the genetic code. This is not
a textbook. It is Francis Crick’s autobiographical
answer to James Watson’s book The Double Helix,
which describes the search for the structure of
DNA, and in which Watson notes the dictionary
definition of a crick as “a pain in the neck”.
Crick, F. What Mad Pursuit. 1989
(James Cameron-Gifford Library Q143.C7, George Green Library QH506.CRI)
How was the code deciphered?
Most of the work to show the general form of the
genetic code was done by Francis Crick and Sidney Brenner.
[Photographs: Crick, Brenner]
How was the code deciphered?
They started off with George Gamow’s arguments, based
on simple school arithmetic, to show that the code was
probably a triplet code.
Why must the code be in triplets?
There are only four nucleotides, therefore a singlet
code (i.e. a code in which each nucleotide
specifies an amino acid) could only encode four
amino acids.
However, there are twenty amino acids found in
most proteins. Therefore, the code cannot be
singlet in nature.
Why must the code be in triplets?
If the code were doublet, then there would be four
possible nucleotides in the first position and four in
the second. This gives:
4 × 4 = 4² = 16 codons
Still too few to encode 20 amino acids.
[Diagram: each of the four first nucleotides (G, A, T, C) branches to four second nucleotides (G, A, T, C)]
Why must the code be in triplets?
If the code were triplet, then there would be four
possible nucleotides in the first position, four in
the second and four in the third. This gives:
4 4 4 = 43 = 64 permutations
This is too many to encode 20 amino acids but the
code could work if either some permutations are
not used or if more than one encodes each amino
acid (or both).
Why must the code be in triplets?
Type of code    Number of permutations
Singlet         4¹ = 4
Doublet         4² = 16
Triplet         4³ = 64
Quadruplet      4⁴ = 256
Pentuplet       4⁵ = 1024
Only the triplet code really looks feasible
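The arithmetic in this table can be reproduced in a few lines. The sketch below is a simple check, with variable names chosen for this example: it lists the number of permutations for each codon length and finds the shortest length that gives at least 20.

```python
NUCLEOTIDES = 4    # G, A, T, C
AMINO_ACIDS = 20   # amino acids found in most proteins

# Number of distinct codons for each codon length from singlet to pentuplet.
permutations = {length: NUCLEOTIDES ** length for length in range(1, 6)}
for length, count in permutations.items():
    print(length, count)

# Shortest codon length that can encode every amino acid.
shortest = min(n for n, count in permutations.items() if count >= AMINO_ACIDS)
print(shortest)  # prints 3
```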
How was the code deciphered?
There are also different ways that the code
can be read:
• It can be punctuated or unpunctuated.
• If it is unpunctuated it can be overlapping or
non-overlapping.
An overlapping code
GTCACCCATGGAGGTATCT
GTC                    1
 TCA                   2
  CAC                  3
   ACC                 4
Once the first codon is set
(e.g. GTC), the next one can
only be one of four (TCA,
TCG, TCT or TCC). This is a
disadvantage.
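The restriction described here can be made concrete with a short sketch. It assumes, as in the example on the slide, that successive codons overlap by two nucleotides; the function names are hypothetical.

```python
def overlapping_codons(seq):
    """Read a fully overlapping triplet code: each codon starts one nucleotide later."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


def possible_next(codon, bases="ACGT"):
    """Codons that may follow the given codon when two nucleotides are shared."""
    return [codon[1:] + base for base in bases]


print(overlapping_codons("GTCACCCATGGAGGTATCT")[:4])  # ['GTC', 'TCA', 'CAC', 'ACC']
print(possible_next("GTC"))                           # ['TCA', 'TCC', 'TCG', 'TCT']
```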
A non-overlapping unpunctuated code
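GTCACCCATGGAGGTATCT
GTC ACC CAT GGA GGT       reading frame 1 (codons 1 to 5)
 TCA CCC ATG GAG GTA      reading frame 2 (codons 1 to 5)
  CAC CCA TGG AGG TAT     reading frame 3 (codons 1 to 5)
There are three ways to read this type of
code, referred to as “reading frames”.
This makes this type of code non-ideal.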
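The three reading frames of the example sequence can be listed with a short function. This is a minimal sketch; the name reading_frames is an assumption of this example.

```python
def reading_frames(seq):
    """Return the codons read from each of the three possible starting offsets."""
    return [
        [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
        for offset in range(3)
    ]


for number, frame in enumerate(reading_frames("GTCACCCATGGAGGTATCT"), start=1):
    print(number, " ".join(frame))
```

The same nucleotides give three entirely different codon series, which is why an unpunctuated code needs some other way of fixing where reading starts.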
A non-overlapping punctuated code
GTC A CCC A TGG A GGT A TCT
 1     2     3     4     5
Here, one nucleotide (A) is used as a punctuation
mark. This code has several advantages:
1. The reading frame is set by the punctuation.
2. Because only three nucleotides are used in
codons, the number of coding permutations
available is 3³ = 27, enough for 20 amino acids.
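Reading this hypothetical punctuated code amounts to splitting the sequence at every A. The sketch below illustrates that with the example sequence; the variable names are assumptions of this example.

```python
SEQUENCE = "GTCACCCATGGAGGTATCT"
PUNCTUATION = "A"

# Splitting at the punctuation nucleotide fixes the reading frame automatically.
codons = SEQUENCE.split(PUNCTUATION)
print(codons)  # ['GTC', 'CCC', 'TGG', 'GGT', 'TCT']

# Codons are built from the three remaining nucleotides only.
coding_bases = [base for base in "GATC" if base != PUNCTUATION]
print(len(coding_bases) ** 3)  # prints 27
```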
Is the code really overlapping?
GTCACCCATGGAGGTATCT
Once the first codon is set, the next one can only be
one of four, so each amino acid could only be followed
by one of at most four others. Therefore, certain amino
acids could never be next to each other.
This can be tested by experimentation
Is the code really overlapping?
• Francis Crick and Sidney Brenner did
“nearest neighbour” analysis on real
proteins.
• They found that any amino acid could be
next to any other one. Therefore, the code
cannot be overlapping.
Is the code punctuated?
• Francis Crick and Sidney Brenner went on to
analyse a particular type of mutant that is induced
by intercalating agents (e.g. acridine dye).
• Intercalating agents will insert themselves
between the base pairs of DNA. These can stretch
the base pairs apart during replication and cause
an extra nucleotide to be inserted or one to be left
out.
Is the code punctuated?
• They found a gene (the rII gene) that has
special properties. It can tolerate several
wrong codons in the early part of the
coding sequence and still make an active
protein as long as the later part of the
coding sequence is correct.
Is the code punctuated?
• The mutations caused by intercalating agents fall
into two classes, 1 and 2. Both cause a mutant
phenotype in the rII gene.
Mutation 1 alone: mutant phenotype
Mutation 2 alone: mutant phenotype
Is the code punctuated?
• Double mutants (two mutations of the same class in
one gene) also cause a mutant phenotype in the rII gene.
1 + 1: mutant phenotype
2 + 2: mutant phenotype
• When the double mutant has two different
kinds of mutation, they suppress each other and
you get a non-mutant phenotype in the rII gene.
2 + 1: wild type (non-mutant) phenotype
• Remember that the mutations caused by
acridine dyes result from the loss or gain of
one nucleotide.
• They cause the reading frame to change and
are called frame-shift mutations.
• The fact that they can arise means that there
must be reading frames and that means that
the code is unpunctuated.
How does this work?
GTC ACC CAT GGA GGT ATC T
 1   2   3   4   5
Original code
GTC TAC CCA TGG AGG TAT C
 1   2   3   4   5
Code with frame shift mutation
All codons after the inserted nucleotide
are wrong (some may be stop codons).
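The effect of the insertion can be followed codon by codon. The sketch below uses the two sequences from this slide (a T inserted after the first codon) and a hypothetical helper named codons.

```python
def codons(seq):
    """Split a sequence into consecutive triplets, ignoring any incomplete tail."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


original = "GTCACCCATGGAGGTATCT"
mutant = original[:3] + "T" + original[3:]  # one T inserted after the first codon

# True where a codon is unchanged by the mutation, False where it is altered.
unchanged = [a == b for a, b in zip(codons(original), codons(mutant))]

print(codons(original))
print(codons(mutant))
print(unchanged)  # [True, False, False, False, False, False]
```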
Two wrongs can make a right
GTC ACC CAT GGA GGT ATC T
 1   2   3   4   5
Original code
GTC TAC CAT GGA GGT ATC T
 1   2   3   4   5
Code with different frame shift mutations
(one insertion and one deletion)
After second mutation, codons back in
original frame.
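The suppression can be checked in the same way. This sketch rebuilds the doubly mutated sequence from the slide (a T inserted after the first codon, then one C deleted just downstream) and compares its codons with the original; the helper name codons is an assumption of this example.

```python
def codons(seq):
    """Split a sequence into consecutive triplets, ignoring any incomplete tail."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


original = "GTCACCCATGGAGGTATCT"
inserted = original[:3] + "T" + original[3:]  # +1: T inserted after the first codon
suppressed = inserted[:5] + inserted[6:]      # -1: one C deleted a little further on

# True where a codon is unchanged, False where it is altered.
unchanged = [a == b for a, b in zip(codons(original), codons(suppressed))]

print(codons(suppressed))
print(unchanged)  # [True, False, True, True, True, True]
```

Only the codon lying between the two mutations differs from the original.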
Is the code triplet?
• Crick and Brenner went on to show that three
frame shift mutations of the first type (insertion)
or three of the second type (deletion) in the rII
gene could also give a wild-type phenotype.
• This could only happen if the code were triplet.
If the code were quadruplet, you would have
to add or delete four nucleotides to reset the
reading frame.
Three wrongs can make a right
GTC ACC CAT GGA GGT ATC T
 1   2   3   4   5
Original code
GTC TAC TCA CAT GGA GGT A
 1   2   3   4   5   6
Code with three similar frame shift mutations
An extra codon is inserted and a few codons
are wrong, then all of the rest are OK.
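The rule behind these results is that the reading frame is restored whenever the net number of inserted or deleted nucleotides is a multiple of three. The sketch below states that rule as a small function and checks the three-insertion sequence from this slide; the names frame_restored and codons are assumptions of this example.

```python
def codons(seq):
    """Split a sequence into consecutive triplets, ignoring any incomplete tail."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def frame_restored(changes, codon_length=3):
    """changes holds +1 for each insertion and -1 for each deletion."""
    return sum(changes) % codon_length == 0


print(frame_restored([+1]))          # False: single mutant
print(frame_restored([+1, +1]))      # False: double mutant of one class
print(frame_restored([+1, -1]))      # True: insertion plus deletion
print(frame_restored([+1, +1, +1]))  # True: three insertions
print(frame_restored([+1, +1, +1], codon_length=4))  # False for a quadruplet code

original = "GTCACCCATGGAGGTATCT"
triple = "GTCTACTCACATGGAGGTA"  # sequence shown on the slide (three insertions)
# After the extra codon, the mutant reads the original codons again.
print(codons(triple)[3:] == codons(original)[2:5])  # True
```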
How was the code “cracked”?
• We must first consider how genetic information is
used by the cell.
• In higher organisms (eukaryotes) the DNA is in
the nucleus and the protein is made in the
cytoplasm, so there must be an intermediate.
• Messenger RNA (mRNA) moves from the
nucleus to the cytoplasm and carries the genetic
code.
How was the code “cracked”?
The Central Dogma
• Francis Crick proposed the idea that genetic
information moves in one direction and called
this the central dogma of molecular genetics.
DNA → RNA → Protein
(DNA is copied by replication; DNA → RNA is
transcription; RNA → Protein is translation)
How was the code “cracked”?
• Cells can be broken open and the elements
needed for protein synthesis can be isolated.
When RNA is added, the protein encoded by
that RNA is made.
• Artificial RNA can be made in the test tube and
added to this system. This work was done by
Marshall Nirenberg and Har Gobind Khorana.
• Nirenberg: http://profiles.nlm.nih.gov/JJ/Views/Exhibit/documents/codeoflife.html
• Khorana: http://www.ucs.mun.ca/~c64dcp/Khorana.html
How was the code “cracked”?
• Nirenberg made simple RNA with the sequence:
UUUUUUUUUUUUUUUUUUUUU
• When he put this into the test tube, he found that
the protein made was a string of one type of
amino acid, phenylalanine, joined together.
Therefore the codon UUU (or TTT in DNA),
encodes phenylalanine.
How was the code “cracked”?
Similarly, RNA with the sequence:
CCCCCCCCCCCCCCCCCCCCCC
encodes a protein that is all proline.
AAAAAAAAAAAAAAAAAAAA
encodes a protein that is all lysine.
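The logic of these homopolymer experiments can be sketched in code: an RNA made of a single repeated nucleotide contains only one codon, whichever reading frame is used, so the product identifies that codon. The table below holds only the three assignments stated on these slides; the names OBSERVED and translate are assumptions of this example.

```python
# Codon assignments reported on the slides (RNA codons).
OBSERVED = {"UUU": "Phe", "CCC": "Pro", "AAA": "Lys"}


def translate(rna, table):
    """Translate an RNA sequence codon by codon using the given table."""
    return [table[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3)]


for base in "UCA":
    peptide = translate(base * 21, OBSERVED)
    print(base * 3, "->", "-".join(peptide))

# The equivalent DNA codon is obtained by writing T in place of U.
print("UUU".replace("U", "T"))  # prints TTT
```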
How was the code “cracked”?
• Khorana made less simple RNA with the
sequence:
UGUGUGUGUGUGUGUGUGUGU
• When he put this into the test tube, he found
that the protein made was a string of two
alternating amino acids, valine and cysteine.
TGT = Cys
GTG = Val
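The alternating result follows from the sequence itself: whichever reading frame is used, poly(UG) reads as UGU and GUG in turn. The sketch below shows this for all three frames, using only the two assignments given on the slide; the names KHORANA and translate are assumptions of this example.

```python
# Codon assignments from the slide, written as RNA codons.
KHORANA = {"UGU": "Cys", "GUG": "Val"}


def translate(rna, table, offset=0):
    """Translate an RNA sequence codon by codon, starting at the given offset."""
    return [table[rna[i:i + 3]] for i in range(offset, len(rna) - 2, 3)]


rna = "UG" * 10 + "U"  # UGUGUGUGUGUGUGUGUGUGU
peptides = [translate(rna, KHORANA, offset) for offset in range(3)]
for peptide in peptides:
    print("-".join(peptide))
```

Because every frame gives the same alternating product, this experiment shows which two codons encode cysteine and valine, but not on its own which codon encodes which.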
How was the code “cracked”?
• By successively more sophisticated
experiments of this type, the amino acids
specified by most of the 61 amino acid
encoding triplets were identified.
• Final confirmation required experiments
with another type of RNA, transfer RNA
(tRNA), which is the subject of the next
lecture.