The Genetic Code

• Each amino acid is specified by a triplet of nucleotides, known as a codon.

Codon table (DNA notation; Och, Amb and Umb are the three stop codons):

TTT Phe    TCT Ser    TAT Tyr    TGT Cys
TTC Phe    TCC Ser    TAC Tyr    TGC Cys
TTA Leu    TCA Ser    TAA Och    TGA Umb
TTG Leu    TCG Ser    TAG Amb    TGG Trp

CTT Leu    CCT Pro    CAT His    CGT Arg
CTC Leu    CCC Pro    CAC His    CGC Arg
CTA Leu    CCA Pro    CAA Gln    CGA Arg
CTG Leu    CCG Pro    CAG Gln    CGG Arg

ATT Ile    ACT Thr    AAT Asn    AGT Ser
ATC Ile    ACC Thr    AAC Asn    AGC Ser
ATA Ile    ACA Thr    AAA Lys    AGA Arg
ATG Met    ACG Thr    AAG Lys    AGG Arg

GTT Val    GCT Ala    GAT Asp    GGT Gly
GTC Val    GCC Ala    GAC Asp    GGC Gly
GTA Val    GCA Ala    GAA Glu    GGA Gly
GTG Val    GCG Ala    GAG Glu    GGG Gly

Features of the code:

• A single methionine codon (ATG) acts as the initiator.
• Three nonsense codons (TAA, TAG and TGA) act as stop signals.
• Some amino acids (e.g. leucine) have up to six codons.
• Some amino acids (e.g. proline) have four codons.
• Some amino acids (e.g. glutamine) have two codons.
• Tryptophan and methionine have one codon each.
• The last nucleotide in a codon is often irrelevant.
• When the last nucleotide does matter, it is usually only important whether it is a purine or a pyrimidine.
• With methionine and tryptophan, the exact base matters.

A nucleotide consists of a sugar (deoxyribose in DNA, ribose in RNA) bonded to a phosphate group, with a nitrogenous base attached as a side chain. The base is either a pyrimidine (cytosine or thymine) or a purine (adenine or guanine). In mRNA, the base uracil replaces thymine.

Recommended supplementary reading

Chatty, readable account of how Crick and Brenner solved the mystery of the genetic code. This is not a textbook. It is Francis Crick’s autobiographical answer to James Watson’s book The Double Helix, which describes the search for the structure of DNA, and in which Watson notes the dictionary definition of a crick as “a pain in the neck”.

Crick, F. What Mad Pursuit. 1989 (James Cameron-Gifford Library Q143.C7, George Green Library QH506.CRI)

How was the code deciphered?

Most of the work to show the general form of the genetic code was done by Francis Crick and Sidney Brenner.
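Before following the decipherment story, the codon table above can be captured as a small lookup, which makes the degeneracy statements (six, four, two or one codon per amino acid, three stop codons) easy to check. This is only an illustrative sketch: the one-letter amino acid codes, the `*` symbol for stop codons, and the names `CODON_TABLE`, `DEGENERACY` and `synonymous_codons` are assumptions introduced here, not part of the slides.

```python
from collections import Counter
from itertools import product

# Bases in the order used by the table: T, C, A, G.
BASES = "TCAG"

# One-letter amino acid codes for all 64 codons, in TTT, TTC, TTA, TTG,
# TCT, ... order. '*' marks the stop codons (Och, Amb, Umb in the slides).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # first base T
    "LLLLPPPPHHQQRRRR"   # first base C
    "IIIMTTTTNNKKSSRR"   # first base A
    "VVVVAAAADDEEGGGG"   # first base G
)

# Map each DNA codon (e.g. "ATG") to its amino acid (e.g. "M").
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons per amino acid (and per stop signal).
DEGENERACY = Counter(CODON_TABLE.values())


def synonymous_codons(amino_acid):
    """Return all codons for a one-letter amino acid code, sorted."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


# Leucine, proline, glutamine, tryptophan, methionine and stop.
for aa in "LPQWM*":
    print(aa, DEGENERACY[aa], synonymous_codons(aa))
```

Listing the synonymous codons side by side also shows the third-position pattern from the slides: the four proline codons differ only in their last base, while the two glutamine codons both end in a purine.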
They started off with George Gamow’s arguments, based on simple school arithmetic, to show that the code was probably a triplet code.

Why must the code be in triplets?

There are only four nucleotides, so a singlet code (i.e. a code in which each nucleotide specifies an amino acid) could only encode four amino acids. However, there are twenty amino acids found in most proteins. Therefore, the code cannot be singlet in nature.

If the code were doublet, then there would be four possible nucleotides in the first position and four in the second. This gives:

4 × 4 = 4² = 16 codons

[Diagram: each first-position nucleotide (G, A, T, C) branching to the four possible second-position nucleotides (G, A, T, C)]

Still too few to encode 20 amino acids.

If the code were triplet, then there would be four possible nucleotides in the first position, four in the second and four in the third. This gives:

4 × 4 × 4 = 4³ = 64 permutations

This is too many to encode 20 amino acids, but the code could work if some permutations are not used, if more than one permutation encodes each amino acid, or both.

Type of code    Number of permutations
Singlet         4¹ = 4
Doublet         4² = 16
Triplet         4³ = 64
Quadruplet      4⁴ = 256
Pentuplet       4⁵ = 1024

Only the triplet code really looks feasible.

How was the code deciphered?

There are also different ways that the code can be read:
• It can be punctuated or unpunctuated.
• If it is unpunctuated it can be overlapping or non-overlapping.

An overlapping code

GTCACCCATGGAGGTATCT
1 = GTC, 2 = TCA, 3 = CAC, 4 = ACC (each codon starts one nucleotide after the previous one)

Once the first codon is set (e.g. GTC), the next one can only be one of four (TCA, TCG, TCT or TCC). This is a disadvantage.

A non-overlapping unpunctuated code

GTCACCCATGGAGGTATCT
Frame 1: GTC ACC CAT GGA GGT
Frame 2: TCA CCC ATG GAG GTA
Frame 3: CAC CCA TGG AGG TAT

There are three ways to read this type of code, referred to as “reading frames”. This ambiguity makes this type of code non-ideal.
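The counting argument and the two unpunctuated ways of reading the example sequence can be sketched in a few lines of Python. The function names `codon_count`, `reading_frames` and `overlapping_codons` are hypothetical helpers chosen for this illustration; only the sequence itself comes from the slides.

```python
def codon_count(codon_length, alphabet_size=4):
    """Number of distinct codons of a given length over the nucleotide alphabet."""
    return alphabet_size ** codon_length


def reading_frames(seq, size=3):
    """Split seq into complete non-overlapping codons, once per possible frame."""
    frames = []
    for offset in range(size):
        frame = [seq[i:i + size] for i in range(offset, len(seq) - size + 1, size)]
        frames.append(frame)
    return frames


def overlapping_codons(seq, size=3):
    """Read seq as a fully overlapping code: one codon starting at every position."""
    return [seq[i:i + size] for i in range(len(seq) - size + 1)]


SEQ = "GTCACCCATGGAGGTATCT"  # example sequence from the slides

# Singlet to pentuplet codes: 4, 16, 64, 256, 1024 permutations.
for n in range(1, 6):
    print(n, codon_count(n))

# The three reading frames of a non-overlapping unpunctuated code.
for number, frame in enumerate(reading_frames(SEQ), start=1):
    print("Frame", number, " ".join(frame))

# The first four codons of an overlapping code.
print(overlapping_codons(SEQ)[:4])
```

In the overlapping reading, each codon shares its first two nucleotides with the end of the previous codon, which is why only four successors are possible for any given codon.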
A non-overlapping punctuated code

GTCACCCATGGAGGTATCT
Read as: GTC (A) CCC (A) TGG (A) GGT (A) TCT, giving codons 1 to 5

Here, one nucleotide (A) is used as a punctuation mark. This code has several advantages:
1. The reading frame is set by the punctuation.
2. Because only three nucleotides are used in codons, the number of coding permutations available is 3³ = 27, which is still enough for 20 amino acids.

Is the code really overlapping?

GTCACCCATGGAGGTATCT
1 = GTC, 2 = TCA, 3 = CAC, 4 = ACC

Once the first amino acid is set, the next one can only be one of four. Therefore, certain amino acids could never be next to each other. This can be tested by experimentation.

• Francis Crick and Sidney Brenner did “nearest neighbour” analysis on real proteins.
• They found that any amino acid could be next to any other one. Therefore, the code cannot be overlapping.

Is the code punctuated?

• Francis Crick and Sidney Brenner went on to analyse a particular type of mutant that is induced by intercalating agents (e.g. acridine dye).
• Intercalating agents insert themselves between the base pairs of DNA. This can stretch the base pairs apart during replication and cause an extra nucleotide to be inserted or one to be left out.
• They found a gene (the rII gene) that has special properties. It can tolerate several wrong codons in the early part of the coding sequence and still make an active protein, as long as the later part of the coding sequence is correct.
• The mutations caused by intercalating agents fall into two classes, 1 and 2. Both cause a mutant phenotype in the rII gene.

Class 1 alone: mutant phenotype
Class 2 alone: mutant phenotype

• Double mutants (two mutations in one gene) of the same class also cause a mutant phenotype in the rII gene.

Class 1 + class 1: mutant phenotype
Class 2 + class 2: mutant phenotype

• When the double mutant has two different kinds of mutation, they suppress each other and you get a non-mutant phenotype in the rII gene.
Class 2 + class 1: wild-type (non-mutant) phenotype

• Remember that the mutations caused by acridine dyes result from the loss or gain of one nucleotide.
• They cause the reading frame to change and are called frame-shift mutations.
• The fact that they can arise means that there must be reading frames, and that means that the code is unpunctuated.

How does this work?

Original code:
GTCACCCATGGAGGTATCT
Read as: GTC ACC CAT GGA GGT (codons 1 to 5)

Code with frame-shift mutation (T inserted after the first codon):
GTCTACCCATGGAGGTATC
Read as: GTC TAC CCA TGG AGG (codons 1 to 5)

All codons after the inserted nucleotide are wrong (some may be stop codons).

Two wrongs can make a right

Original code:
GTCACCCATGGAGGTATCT
Read as: GTC ACC CAT GGA GGT (codons 1 to 5)

Code with two different frame-shift mutations (one insertion, one deletion):
GTCTACCATGGAGGTATCT
Read as: GTC TAC CAT GGA GGT (codons 1 to 5)

After the second mutation, the codons are back in the original frame.

Is the code triplet?

• Crick and Brenner went on to show that three frame-shift mutations of the first type (insertion) or three of the second type (deletion) in the rII gene could also give a wild-type phenotype.
• This could only happen if the code were triplet. If the code were quadruplet, then you would have to add or delete four nucleotides to reset the reading frame.

Three wrongs can make a right

Original code:
GTCACCCATGGAGGTATCT
Read as: GTC ACC CAT GGA GGT (codons 1 to 5)

Code with three similar frame-shift mutations (three insertions):
GTCTACTCACATGGAGGTA
Read as: GTC TAC TCA CAT GGA GGT (codons 1 to 6)

An extra codon is inserted and a few codons are wrong, then all of the rest are OK.

How was the code “cracked”?

• We must first consider how genetic information is used by the cell.
• In higher organisms (eukaryotes) the DNA is in the nucleus and the protein is made in the cytoplasm, so there must be an intermediate.
• Messenger RNA (mRNA) moves from the nucleus to the cytoplasm and carries the genetic code.

The Central Dogma

• Francis Crick proposed the idea that genetic information moves in one direction and called this the central dogma of molecular genetics.

DNA → RNA → Protein
(DNA is copied by replication; RNA is made from DNA by transcription; protein is made from RNA by translation)
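Looking back at the frame-shift slides ("How does this work?", "Two wrongs can make a right" and "Three wrongs can make a right"), the effect of single-nucleotide insertions and deletions on the reading frame can be reproduced with a short sketch. The helper names `codons`, `insert_base` and `delete_base` are assumptions made for this illustration, and the mutation positions were chosen to reproduce the mutant sequences shown on the slides.

```python
def codons(seq, size=3):
    """Split seq into complete non-overlapping codons, starting at position 0."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def insert_base(seq, pos, base):
    """Return seq with one extra nucleotide inserted before index pos."""
    return seq[:pos] + base + seq[pos:]


def delete_base(seq, pos):
    """Return seq with the nucleotide at index pos removed."""
    return seq[:pos] + seq[pos + 1:]


original = "GTCACCCATGGAGGTATCT"

# One insertion: every codon after the first is read in the wrong frame.
one_insertion = insert_base(original, 3, "T")

# Insertion followed by a nearby deletion: the frame is restored after one
# wrong codon.
insertion_and_deletion = delete_base(one_insertion, 6)

# Three insertions: one extra codon overall, then the original frame resumes.
three_insertions = insert_base(insert_base(one_insertion, 6, "T"), 8, "A")

for label, seq in [
    ("original", original),
    ("one insertion", one_insertion),
    ("insertion + deletion", insertion_and_deletion),
    ("three insertions", three_insertions),
]:
    print(f"{label:22s}", " ".join(codons(seq)))
```

Comparing the printed codon lists shows the logic of the rII experiments: one or two changes of the same kind leave everything downstream out of frame, whereas an insertion paired with a deletion, or three changes of the same kind, bring the downstream codons back into register.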
• Cells can be broken open and the elements needed for protein synthesis can be isolated. When RNA is added, the protein encoded by that RNA is made.
• Artificial RNA can be made in the test tube and added to this system.

This work was done by Marshall Nirenberg and Har Gobind Khorana.
• Nirenberg: http://profiles.nlm.nih.gov/JJ/Views/Exhibit/documents/codeoflife.html
• Khorana: http://www.ucs.mun.ca/~c64dcp/Khorana.html

• Nirenberg made simple RNA with the sequence:
UUUUUUUUUUUUUUUUUUUUU
• When he put this into the test tube, he found that the protein made was a string of one type of amino acid, phenylalanine, joined together. Therefore the codon UUU (or TTT in DNA) encodes phenylalanine.

Similarly, RNA with the sequence:
CCCCCCCCCCCCCCCCCCCCCC
encodes a protein that is all proline, and
AAAAAAAAAAAAAAAAAAAA
encodes a protein that is all lysine.

• Khorana made less simple RNA with the sequence:
UGUGUGUGUGUGUGUGUGUGU
• When he put this into the test tube, he found that the protein made was a string of two alternating amino acids, valine and cysteine.

UGU (TGT in DNA) = Cys
GUG (GTG in DNA) = Val

• By successively more sophisticated experiments of this type, the amino acids specified by most of the 61 amino-acid-encoding triplets were identified.
• Final confirmation required experiments with another type of RNA, transfer RNA (tRNA), which is the subject of the next lecture.
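The logic of the synthetic RNA experiments above can be mimicked by translating the same sequences with only the codon assignments stated in this lecture. This is a minimal sketch: `KNOWN_CODONS` and `translate` are hypothetical names, and the function is a plain lookup, not a model of the cell-free system itself.

```python
# Codon assignments stated in the slides (RNA notation).
KNOWN_CODONS = {
    "UUU": "Phe",
    "CCC": "Pro",
    "AAA": "Lys",
    "UGU": "Cys",
    "GUG": "Val",
}


def translate(rna, frame=0):
    """Translate complete triplets of rna, starting at the given frame offset."""
    return [KNOWN_CODONS[rna[i:i + 3]] for i in range(frame, len(rna) - 2, 3)]


poly_u = "U" * 21             # Nirenberg's homopolymer
poly_ug = "UG" * 10 + "U"     # Khorana's alternating copolymer

# A homopolymer gives a single repeated amino acid.
print("-".join(translate(poly_u)))

# The alternating copolymer gives alternating Cys and Val in every frame;
# only the starting amino acid changes.
for frame in range(3):
    print(frame, "-".join(translate(poly_ug, frame)))
```

Because every reading frame of the alternating sequence contains the same two codons, the experiment yields the same alternating product wherever translation begins.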