You are given the data that were available to researchers of the genetic code by the early 1960s, just before an experimental procedure for the direct analysis of amino acids encoded by specific codons (nucleotide triplets) was developed. These data are slightly idealized: in particular, numerical data have been converted into direct statements, unreliable results have been removed, and some data obtained later and used to verify the genetic code have been added. Still, the situation closely resembles the one in which the first attempts to decipher the genetic code were undertaken.

The data come in four types. First, there are sequences of polypeptides encoded by nucleotide sequences of known regular structure. Second, there are data on the composition of polypeptides encoded by irregular nucleotide sequences of known nucleotide composition. Note that these data are incomplete: the presence of an amino acid in the resulting peptide indicates that some codons in the nucleotide sequence correspond to this amino acid, but the converse is not true, since an amino acid may be absent for technical reasons. Third, there are the results of mutations in genes caused by nitrous acid: such mutations may change A to G and C to U, leading to changes in codons and subsequent substitutions of amino acids in known proteins. Fourth, there are results of spontaneous (random) mutations; these data can be used to check the consistency of the genetic code table.

Your aim is to reconstruct the genetic code table, that is, to discover the correspondence between codons and amino acids. The given data are not sufficient to reconstruct the genetic code completely, but you should try to deduce as many codon readings as possible.

The data:

1a.
The following regular polynucleotide sequences produce regular polypeptides:

No.  short notation  nucleotide sequence  amino acid sequence(s)
1    polyU           …UUUUUUUUUUUUU…      …-Phe-Phe-Phe-Phe-Phe-Phe-Phe-Phe-Phe-…
2    polyA           …AAAAAAAAAAAAA…      …-Lys-Lys-Lys-Lys-Lys-Lys-Lys-Lys-Lys-…
3    polyC           …CCCCCCCCCCCCC…      …-Pro-Pro-Pro-Pro-Pro-Pro-Pro-Pro-Pro-…
4    polyUC          …UCUCUCUCUCUCU…      …-Leu-Ser-Leu-Ser-Leu-Ser-Leu-Ser-Leu-…
5    polyUG          …UGUGUGUGUGUGU…      …-Val-Cys-Val-Cys-Val-Cys-Val-Cys-Val-…
6    polyAC          …ACACACACACACA…      …-Thr-His-Thr-His-Thr-His-Thr-His-Thr-…
7    polyAG          …AGAGAGAGAGAGA…      …-Arg-Glu-Arg-Glu-Arg-Glu-Arg-Glu-Arg-…
8    polyUUAC        …UUACUUACUUACU…      …-Leu-Leu-Thr-Tyr-Leu-Leu-Thr-Tyr-Leu-…
9    polyUAUC        …UAUCUAUCUAUCU…      …-Tyr-Leu-Ser-Ile-Tyr-Leu-Ser-Ile-Tyr-…

1b. The following regular polynucleotide sequences produce a mixture of two or three different regular polypeptides:

No.  short notation  nucleotide sequence  amino acid sequence(s)
10   polyAAG         …AAGAAGAAGAAGA…      …-Arg-Arg-Arg-Arg-Arg-Arg-Arg-Arg-Arg-…
                                          …-Lys-Lys-Lys-Lys-Lys-Lys-Lys-Lys-Lys-…
                                          …-Glu-Glu-Glu-Glu-Glu-Glu-Glu-Glu-Glu-…
11   polyUAC         …UACUACUACUACU…      …-Leu-Leu-Leu-Leu-Leu-Leu-Leu-Leu-Leu-…
                                          …-Thr-Thr-Thr-Thr-Thr-Thr-Thr-Thr-Thr-…
                                          …-Tyr-Tyr-Tyr-Tyr-Tyr-Tyr-Tyr-Tyr-Tyr-…
12   polyGUA         …GUAGUAGUAGUAG…      …-Val-Val-Val-Val-Val-Val-Val-Val-Val-…
                                          …-Ser-Ser-Ser-Ser-Ser-Ser-Ser-Ser-Ser-…
13   polyAUC         …AUCAUCAUCAUCA…      …-Ser-Ser-Ser-Ser-Ser-Ser-Ser-Ser-Ser-…
                                          …-Ile-Ile-Ile-Ile-Ile-Ile-Ile-Ile-Ile-…
                                          …-His-His-His-His-His-His-His-His-His-…
14   polyGAU         …GAUGAUGAUGAUG…      …-Asp-Asp-Asp-Asp-Asp-Asp-Asp-Asp-Asp-…
                                          …-Met-Met-Met-Met-Met-Met-Met-Met-Met-…
15   polyUUG         …UUGUUGUUGUUGU…      …-Leu-Leu-Leu-Leu-Leu-Leu-Leu-Leu-Leu-…
                                          …-Val-Val-Val-Val-Val-Val-Val-Val-Val-…
                                          …-Cys-Cys-Cys-Cys-Cys-Cys-Cys-Cys-Cys-…
16   polyCAA         …CAACAACAACAAC…      …-Thr-Thr-Thr-Thr-Thr-Thr-Thr-Thr-Thr-…
                                          …-Asn-Asn-Asn-Asn-Asn-Asn-Asn-Asn-Asn-…
                                          …-Gln-Gln-Gln-Gln-Gln-Gln-Gln-Gln-Gln-…
17   polyUUC         …UUCUUCUUCUUCU…      …-Ser-Ser-Ser-Ser-Ser-Ser-Ser-Ser-Ser-…
                                          …-Leu-Leu-Leu-Leu-Leu-Leu-Leu-Leu-Leu-…
                                          …-Phe-Phe-Phe-Phe-Phe-Phe-Phe-Phe-Phe-…

1c.
There are also three regular polynucleotide sequences that do not produce long peptides (at most tripeptides, that is, chains of three amino acids, are produced):

No.  short notation  nucleotide sequence
18   polyGUAA        …GUAAGUAAGUAAGUAAG…
19   polyGAUA        …GAUAGAUAGAUAGAUAG…

2. The following irregular polynucleotides (the fraction of the main nucleotide is 80%) produce polypeptides with the following composition:

No.  main nucl.  other nucl.  main amino acid  rare amino acid(s)  very rare amino acid(s)
1    U           C            Phe              Ser, Leu            Pro
2    U           A            Phe              Leu, Ile, Tyr       Asn
3    U           G            Phe              Cys, Val, Leu       Gly, Trp
4    A           U            Lys              Asn, Ile            Leu, Tyr
5    A           G            Lys              Arg, Glu            Gly
6    A           C            Lys              Asn, Gln, Thr       His, Pro

3. Mutations in proteins caused by A→G and C→U mutations in the tobacco mosaic virus gene for the envelope protein:

From  To
Ala   Val
Asp   Gly
Glu   Gly
Ile   Val, Met
Lys   Arg
Met   Val
Asn   Ser
Pro   Leu, Ser
Gln   Arg
Arg   Gly
Ser   Gly, Leu, Phe
Thr   Ala, Met, Ile
Tyr   Cys

The same table in a different format:

To    From
Ala   Thr
Cys   Tyr
Phe   Ser
Gly   Glu, Arg, Asp, Ser
Ile   Thr
Leu   Pro, Ser
Met   Thr, Ile
Arg   Lys, Gln
Ser   Asn, Pro
Val   Ile, Met, Ala

4. Combined data on spontaneous mutations in various proteins (tryptophan synthetase of Escherichia coli and human hemoglobins):

From  To
Ala   Asp, Val, Glu
Cys   Gly
Asp   Gly, Ala, Asn
Glu   Gln, Val, Stop, Gly, Ala, Asp, Lys
Phe   Leu
Gly   Val, Glu, Arg, Asp, Cys
His   Tyr, Arg, Asp, Asn
Ile   Thr, Ser, Asn
Lys   Glu, Asn, Gln
Leu   Arg, Phe
Asn   Lys, Ser
Pro   Gln
Gln   Glu, Arg
Arg   Ile, Gly, Thr, Ser
Ser   Arg, Leu, Phe, Thr
Thr   Ile, Lys, Asn, Ser
Val   Ala, Gly, Asp, Glu
Tyr   Cys

There are two convenient formats for the genetic code table, either of which may be used. I have already filled in three obvious cells.
amino acid                  three-letter notation  one-letter notation  codon(s)
Alanine                     Ala                    A
Cysteine                    Cys                    C
Aspartate (aspartic acid)   Asp                    D
Glutamate (glutamic acid)   Glu                    E
Phenylalanine               Phe                    F                    UUU
Glycine                     Gly                    G
Histidine                   His                    H
Isoleucine                  Ile                    I
Lysine                      Lys                    K                    AAA
Leucine                     Leu                    L
Methionine                  Met                    M
Asparagine                  Asn                    N
Proline                     Pro                    P                    CCC
Glutamine                   Gln                    Q
Arginine                    Arg                    R
Serine                      Ser                    S
Threonine                   Thr                    T
Valine                      Val                    V
Tryptophan                  Trp                    W
Tyrosine                    Tyr                    Y

UUU Phe    UCU        UAU        UGU
UUC        UCC        UAC        UGC
UUA        UCA        UAA        UGA
UUG        UCG        UAG        UGG

CUU        CCU        CAU        CGU
CUC        CCC Pro    CAC        CGC
CUA        CCA        CAA        CGA
CUG        CCG        CAG        CGG

AUU        ACU        AAU        AGU
AUC        ACC        AAC        AGC
AUA        ACA        AAA Lys    AGA
AUG        ACG        AAG        AGG

GUU        GCU        GAU        GGU
GUC        GCC        GAC        GGC
GUA        GCA        GAA        GGA
GUG        GCG        GAG        GGG
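
Much of the bookkeeping behind data sets 1 to 3 can be mechanized. The sketch below is only an illustration: the function names (`codon_cycles`, `copolymer_codon_freqs`, `nitrous_acid_products`) are hypothetical and not part of the problem statement. It assumes that a regular polynucleotide can be read in any of the three reading frames, that nucleotides in an irregular polynucleotide occur independently with the stated fractions, and that nitrous acid changes one nucleotide at a time (A to G or C to U). It lists candidate codons only and does not assign amino acids to them.

```python
from itertools import product
from math import gcd


def codon_cycles(unit):
    """Distinct repeating codon cycles obtained by reading the polymer
    (unit repeated indefinitely) in each of the three reading frames."""
    period = len(unit) // gcd(len(unit), 3)  # codons before the pattern repeats
    text = unit * (3 * period + 3)           # long enough for every frame
    cycles = set()
    for frame in range(3):
        codons = [text[frame + 3 * i: frame + 3 * i + 3] for i in range(period)]
        # Rotations of a cycle describe the same regular polypeptide,
        # so keep one canonical (lexicographically smallest) rotation.
        rotations = [tuple(codons[k:] + codons[:k]) for k in range(period)]
        cycles.add(min(rotations))
    return sorted(cycles)


def copolymer_codon_freqs(main, other, p_main=0.8):
    """Expected codon frequencies in a random two-letter polynucleotide,
    assuming each position is chosen independently (an assumption)."""
    freqs = {}
    for letters in product(main + other, repeat=3):
        codon = "".join(letters)
        k = codon.count(other)  # number of minor nucleotides in the codon
        freqs[codon] = p_main ** (3 - k) * (1 - p_main) ** k
    return freqs


def nitrous_acid_products(codon):
    """Codons reachable from `codon` by a single A->G or C->U change."""
    change = {"A": "G", "C": "U"}
    return [codon[:i] + change[base] + codon[i + 1:]
            for i, base in enumerate(codon) if base in change]


if __name__ == "__main__":
    print(codon_cycles("UC"))      # one cycle of two alternating codons
    print(codon_cycles("AAG"))     # three single-codon cycles
    print(copolymer_codon_freqs("U", "C"))
    print(nitrous_acid_products("AAA"))
```

Under these assumptions, the number of cycles returned by `codon_cycles` corresponds to the number of different regular polypeptides a sequence can give, the three frequency classes in `copolymer_codon_freqs` line up with the main, rare and very rare amino acids of data set 2, and `nitrous_acid_products` gives the candidate codons on the "To" side of data set 3.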