Download gelfand-singapore

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Butyric acid wikipedia , lookup

Protein wikipedia , lookup

Ancestral sequence reconstruction wikipedia , lookup

Metalloprotein wikipedia , lookup

Molecular ecology wikipedia , lookup

Nucleic acid analogue wikipedia , lookup

Ribosomally synthesized and post-translationally modified peptides wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Proteolysis wikipedia , lookup

Peptide synthesis wikipedia , lookup

Metabolism wikipedia , lookup

Hepoxilin wikipedia , lookup

Protein structure prediction wikipedia , lookup

Amino acid synthesis wikipedia , lookup

Biochemistry wikipedia , lookup

Point mutation wikipedia , lookup

Biosynthesis wikipedia , lookup

Genetic code wikipedia , lookup

Transcript
You are given the data available to the researchers of the genetic code by early 1960’s, just before an
experimental procedure for direct analysis of amino acids encoded by specific codons (nucleotide
triplets) has been developed. These data are slightly idealized; in particular, numerical data are
converted into direct statements, unreliable results are removed, some data obtained later and used for
verification of the genetic code is added. Still, the situation closely resembles the one in which the first
attempts to decipher the genetic code were undertaken.
The data comes in four types. Firstly, there are sequences of polypeptides encoded by nucleotide
sequences of the known regular structure. Secondly, there are data about the composition of
polypeptides encoded by irregular nucleotide sequences of the known nucleotide composition (note
that these data are incomplete: that is, the presence of an amino acid in the resulting peptide indicates
that some codons in the nucleotide sequence correspond to this amino acid, but the converse is not
true: the amino acid may be absent for technical reasons). Thirdly, there are the results of mutations in
genes caused by nitrous acid: such mutations may change A to G and C to U, leading to changes in
codons and subsequent substitutions of amino acids in known proteins. Fourthly, there are results of
spontaneous (random) mutations (these data can be used to check the consistency of the genetic code
table).
Your aim is to reconstruct the genetic code table, that is to discover the correspondence between
codons and amino acids. The given data is not sufficient to reconstruct the genetic code completely,
but it you should try to deduce as many codon readings as possible.
The data:
1a. The following regular polynucleotide sequences produce regular polypeptides
1
2
3
4
5
6
7
8
9
short
notation
polyU
polyA
polyC
polyUC
polyUG
polyAC
polyAG
polyUUAC
polyUAUC
nucleotide sequence
amino acid sequence(s)
…UUUUUUUUUUUUU…
…AAAAAAAAAAAAA…
…CCCCCCCCCCCCC…
…UCUCUCUCUCUCU…
…UGUGUGUGUGUGU…
…ACACACACACACA…
…AGAGAGAGAGAGA…
…UUACUUACUUACU…
…UAUCUAUCUAUCU…
…-Phe-Phe-Phe-Phe-Phe-Phe-Phe-Phe-Phe-…
…-Lys-Lys-Lys-Lys-Lys–Lys-Lys–Lys-Lys-…
…-Pro-Pro-Pro-Pro-Pro-Pro-Pro-Pro-Pro-…
…-Leu-Ser-Leu-Ser-Leu-Ser-Leu-Ser-Leu-…
…-Val-Cys-Val-Cys-Val-Cys-Val-Cys-Val-…
…-Thr-His-Thr-His-Thr-His-Thr-His-Thr-…
…-Arg-Glu-Arg-Glu-Arg-Glu-Arg-Glu-Arg-…
…-Leu-Leu-Thr-Tyr-Leu-Leu-Thr-Tyr-Leu-…
…-Tyr-Leu-Ser-Ile-Tyr-Leu-Ser-Ile-Tyr-…
1b. The following regular polynucleotide sequences produce a mixture of two or three different regular
polypeptides
10 polyAAG
11 polyUAC
12 polyGUA
13 polyAUC
…AAGAAGAAGAAGA… …-Arg-Arg-Arg-Arg-Arg-Arg-Arg-Arg-Arg-…
…-Lys-Lys-Lys-Lys-Lys-Lys-Lys-Lys-Lys-…
…-Glu-Glu-Glu-Glu-Glu-Glu-Glu-Glu-Glu-…
…UACUACUACUACU… …-Leu-Leu-Leu-Leu-Leu-Leu-Leu-Leu-Leu-…
…-Thr-Thr-Thr-Thr-Thr-Thr-Thr-Thr-Thr-…
…-Tyr-Tyr-Tyr-Tyr-Tyr-Tyr-Tyr-Tyr-Tyr-…
…GUAGUAGUAGUAG… …-Val-Val-Val-Val-Val-Val-Val-Val-Val-…
…-Ser-Ser-Ser-Ser-Ser-Ser-Ser-Ser-Ser-…
…AUCAUCAUCAUCA… …-Ser-Ser-Ser-Ser-Ser-Ser-Ser-Ser-Ser-…
…-Ile-Ile-Ile-Ile-Ile-Ile-Ile-Ile-Ile-…
…-His-His-His-His-His-His-His-His-His-…
14 polyGAU
…GAUGAUGAUGAUG… …-Asp-Asp-Asp-Asp-Asp-Asp-Asp-Asp-Asp-…
…-Met-Met-Met-Met-Met-Met-Met-Met-Met-…
…UUGUUGUUGUUGU… …-Leu-Leu-Leu-Leu-Leu-Leu-Leu-Leu-Leu-…
…-Val-Val-Val-Val-Val-Val-Val-Val-Val-…
…-Cys-Cys-Cys-Cys-Cys-Cys-Cys-Cys-Cys-…
…CAACAACAACAAC… …-Thr-Thr-Thr-Thr-Thr-Thr-Thr-Thr-Thr-…
…-Asn-Asn-Asn-Asn-Asn-Asn-Asn-Asn-Asn-…
…-Gln-Gln-Gln-Gln-Gln-Gln-Gln-Gln-Gln-…
…UUCUUCUUCUUCU… …-Ser-Ser-Ser-Ser-Ser-Ser-Ser-Ser-Ser-…
…-Leu-Leu-Leu-Leu-Leu-Leu-Leu-Leu-Leu-…
…-Phe-Phe-Phe-Phe-Phe-Phe-Phe-Phe-Phe-…
15 polyUUG
16 polyCAA
17 polyUUC
1c. There are also three regular polynucleotide sequences that do not produce long peptides (at most
tripeptides are produced, that is chains of three amino acids):
polyGUAA
polyGAUA
18
19
…GUAAGUAAGUAAGUAAG…
…GAUAGAUAGAUAGAUAG…
2. The following irregular polinucleotides (the fraction of the main nucleotide is 80%) produce
polypeptides with the following composition:
1
2
3
4
5
6
main
nucl.
U
U
U
A
A
A
other
nucl.
C
A
G
U
G
C
main amino
acid
Phe
Phe
Phe
Lys
Lys
Lys
rare amino acid(s)
Ser,
Leu,
Cys,
Asn,
Arg,
Asn,
Leu
Ile, Tyr
Val, Leu
Ile
Glu
Gln, Thr
very rare
amino acid(s)
Pro
Asn
Gly, Trp
Leu, Tyr
Gly
His, Pro
3. Mutations in proteins caused by the AG and CU mutations in the Tobacco mosaic virus gene
for the envelope protein:
From
Ala
Asp
Glu
Ile
Lys
Met
Asn
Pro
Gln
Arg
Ser
Thr
Tyr
To
Val
Gly
Gly
Val,
Arg
Val
Ser
Leu,
Arg
Gly
Gly,
Ala,
Cys
Met
Ser
Leu, Phe
Met, Ile
Same table in a different format:
From
To
Thr
Tyr
Ser
Glu,
Thr
Pro,
Thr,
Lys,
Asn,
Ile,
Ala
Cys
Phe
Gly
Ile
Leu
Met
Arg
Ser
Val
Arg, Asp, Ser
Ser
Ile
Gln
Pro
Met, Ala
4. Combined data on spontaneous mutations in various proteins (tryptophanyl synthetase of
Escherichia coli and human hemoglobins)
From
Ala
Cys
Asp
Glu
Phe
Gly
His
Ile
Lys
Leu
Asn
Pro
Gln
Arg
Ser
Thr
Val
Tyr
To
Asp,
Gly
Gly,
Gln,
Leu
Val,
Tyr,
Thr,
Glu,
Arg,
Lys,
Gln
Glu,
Ile,
Arg,
Ile,
Ala,
Cys
Val, Glu
Ala, Asn
Val, Stop, Gly, Ala, Asp, Lys
Glu,
Arg,
Ser,
Asn,
Phe
Ser
Arg
Gly,
Leu,
Lys,
Gly,
Arg, Asp, Cys
Asp, Asn
Asn
Gln
Thr,
Phe,
Asn,
Asp,
Ser
Thr
Ser
Glu
There are two convenient formats for the gene code table, any of which may be used. I have already
filled three obvious cells.
amino acid
Alanine
Cysteine
Aspartate (aspartic acid)
Glutamate (glutamic acd)
Phenylalanine
Glycine
Histidine
Isoleucine
Lysine
Leucine
Methionine
Asparagine
three-letter
notation
Ala
Cys
Asp
Glu
Phe
Gly
His
Ile
Lys
Leu
Met
Asn
one-letter
notation
A
C
D
E
F
G
H
I
K
L
M
N
codon(s)
UUU
AAA
Pro
Gln
Arg
Ser
Thr
Val
Trp
Tyr
Proline
Glutamine
Arginine
Serine
Threonine
Valine
Tryptophan
Tyrosine
UUU
UUC
UUA
UUG
CUU
CUC
CUA
CUG
AUU
AUC
AUA
AUG
GUU
GUC
GUA
GUG
Phe
UCU
UCC
UCA
UCG
CCU
CCC
CCA
CCG
ACU
ACC
ACG
ACA
GCU
GCC
GCA
GCG
Pro
P
Q
R
S
T
V
W
Y
UAU
UAC
UAA
UAG
CAU
CAC
CAA
CAG
AAU
AAC
AAA
AAG
GAU
GAC
GAA
GAG
CCC
Lys
UGU
UGC
UGA
UGG
CGU
CGC
CGA
CGG
AGU
AGC
AGA
AGG
GGU
GGC
GGA
GGG