©1994 Oxford University Press. Nucleic Acids Research, 1994, Vol. 22, No. 8, 1327-1334

Mutations to nonsense codons in human genetic disease: implications for gene therapy by nonsense suppressor tRNAs

Jennifer Atkinson and Robin Martin*

Krebs Institute for Biomolecular Research, The University of Sheffield, PO Box 594, Firth Court, Western Bank, Sheffield S10 2UH, UK

Received January 28, 1994; Revised and Accepted March 7, 1994

ABSTRACT

Nonsense suppressor tRNAs have been suggested as potential agents for human somatic gene therapy. Recent work from this laboratory has described significant effects of 3' codon context on the efficiency of human nonsense suppressors. A rapid increase in the number of reports of human diseases caused by nonsense codons prompted us to determine how the spectrum of mutations to UAG, UAA or UGA codons, and their respective 3' contexts, might affect the efficiency of human suppressor tRNAs employed for gene therapy. This paper presents a survey of 179 mutations to nonsense codons which cause human germline or somatic disease. The analysis revealed a ratio of approximately 1:2:3 for mutation to UAA, UAG and UGA respectively. This pattern is similar, but not identical, to that of naturally occurring stop codons. The 3' contexts of new mutations to stop codons were also analysed; once again, the pattern was similar to the contexts surrounding natural termination signals. These results imply that there will be little difference in the sensitivity of nonsense mutations and natural stop codons to suppression by nonsense suppressor tRNAs. Analysis of the codons altered by nonsense mutations suggests that efforts to design human UAG suppressor tRNAs charged with Trp, Gln and Glu, UAA suppressors charged with Gln and Glu, and UGA suppressors which insert Arg, would be an essential step in the development of suppressor tRNAs as agents of human somatic gene therapy.
*To whom correspondence should be addressed

INTRODUCTION

Nonsense mutations cause the premature termination of protein synthesis because, in the normal course of translation, there are no aminoacyl-tRNAs whose anticodons match the UAG, UAA or UGA nonsense codons. Nonsense suppressors can be created, however, by mutating a tRNA so that its anticodon matches one of the termination signals; a proportion of full-length gene product is then produced. In 1982, Y. W. Kan and colleagues published a paper in Nature reporting the construction of a human nonsense suppressor tRNA and the successful in vitro suppression of a UAG mutation at codon 17 of the β-globin gene (1). The mRNA containing the nonsense mutation was obtained from a patient suffering from β0 thalassemia, and it was suggested that nonsense suppression might one day prove useful for the somatic gene therapy of human diseases caused by mutation to nonsense codons (1). Although there has been relatively little work in this area in the intervening years, such a strategy has several attractive aspects. First, tRNA genes have strong promoters, which are active in all cell types. The promoters for eukaryotic tRNA genes lie within the structural sequences encoding the tRNA molecule itself (2). Although there are elements within the 5' upstream region which regulate transcriptional activity (3), an active transcriptional unit may be considerably shorter than 500 base pairs, so accommodating it within a delivery vector presents no problem. Secondly, once they have been transcribed and processed, tRNAs have low rates of degradation. Finally, gene therapy with a nonsense suppressor would maintain the endogenous, physiological controls over the target gene which contains the nonsense codon. On the down side, nonsense suppressors may cause readthrough of natural stop codons.
In addition, nonsense mutations can lead to the aberrant splicing of introns and to reduced levels of complete mRNA (4,5). As both events take place in the nucleus, they are probably beyond the reach of cytoplasmic suppressors. Of course, only a fraction of human genetic disease is caused by nonsense mutations. However, if an effective mechanism for gene therapy by nonsense suppression could one day be developed, it would be applicable to similar mutations in a wide range of genes.

One aspect which was not considered in the in vitro experiments (1) was the context sensitivity of nonsense suppression efficiency. Recently, we have described the way in which the 3' codon context affects the efficiency of UAG suppressor tRNAs in human tissue culture cells (6,7). In general, the efficiency of suppression varies according to the immediate 3' base in the pattern C > G > U > A, although the next 3' base probably has effects as well. The efficiency of nonsense suppression can vary by as much as an order of magnitude between the most efficient and the least efficient 3' contexts (Phillips-Jones, Hill, Atkinson and Martin: in preparation). This pattern of context effects in human cells is quite different from that which operates in E. coli (6,8). There are also significant differences in the efficiency of suppressors for UAG, UAA and UGA codons (9). The successful application of nonsense suppressor tRNAs as agents of human gene therapy might therefore depend both on the proportions of UAG, UAA and UGA codons, and on the spectrum of 3' codon contexts, amongst nonsense mutations that give rise to human genetic disease. Moreover, the likelihood that suppressor tRNAs would cause detrimental effects by reading through natural termination codons will be determined by how codons favourable for suppression are distributed between the population of nonsense mutations and the population of natural stop codons. Given the number of nonsense mutations which have been described in human genes since the original proposal (1), we believe it is now possible to review the pattern of mutations giving rise to premature translation termination, with an eye to the potential use of nonsense suppressors as agents of somatic gene therapy. In this communication, we have surveyed the literature for reports of point mutations which lead to nonsense codons in human genes, and compared the distribution of the three termination signals and their 3' contexts with that of natural stop codons.

RESULTS

The spectrum of mutations to nonsense codons in human genetic disease

A total of 179 unique point mutations to nonsense codons were identified in human genes from a search of literature reports in a CD-ROM database. Of these, 21 were either germ line or somatic cell mutations in the tumour suppressor genes p53 and APC. The mutational events we identified are listed in Table 1. For each mutation, the affected codon and the encoded amino acid are given for the site of the mutation and for its 5' and 3' neighbours. Genes are sorted alphabetically according to the most commonly used name for either the gene product or the genetic disease. This list can be supplied, annotated with references, on request to RM, by electronic mail or on receipt of an IBM type disc.

Table 1. Nonsense mutations in human genes resulting in genetic disease.

5' codon  Affected codon  3' codon  Stop codon  Site  Gene or disease
CTG(leu)  TTG(leu)  AGT(leu)  TAG  L261X  Acid sphingomyelinase
AGG(arg)  CAA(gln)  AGT(ser)  TAA  Q1041X  Adenomatous polyposis coli (APC)
AGA(arg)  CAA(gln)  TCA(ser)  TAA  Q1067X  APC-gastric cancer
CTG(leu)  CAG(gln)  GGT(gly)  TAG  Q1338X  APC
GCA(ala)  GAA(glu)  ATA(ile)  TAA  E1306X  APC
AAA(lys)  CAA(gln)  ATT(ile)  TAA  Q12X  AMP deaminase
AAG(lys)  TGG(trp)  GCC(ala)  TGA  W717X  Androgen receptor
GCT(ala)  GAA(glu)  CTG(leu)  TAA  E358X  Anti-mullerian hormone
AAG(lys)  TTA(leu)  GTA(val)  TAA  L140X  Antithrombin III
GGC(gly)  CGA(arg)  ATC(ile)  TGA  R197X  Antithrombin III
TGC(cys)  CGA(arg)  CTC(leu)  TGA  R129X  Antithrombin III
GTG(val)  AAG(lys)  GTG(val)  TAG  K217X  α1-antitrypsin (emphysema)
AGG(arg)  CAG(gln)  GAG(glu)  TAG  Q84X  Apolipoprotein A-I
TTC(phe)  CGA(arg)  GAG(glu)  TGA  R2486X  Apolipoprotein B
AAA(lys)  CAG(gln)  CAT(his)  TAG  Q1450X  Apolipoprotein B
ACT(thr)  TCA(ser)  TCA(ser)  TAA  S3750X  Apolipoprotein B
ACA(thr)  CGA(arg)  CTC(leu)  TGA  R19X  Apolipoprotein C-II
CTG(leu)  TAC(tyr)  GAG(glu)  TAG  Y37X  Apolipoprotein C-II
CTG(leu)  TAC(tyr)  GAG(glu)  TAA  Y37X  Apolipoprotein C-II
GCC(ala)  TGG(trp)  GGC(gly)  TAG  W210X  Apolipoprotein E
CTG(leu)  TGG(trp)  GCC(ala)  TGA  W98X  APRT deficiency
ACC(thr)  CAG(gln)  AGG(arg)  TAG  Q39X  Beta-globin (β-thalassemia)
GGC(gly)  AAG(lys)  GTG(val)  TAG  K17X  Beta-globin (β-thalassemia)
GTC(val)  TGT(cys)  GTG(val)  TGA  C112X  Beta-globin (β-thalassemia)
AAA(lys)  GAA(glu)  TTC(phe)  TAA  E121X  Beta-globin (β-thalassemia)
TTT(phe)  GAG(glu)  TCC(ser)  TAG  E43X  Beta-globin (β-thalassemia)
GTG(val)  CAG(gln)  GCA/U(ala)  TAG  Q127X  Beta-globin (β-thalassemia)
AGT(ser)  GAG(glu)  CTG(leu)  TAG  E90X  Beta-globin (β-thalassemia)
TTC(phe)  CAA(gln)  GAG(glu)  TAA  Q309X  Cholesteryl ester transfer protein
CTT(leu)  GGA(gly)  GAA(glu)  TGA  G542X  Cystic fibrosis
CAA(gln)  CGA(arg)  GCA(ala)  TGA  R553X  Cystic fibrosis
CAG(gln)  TGG(trp)  AGG(arg)  TGA  W1282X  Cystic fibrosis
ATA(ile)  TGG(trp)  AAA(lys)  TAG  W1316X  Cystic fibrosis
TCT(ser)  CAG(gln)  TTT(phe)  TAG  Q493X  Cystic fibrosis
AGC(ser)  CGA(arg)  GTC(val)  TGA  R1162X  Cystic fibrosis
ACA(thr)  TGG(trp)  AAC(asn)  TAG  W846X  Cystic fibrosis
ATG(met)  CGA(arg)  TCT(ser)  TGA  R1158X  Cystic fibrosis
GCA(ala)  TGC(cys)  CAA(gln)  TGA  C524X  Cystic fibrosis
GAG(glu)  CGA(arg)  GAA(glu)  TGA  nt2510  Dystrophin-DMD
AAG(lys)  GAA(glu)  CTT(leu)  TAA  nt3714  Dystrophin-DMD
GTC(ala)  GAG(glu)  AA  TAG  nt2522  Dystrophin-DMD
CCA(pro)  TGG(trp)  ACA(thr)  TAG  nt6002  Erythropoietin receptor (EPOR)
TGG(leu)  CGA(arg)  TTC(phe)  TGA  R-5X  Factor VIII (HaemA)
TAT(tyr)  TGG(trp)  CAT(his)  TGA  W225X  Factor VIII (HaemA)
CTA(leu)  CGA(arg)  ATG(met)  TGA  R336X  Factor VIII (HaemA)
GTC(val)  CGA(arg)  TTT(phe)  TGA  R427X  Factor VIII (HaemA)
AAC(asn)  CGA(arg)  AGC(ser)  TGA  R583X  Factor VIII (HaemA)
TTG(leu)  CGA(arg)  CAG(gln)  TGA  R795X  Factor VIII (HaemA)
AAT(asn)  CAG(gln)  AGC(ser)  TAG  Q1686X  Factor VIII (HaemA)
ACA(thr)  CGA(arg)  CAC(his)  TGA  R1696X  Factor VIII (HaemA)
ATT(ile)  CGA(arg)  TGG(trp)  TGA  R1941X  Factor VIII (HaemA)
GTA(val)  CGA(arg)  AAA(lys)  TGA  R1966X  Factor VIII (HaemA)
TAT(tyr)  CGA(arg)  GGA(gly)  TGA  R2116X  Factor VIII (HaemA)
GCT(ala)  CGA(arg)  TAC(tyr)  TGA  R2147X  Factor VIII (HaemA)
GCT(ala)  CGA(arg)  CTT(leu)  TGA  R2209X  Factor VIII (HaemA)
CTT(leu)  CGA(arg)  ATT(ile)  TGA  R2307X  Factor VIII (HaemA)
GTT(val)  CAA(gln)  GGG(gly)  TAA  nt6406  Factor IX (HaemB)
GCA(ala)  CGA(arg)  GAA(glu)  TGA  nt6460  Factor IX (HaemB)
TTT(phe)  GAA(glu)  AAC(asn)  TAA  nt6472  Factor IX (HaemB)
TTT(phe)  TGG(trp)  AAG(lys)  TAG  nt6688  Factor IX (HaemB)
AAG(lys)  CAG(gln)  TAT(tyr)  TAG  nt6693  Factor IX (HaemB)
GAT(asp)  CAG(gln)  TGT(cys)  TAG  nt10400  Factor IX (HaemB)
TGT(cys)  GAG(glu)  TCC(ser)  TAG  nt10406  Factor IX (HaemB)
TGT(cys)  TGG(trp)  TGT(cys)  TGA  nt10468  Factor IX (HaemB)
TGG(trp)  TGT(cys)  CCC(pro)  TGA  nt10471  Factor IX (HaemB)
AGA(arg)  TGC(cys)  GAG(glu)  TGA  nt17700  Factor IX (HaemB)
TAT(tyr)  CGA(arg)  CTT(leu)  TGA  nt17761  Factor IX (HaemB)
ACC(thr)  CAA(gln)  TCA(ser)  TAA  nt20497  Factor IX (HaemB)
GGT(gly)  CAA(gln)  TCC(phe)  TAA  nt20551  Factor IX (HaemB)
CCT(pro)  TGG(trp)  CAG(gln)  TAG  nt20561  Factor IX (HaemB)
CCT(pro)  TGG(trp)  CAG(gln)  TGA  nt20562  Factor IX (HaemB)
TGG(trp)  CAG(gln)  GTA(val)  TAG  nt20363  Factor IX (HaemB)
TGT(cys)  GGA(gly)  GGC(gly)  TGA  nt30072  Factor IX (HaemB)
AAT(asn)  GAA(glu)  AAA(lys)  TAA  nt30090  Factor IX (HaemB)
AAA(lys)  TGG(trp)  ATT(ile)  TAG  nt30097  Factor IX (HaemB)
AAG(lys)  CGA(arg)  AAT(asn)  TGA  nt30863  Factor IX (HaemB)
ATT(ile)  CGA(arg)  ATT(ile)  TGA  nt30875  Factor IX (HaemB)
AAG(lys)  GAA(glu)  TCA(tyr)  TAA  nt31001  Factor IX (HaemB)
GGC(gly)  TAT(tyr)  GTA(val)  TAA  nt31039  Factor IX (HaemB)
GGC(gly)  TGG(trp)  GGA(gly)  TGA  nt31051  Factor IX (HaemB)
CTT(leu)  CAG(gln)  TAC(tyr)  TAG  nt31091  Factor IX (HaemB)
CAG(gln)  TAC(tyr)  CTT(leu)  TAG  nt31096  Factor IX (HaemB)
GAC(asp)  CGA(arg)  GCC(ala)  TGA  nt31118  Factor IX (HaemB)
ACA(thr)  TGT(cys)  CTT(leu)  TGA  nt31129  Factor IX (HaemB)
CTT(leu)  CGA(arg)  TCT(ser)  TGA  nt31133  Factor IX (HaemB)
GAT(asp)  TCA(ser)  TGT(cys)  TGA  nt31200  Factor IX (HaemB)
GAA(gln)  GGA(gly)  GAT(asp)  TGA  nt31208  Factor IX (HaemB)
TTC(phe)  TTA(leu)  ACT(thr)  TGA  nt31257  Factor IX (HaemB)
AGC(ser)  TGG(trp)  GGT(gly)  TGA  nt31276  Factor IX (HaemB)
GAA(glu)  GAG(glu)  TGT(cys)  TAG  nt31283  Factor IX (HaemB)
AAC(asn)  TGG(trp)  ATT(ile)  TGA  nt31342  Factor IX (HaemB)
GAA(glu)  AAA(lys)  ACA(thr)  TAA  nt31352  Factor IX (HaemB)
TCA(ser)  CGA(arg)  CTT(val)  TGA  R185X  Fanconi anemia group C gene
AGT(ser)  TGG(trp)  GAT(asp)  TGA  nt5574  Fibrillin gene (Marfan syndrome)
GCC(ala)  TGC(cys)  ACC(thr)  TGA  C720X  Fructose intolerance (aldolase B)
TGG(trp)  GAA(glu)  AAG(lys)  TAA  E375X  α-L-fucosidase (fucosidosis)
CCA(pro)  GAA(glu)  AAC(asn)  TAA  E357X  Fumarylacetoacetate hydrolase
TTG(leu)  GAA(glu)  CTG(leu)  TAA  E364X  Fumarylacetoacetate hydrolase
GAT(asp)  CGA(arg)  GGG(gly)  TGA  R359X  Glucocerebrosidase (Gaucher disease)
CTG(leu)  CGA(arg)  GAC(asp)  TGA  R186X  Glucokinase (NID diabetes)
GAC(asp)  GAG(glu)  AGC(ser)  TAG  E279X  Glucokinase
AGA(arg)  TGG(trp)  ACC(thr)  TGA  W343X  Glycoprotein Ib alpha
CTC(leu)  CGA(arg)  GGT(gly)  TGA  R137X  β-hexosaminidase A (Tay-Sachs)
TGG(trp)  CGA(arg)  GAG(glu)  TGA  R393X  β-hexosaminidase A (Tay-Sachs)
CCC(pro)  TGG(trp)  CCT(pro)  TGA  W26X  β-hexosaminidase A (Tay-Sachs)
GGG(gly)  TGG(trp)  AAT(asn)  TAG  W171X  Type II 3β-hydroxysteroid dehydrogenase
ATC(ile)  GAA(glu)  AGG(arg)  TAA  exon2  Hypothyroidism (TSH β subunit gene)
CAG(gln)  TAC(tyr)  GTC(val)  TAG  Y64X  IDUA (Hurler syndrome)
CCA(pro)  CAG(gln)  CCG(pro)  TAG  Q310X  IDUA alpha-L-iduronidase
GTG(val)  CGA(arg)  ATC(ile)  TGA  R897X  Insulin receptor (leprechaunism)
ATT(ile)  CGA(arg)  GGA(gly)  TGA  R372X  Insulin receptor (leprechaunism)
CTT(leu)  CGA(arg)  G  TGA  R988X  Insulin receptor (diabetes)
AAC(asn)  CAG(gln)  AGT(ser)  TAG  Q672X  Insulin receptor (leprechaunism)
CTT(leu)  CGA(arg)  GAG(glu)  TGA  R1000X  Insulin receptor
GTG(val)  TAC(tyr)  CTT(leu)  TAG  Y167X  LDL receptor (hypercholesterolemia)
TTC(phe)  CAG(gln)  TGC(cys)  TAG  Q12X  LDL receptor
CTG(leu)  TGC(cys)  CTC(leu)  TGA  C660X  LDL receptor (hypercholesterolemia)
GTC(val)  TAC(tyr)  AAC(asn)  TAA  Y83X  Lecithin cholesterol acyltransferase
GGA(gly)  CAG(gln)  GAT(asp)  TGA  Q106X  Lipoprotein lipase
ATG(met)  TAT(tyr)  GAG(glu)  TAA  Y61X  Lipoprotein lipase
AAA(lys)  TGG(trp)  AAG(lys)  TGA  W382X  Lipoprotein lipase
AGT(ser)  TGG(trp)  GTG(val)  TAG  W64X  Lipoprotein lipase
AAG(lys)  TCA(ser)  GGC(gly)  TGA  S447X  Lipoprotein lipase
CTT(leu)  CGA(arg)  GAA(glu)  TGA  nt2746  OCRL-1 (oculocerebrorenal syndrome of Lowe)
CCC(pro)  TAT(tyr)  AAT(asn)  TAA  Y209X  Ornithine aminotransferase
TTA(leu)  TAC(tyr)  CCT(pro)  TAG  Y299X  Ornithine aminotransferase
CTT(leu)  CGA(arg)  GAG(glu)  TGA  R426X  Ornithine aminotransferase
GCT(ala)  CGA(arg)  GTG(val)  TGA  R141X  Ornithine transcarbamylase
GCT(ala)  CGA(arg)  GTG(val)  TGA  R109X  Ornithine transcarbamylase
CCT(pro)  CAG(gln)  CAT(his)  TAG  Q192X  p53 squamous cell carcinoma
GCC(ala)  AAG(lys)  TCT(ser)  TAG  K120X  p53 Li-Fraumeni syndrome
TTT(phe)  TGC(cys)  CAA(gln)  TGA  C135X  p53 hepatocellular carcinoma
TTT(phe)  CGA(arg)  CAT(his)  TGA  R213X  p53 ovarian carcinoma, gastric tumour
CCC(pro)  CAG(gln)  CCA(pro)  TAG  Q317X  p53 esophageal carcinoma
TAT(tyr)  GAG(glu)  CCG(pro)  TAG  E221X  p53 osteocarcinoma
CTG(leu)  GGA(gly)  CGA(arg)  TGA  Y226X  p53 ovarian carcinoma
TGC(cys)  CAA(gln)  CTG(leu)  TAA  Q136X  p53 esophageal carcinoma
CAC(his)  GAG(glu)  CTG(leu)  TAG  E298X  p53 hepatocellular carcinoma
GAG(glu)  GAA(glu)  GAG(glu)  TAA  E286X  p53 esophageal carcinoma
TTC(phe)  CGA(arg)  GAG(glu)  TGA  R342X  p53 breast cancer
GTG(val)  GAA(glu)  GGA(gly)  TAA  E198X  p53 hepatocellular carcinoma
ATC(ile)  CGA(arg)  GTG(val)  TGA  R196X  p53 fibrous histiocytoma
CCT(pro)  GAG(glu)  GTT(val)  TAG  E224X  p53 ovarian carcinoma
CTG(leu)  TGG(trp)  GTT(val)  TGA  W146X  p53 esophageal carcinoma
AAG(lys)  CAG(gln)  TCA(ser)  TAG  Q195X  p53 esophageal carcinoma
GAG(glu)  TAT(tyr)  TTG(leu)  TAG  Y205X  p53 ovarian carcinoma
TTC(phe)  CGA(arg)  GTC(val)  TGA  R261X  Phenylalanine hydroxylase (PKU)
TTA(leu)  TCA(ser)  GAG(glu)  TGA  S359X  Phenylalanine hydroxylase (PKU)
TCA(ser)  CGA(arg)  GAT(asp)  TGA  R111X  Phenylalanine hydroxylase (PKU)
CAT(his)  GGA(gly)  TCC(ser)  TGA  Y272X  Phenylalanine hydroxylase (PKU)
TAC(tyr)  TGG(trp)  TTT(phe)  TAG  W326X  Phenylalanine hydroxylase (PKU)
CAG(gln)  TAC(tyr)  TGC(cys)  TAA  Y356X  Phenylalanine hydroxylase (PKU)
CTC(leu)  CGA(arg)  CCT(pro)  TGA  R243X  Phenylalanine hydroxylase (PKU)
CTT(leu)  CGA(arg)  GAT(asp)  TGA  R584X  Platelet glycoprotein IIb
GGC(gly)  TGG(trp)  CAC(his)  TAG  W198X  Porphobilinogen deaminase
?AC  TAT(tyr)  GAG(glu)  TAA  Y145X  Prion protein
ACC(thr)  TGG(trp)  GGA(gly)  TAG  W29X  Protein C (PROC)
GCT(ala)  CGA(arg)  GGA(gly)  TGA  R732X  Procollagen II (COL2A1)
AAG(lys)  GAG(glu)  GTC(val)  TAG  E249X  Rhodopsin
AAG(lys)  CAG(gln)  CTG(leu)  TAG  nt687  SRY (sex reversal)
TTC(phe)  TGG(trp)  CCT(pro)  TAG  W406X  Steroid 21-hydroxylase
CTG(leu)  CAG(gln)  GAG(glu)  TAG  Q318X  Steroid 21-hydroxylase
CTC(leu)  CGA(arg)  GGA(gly)  TGA  R189X  Triosephosphate isomerase (anemia)
GGG(gly)  TCA(ser)  GTG(val)  TGA  S223X  Tyrosine aminotransferase
ATC(ile)  CGA(arg)  GTG(val)  TGA  R417X  Tyrosine aminotransferase
ATC(ile)  CGA(arg)  GCC(ala)  TGA  R52X  Tyrosine aminotransferase
GTC(val)  TGG(trp)  ATG(met)  TAG  W178X  Tyrosinase (oculocutaneous albinism)
?AC  TGG(trp)  GCA(ala)  TGA  W71X  V2 receptor (X-linked NDI)
CTG(leu)  CAG(gln)  ATG(met)  TAG  Q119X  V2-vasopressin receptor (diabetes)
AAG(lys)  TAC(tyr)  CGC(arg)  TAA  nt970  Vitamin D receptor (rickets)
TGC(cys)  CAG(gln)  TTC(phe)  TAG  Q149X  Vitamin D receptor (rickets)
GTC(val)  CGA(arg)  GTG(val)  TGA  R2535X  Von Willebrand factor
CCC(pro)  CGA(arg)  GAG(glu)  TGA  R1659X  Von Willebrand disease type III
GAA(glu)  CGA(arg)  AGG(arg)  TGA  nt1084  WT1 tumour suppressor (Wilms tumour)
CAG(gln)  CGA(arg)  AAG(lys)  TGA  ?  WT1 tumour suppressor (zinc finger 3)
TCT(ser)  TAT(tyr)  CTT(leu)  TAA  Y116X  XP-A (xeroderma pigmentosum)
GTC(val)  CGA(arg)  CAG(gln)  TGA  R207X  XP-A (xeroderma pigmentosum)
CGG(arg)  CGA(arg)  GCA(ala)  TGA  R228X  XP-A (xeroderma pigmentosum)
AAC(asn)  CGA(arg)  GAA(glu)  TGA  R211X  XP-A (xeroderma pigmentosum)

Entries are sorted alphabetically according to the gene which has been mutated or the common name of the resulting disease. Where the 5' and 3' contexts are not discernible from the paper describing the mutation, they were determined from the published sequence or from the EMBL and GenBank databases held at Daresbury, UK. Where the site of the mutation is known, it is indicated either as the codon number preceded by the altered amino acid (in single-letter code) and followed by X to indicate a terminator, or as the nucleotide (nt) which has been mutated. This list can be supplied, annotated with references, on request to RM by electronic mail or on receipt of an IBM type disc. The list in Table 1 is not exhaustive. Others have independently published, and are constantly updating, a database of 880 single base pair substitutions which give rise to human genetic disease (14). A fraction of these will be mutations to stop codons. That database does not, however, include information on the full 5' and 3' codon contexts.

Table 2. The distribution of point mutations amongst codons with the potential to mutate to UAG, UAA or UGA stop codons in human genetic disease.
Stop  Nucleotide  Affected codon  Number  Base change  C→T
TAG  1st position  AAG Lys  3  A:T→T:A
TAG  1st position  CAG Gln  23  C:G→T:A  *
TAG  1st position  GAG Glu  9  G:C→T:A
TAG  2nd position  TCG Ser  0  C:G→A:T
TAG  2nd position  TGG Trp  13  G:C→A:T  *
TAG  2nd position  TTG Leu  1  T:A→A:T
TAG  3rd position  TAC Tyr  6  C:G→G:C
TAG  3rd position  TAT Tyr  1  T:A→G:C
TAA  1st position  AAA Lys  1  A:T→T:A
TAA  1st position  CAA Gln  10  C:G→T:A  *
TAA  1st position  GAA Glu  14  G:C→T:A
TAA  2nd position  TCA Ser  1  C:G→A:T
TAA  2nd position  TTA Leu  1  T:A→A:T
TAA  3rd position  TAC Tyr  3  C:G→A:T
TAA  3rd position  TAT Tyr  5  T:A→A:T
TGA  1st position  AGA Arg  0  A:T→T:A
TGA  1st position  CGA Arg  55  C:G→T:A  *
TGA  1st position  GGA Gly  5  G:C→T:A
TGA  2nd position  TCA Ser  4  C:G→G:C
TGA  2nd position  TTA Leu  1  T:A→G:C
TGA  3rd position  TGC Cys  5  C:G→A:T
TGA  3rd position  TGG Trp  15  G:C→A:T  *
TGA  3rd position  TGT Cys  3  T:A→A:T

Entries in Table 1 were scored for the codon affected and the base change involved in mutation to the nonsense codon. Mutations arising from a C→T deamination are indicated by a *.

Figure 1 illustrates the frequency of mutations to the three termination codons amongst the mutant alleles listed in Table 1: UAG (31%), UAA (18%) and UGA (51%). Figure 1 also shows the frequency of natural UAG, UAA and UGA codons used to terminate protein synthesis at the ends of human genes. In human cells, natural termination codon usage is divided between UAG (23%), UAA (30%) and UGA (47%) (10-12). Whilst UGA codons are the most frequent stop in both populations, UAA terminators are more frequent amongst natural stops than amongst new mutations; the reverse is true for UAG. Overall, the two patterns are significantly different (χ2 = 12.1, P = 0.002).

Table 2 shows the distribution amongst the possible base changes at the 1st, 2nd or 3rd codon positions which lead to the creation of TAG, TAA and TGA mutations. TAG stops are derived largely from CAG (Gln) and TGG (Trp) codons, TAA mutations from CAA (Gln) and GAA (Glu), and TGA codons predominantly from mutations in CGA (Arg) and TGG (Trp). The C→T alteration far outweighs any other change which is seen. This is particularly so for mutations to TGA, for which the CGA (Arg) codon is especially susceptible.
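The set of rows in Table 2 follows mechanically from the genetic code: for each stop codon, every sense codon one base substitution away is a possible precursor, and a change is deamination-type when it is C:G→T:A or G:C→A:T (a C→T change on either strand). The Python sketch below reproduces that enumeration under those assumptions; it uses the standard genetic code, and the names `stop_precursors` and `is_deamination` are illustrative, not taken from the paper.

```python
# Standard genetic code; codons enumerated with bases in TCAG order.
BASES = "TCAG"
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AA[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOPS = ("TAG", "TAA", "TGA")


def is_deamination(old: str, new: str) -> bool:
    """C->T on the coding strand (C:G->T:A) or on the other strand (G:C->A:T)."""
    return (old, new) in {("C", "T"), ("G", "A")}


def stop_precursors():
    """Yield (stop, position, codon, amino acid, deamination?) for each single-base route to a stop."""
    for stop in STOPS:
        for pos in range(3):
            for base in BASES:
                if base == stop[pos]:
                    continue
                codon = stop[:pos] + base + stop[pos + 1:]
                if CODE[codon] == "*":  # stop-to-stop changes are not nonsense mutations
                    continue
                yield stop, pos + 1, codon, CODE[codon], is_deamination(base, stop[pos])


rows = list(stop_precursors())
for stop, pos, codon, aa, deam in rows:
    print(stop, pos, codon, aa, "*" if deam else "")
```

The enumeration yields the 23 codon and position combinations listed in Table 2, with the deamination mark falling on CAG and TGG (to TAG), CAA (to TAA), and CGA and TGG (to TGA).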
The reasons for this are thought to be well understood (13,14). C→T transition mutations are most likely caused by the spontaneous chemical deamination of cytosine to give uracil, which leads to a U:G mispair. U:G mispairs become fixed as a C:G→T:A mutation if DNA replication precedes the detection and removal of uracil by DNA uracil glycosylase. Where cytosine exists in mammalian genomes as 5-methylcytosine, in the doublet CpG, deamination gives thymine and hence a T:G mispair. The high rate of mutation at these sites suggests that the T:G mispair is less readily detected, or less faithfully repaired, than the U:G mispair. Alternatively, methylation of cytosine at the 5 position may elevate the rate of spontaneous deamination.

Figure 1. The frequency with which UAG, UAA and UGA termination codons occur as human disease-causing mutations, compared with the frequency of UAG, UAA and UGA as natural stop codons. The frequency of termination codons produced by nonsense mutation was taken from Table 1. The frequency of naturally occurring stop codons was taken from a sample of 1422 genes kindly supplied by Paul Sharp and Andrew Lloyd.

The 3' codon context of mutations to nonsense codons in human genetic disease

The distribution of 3' codon contexts amongst the 179 nonsense mutations is shown in Figure 2, together with the 3' codon context found around natural termination codons. The patterns of 3' contexts amongst mutations to UAG and UAA are not significantly different from the 3' bases flanking natural UAG and UAA termination codons (χ2 = 7.2, P = 0.066 and χ2 = 0.072, P = 0.995, respectively). There is, however, a significant difference between new mutations to UGA and natural UGA stops (χ2 = 8.1, P = 0.043): A is less frequent, and G more frequent, 3' to natural UGA stop codons than 3' to new mutations to UGA. There is no difference in the pattern of 3' contexts between nonsense mutations and natural stops when UAG, UAA and UGA are combined (χ2 = 3.6, P = 0.303).

[Figure 2 panels: UAG, UAA, UGA and all stop codons; frequency (%) of A, C, G or U at the 3' position for nonsense mutations and natural stop codons]

Figure 2. The 3' context of human disease-causing nonsense mutations compared to the 3' context of natural stop codons. The 3' context of disease-causing nonsense mutations was taken from Table 1. The frequency of the 3' context of naturally occurring stop codons was calculated from a sample of 1422 genes kindly supplied by Paul Sharp and Andrew Lloyd.

DISCUSSION

We present in this paper a survey of mutations to nonsense codons which give rise to human somatic cell and germ line diseases. As early as 1982, it was suggested that gene therapy of such disease loci might be attempted with human tRNA genes mutated to recognise stop codons (1). Readthrough of the nonsense mutation by the suppressor would restore a proportion of wild-type gene function. Given the rapid progress being made in identifying nonsense mutations in human genes, and recent findings on the determinants of suppressor efficiency, it seems an appropriate moment to describe the patterns of mutation which occur and to relate them to the prospects for suppressor tRNA gene therapy.
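The χ2 comparisons reported in the Results can be cross-checked with a few lines of Python. The test used is not stated in the text, so this sketch assumes a Pearson χ2 test on a 2 × k contingency table. It takes 55 UAG, 33 UAA and 91 UGA mutations, which is what the Stop codon column of Table 1 tallies to (31%, 18% and 51% of 179), and approximates the natural-stop counts from the rounded percentages for the 1422-gene sample; the helper names are illustrative. Closed-form tail probabilities are used only for df = 2 (three stop codons) and df = 3 (four 3' bases).

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable (df = 2 or 3 only)."""
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    raise ValueError("only df = 2 or df = 3 are implemented")


def contingency_chi2(rows: list[list[float]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_tot = [sum(r) for r in rows]
    col_tot = [sum(c) for c in zip(*rows)]
    n = sum(row_tot)
    stat = 0.0
    for i, r in enumerate(rows):
        for j, obs in enumerate(r):
            expected = row_tot[i] * col_tot[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (len(col_tot) - 1)


# UAG, UAA, UGA counts among the 179 nonsense mutations (Table 1).
mutations = [55, 33, 91]
# ASSUMPTION: natural-stop counts rebuilt from the rounded 23% / 30% / 47%
# of the 1422-gene sample; the exact counts are not given in the text.
natural = [round(1422 * p) for p in (0.23, 0.30, 0.47)]

stat, df = contingency_chi2([mutations, natural])
print(f"stop codon usage: chi2 = {stat:.1f}, df = {df}, P = {chi2_sf(stat, df):.3f}")

# P values implied by the chi-square values printed in the Results.
for x, d in [(12.1, 2), (7.2, 3), (0.072, 3), (8.1, 3), (3.6, 3)]:
    print(f"chi2 = {x}, df = {d}: P = {chi2_sf(x, d):.3f}")
```

With these approximate counts the statistic comes out close to, but not exactly at, the reported 12.1. The tail probabilities computed from the printed χ2 values agree with the quoted P values to within about 0.005, presumably because the printed statistics are rounded.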
In particular, experiments with reporter gene constructs have revealed differences in the effectiveness of suppressors according to which of the three codons, UAG, UAA or UGA, is to be read, and also according to the contexts in which these termination signals lie (6,7,9). This survey reveals that nonsense mutations occur in an approximate ratio of 1:2:3 for UAA, UAG and UGA respectively. Studies with human nonsense suppressors (9) suggest that suppressor efficiency varies UAG = UGA > UAA. The two most efficient suppressor types could therefore recognise some 80% of the nonsense mutations which lead to human genetic disease. When a suppressor tRNA reads a stop codon, the amino acid which is inserted is determined by the identity of the tRNA whose anticodon was mutated to match the termination triplet. At some sites it might not matter which amino acid is inserted, so long as translation is restored for the full length of the gene. At other sites it might be important to restore authentic, wild-type gene product; in this case the suppressor has to insert the amino acid encoded by the corresponding codon in the unmutated gene. Our analysis reveals that C:G→T:A transitions predominate in the formation of stop codons. Trp, Gln and Glu codons are changed most frequently to UAG; Glu and Gln codons most frequently to UAA; and overwhelmingly it is Arg, and to a lesser extent Trp, codons which give rise to UGA. To be widely applicable, then, suppressor gene therapy would have to generate efficient suppressors from Trp, Gln, Glu and Arg tRNAs. Studies on the determination of tRNA 'identity elements' have shown that the bases in a tRNA molecule which are responsible for binding to the correct aminoacyl-tRNA synthetase sometimes lie in the anticodon loop (15). Thus, when nonsense suppressors are created by mutagenesis of bases in this region, the tRNA may be charged with a different amino acid.
Upon translation of a nonsense codon, such a suppressor restores a normal-length protein, but one which contains an amino acid substitution. For example, in E. coli, UAG nonsense suppressors derived from tRNATrp are charged with Gln as well as Trp (16). Rapid advances are being made in this area. For bacterial tRNAs, it is now largely known which tRNAs give rise to altered amino acid insertions when mutated to nonsense suppressors (17). Interestingly, site-directed mutagenesis can be used to control the extent of mischarging and retain tRNA aminoacyl identity (18). It should not be long before similar information is available for human tRNAs. Research with bacterial tRNAs has also established that the strongest nonsense suppressors are formed by altering the anticodon of tRNAs which normally read codons beginning with U (19). Whilst it is anticipated that similar rules will apply to human tRNAs, little work has been carried out on this aspect. Recent studies from this laboratory have established that the 3' codon context has a substantial effect on the efficiency of human UAG suppressor tRNAs in human cells (6,7). It seems likely that similar rules will apply to UGA codons (20). Our research has shown that UAG codons followed by a 3' A are very inefficiently suppressed, whereas those followed by a 3' C or G are suppressed some five to ten fold more efficiently for a given concentration of tRNA. In prokaryotes and lower eukaryotes it is believed that the choice between the three termination codons, and their 3' codon contexts, is under translational selection pressure (11,12,21,22). In contrast, we and others have argued that in mammalian cells, 3' termination codon contexts are shaped by mutation, and not by selection for optimum performance (23). This contention is reinforced by the present study: mutations to nonsense codons in human disease loci are found in a similar range of 3' contexts to that observed for natural stop codons.
Nonsense mutations in the human genome are fairly evenly divided between 3' contexts of A, C, G and U; in general, 3' G is most common and 3' U least frequent. This distribution matches very well the distribution observed 3' to natural stop codons (23). These patterns are largely determined by the local G+C content of the human genome, which is known to consist of substantial blocks, or 'isochores', of sequence which differ widely in their G+C richness (24,25). Given that the proportions of UAG, UAA and UGA are similar for new mutations and natural stop codons, the balance of probabilities is that termination codon choice is not subject to translational selection in human cells either. The findings of this study have important implications for assessing the likelihood that suppressor tRNAs will be detrimental to the physiology of the cell by causing readthrough at a significant number of natural termination codons. C-terminally extended species may be degraded prematurely, may have reduced enzyme activities, or could display codominant, negative properties in their interactions with other proteins. Even short C-terminal extensions can have serious consequences for some polypeptides. For example, mutations which eliminate the natural stop codon of the α-globin gene give rise to a C-terminal extension of 31 amino acids, which causes a severe, dominant form of thalassemia (4). Of course, in gene therapy with a suppressor tRNA, the level of the tRNA could be adjusted so that readthrough at natural stop codons is as little as 5-10%, provided this concentration of suppressor proved sufficient to reverse the mutant phenotype. Readthrough of this intensity at natural termination codons may not have so drastic an outcome, in the presence of 90-95% of correctly terminated polypeptide chains.
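The proportions used throughout this Discussion follow directly from the Figure 1 percentages (31%, 18% and 51% of mutations; 23%, 30% and 47% of natural stops). This short Python sketch shows the arithmetic behind the approximate 1:2:3 ratio and the share of stops that UAG and UGA suppressors, taken here to be the two most efficient types (9), would recognise; the function names are illustrative only.

```python
# Stop-codon frequencies (%) as quoted in the Results (Figure 1).
MUTATIONS = {"UAG": 31, "UAA": 18, "UGA": 51}
NATURAL = {"UAG": 23, "UAA": 30, "UGA": 47}


def ratio_to_uaa(freqs: dict[str, float]) -> list[float]:
    """UAA : UAG : UGA, scaled so that UAA = 1."""
    return [freqs[c] / freqs["UAA"] for c in ("UAA", "UAG", "UGA")]


def covered_by_uag_uga(freqs: dict[str, float]) -> float:
    """Percentage of stops read by UAG plus UGA suppressors (assumed the two most efficient)."""
    return freqs["UAG"] + freqs["UGA"]


print("UAA:UAG:UGA = " + ":".join(f"{r:.1f}" for r in ratio_to_uaa(MUTATIONS)))
print(covered_by_uag_uga(MUTATIONS), covered_by_uag_uga(NATURAL))
```

Scaled to UAA, the mutation frequencies give about 1:1.7:2.8, which rounds to the 1:2:3 quoted; the two sums come to 82 for mutations and 70 for natural stops.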
This review of nonsense mutations and natural stop codons suggests that both populations are similar in their proportions of UAG, UAA and UGA, and in the distributions of their 3' contexts. Where differences exist, they favour suppression therapy: UAG and UGA mutations account for 82% of human mutations to stop, whereas UAG and UGA comprise only 70% of natural termination codons. Contrary to some earlier suggestions (26), natural stop codons in human cells do not seem to be protected in any special way from translational readthrough by their immediate 3' contexts. Studies have found no significant evidence to support the widespread belief that multiple stop codons are employed by cells to provide a fail-safe mechanism for terminating protein synthesis (22,27). There are indications from E. coli, though, that the nature of the C-terminal amino acids within the nascent polypeptide can influence the efficiency of translational termination (28,29). Moreover, surveys of bacterial gene sequences have suggested preferences for certain amino acids at the C-terminus, which could reflect the efficiency of stop decoding (11,30). If C-terminal amino acids were selected to improve the efficiency of translational termination in human cells, this could increase the specificity of nonsense suppressors for stop mutations over natural termination codons. However, this appears unlikely in the light of studies which show that the counterparts of bacterial preferences in mRNA sequences relating to codon usage and 3' codon context effects are missing in human cells (23,31).

ACKNOWLEDGEMENTS

JA is the recipient of an MRC postgraduate studentship. RM is supported by a Royal Society University Research Fellowship. The Krebs Institute is a SERC centre for molecular recognition. This work benefited from the use of the SEQUENET facility.

REFERENCES

1. Temple, G.F., Dozy, A.M., Roy, K.L. and Kan, Y.W. (1982) Nature, 296, 537-540.
2. Geiduschek, E.P. and Tocchini-Valentini, G.P. (1988) Ann. Rev. Biochem., 57, 873-914.
3. Capone, J.P. (1988) DNA, 7, 459-468.
4. Cooper, D.N. (1993) Ann. Med., 25, 11-17.
5. Dietz, H.C., Valle, D., Francomano, C.A., Kendzior, R.J.Jr., Pyeritz, R.E. and Cutting, G.R. (1993) Science, 259, 680-683.
6. Phillips-Jones, M.K., Watson, F.J. and Martin, R. (1993) J. Mol. Biol., 233, 1-6.
7. Martin, R., Phillips-Jones, M.K., Watson, F.J. and Hill, L.S.J. (1993) Biochem. Soc. Trans., 21, 843-851.
8. Miller, J.H. and Albertini, A.M. (1983) J. Mol. Biol., 164, 59-71.
9. Capone, J.P., Sedivy, J.M., Sharp, P.A. and RajBhandary, U.L. (1986) Mol. Cell. Biol., 6, 3059-3067.
10. Brown, C.M., Dalphin, M.E., Stockwell, P.A. and Tate, W.P. (1993) Nucleic Acids Res., 21, 3119-3123.
11. Brown, C.M., Stockwell, P.A., Trotman, C.N. and Tate, W.P. (1990) Nucleic Acids Res., 18, 6339-6345.
12. Cavener, D.R. and Ray, S.C. (1991) Nucleic Acids Res., 19, 3185-3192.
13. Youssoufian, H., Kazazian, H.H.Jr., Phillips, D.G., Aronis, S., Tsiftis, G., Brown, V.A. and Antonarakis, S.E. (1986) Nature, 324, 380-382.
14. Cooper, D.N. and Krawczak, M. (1993) Human Gene Mutation. Bios Scientific Publishers, Oxford.
15. Pallanck, L. and Schulman, L.H. tRNA discrimination in aminoacylation. In: Transfer RNA in Protein Synthesis, edited by Hatfield, D.L., Lee, B.Y. and Pirtle, R.M. CRC Press, 1992, pp. 279-318.
16. Raftery, L.A., Egan, B.J., Cline, S.W. and Yarus, M. (1984) J. Bacteriol., 158, 849-859.
17. Kleina, L.G., Masson, J.M., Normanly, J., Abelson, J. and Miller, J.H. (1990) J. Mol. Biol., 213, 705-717.
18. Normanly, J., Kleina, L.G., Masson, J.M., Abelson, J. and Miller, J.H. (1990) J. Mol. Biol., 213, 719-726.
19. Yarus, M. (1982) Science, 218, 646-652.
20. Li, G. and Rice, C.M. (1993) J. Virol., 67, 5062-5067.
21. Sharp, P.M. and Bulmer, M. (1988) Gene, 63, 141-145.
22. Brown, C.M., Stockwell, P.A., Trotman, C.N.A. and Tate, W.P. (1990) Nucleic Acids Res., 18, 2079-2086.
23. Martin, R. (1994) Nucleic Acids Res., 21, 15-19.
24. Sharp, P.M., Burgess, C.J., Lloyd, A.T. and Mitchell, K.J. Selective use of termination codons and variations in codon choice. In: Transfer RNA in Protein Synthesis, edited by Hatfield, D.L., Lee, B.Y. and Pirtle, R.M. CRC Press, Boca Raton, 1992, pp. 397-425.
25. Bernardi, G. (1993) Mol. Biol. Evol., 10, 186-204.
26. Bienz, M., Kubli, E., Kohli, J., deHenau, S., Huez, G., Marbaix, G. and Grosjean, H. (1981) Nucleic Acids Res., 9, 3835-3850.
27. Kohli, J. and Grosjean, H. (1981) Mol. Gen. Genet., 182, 430-439.
28. Mottagui-Tabar, S., Björnsson, A. and Isaksson, L.A. (1994) EMBO J., 13, 249-257.
29. Arkov, A.L., Korolev, S.V. and Kisselev, L.L. (1993) Nucleic Acids Res., 21, 2891-2897.
30. Gutman, G.A. and Hatfield, G.W. (1989) Proc. Natl. Acad. Sci. USA, 86, 3699-3703.
31. Eyre-Walker, A.C. (1991) J. Mol. Evol., 33, 442-449.