Nucleic Acids Research, Vol. 18, No. 14, 4105
© 1990 Oxford University Press

Protein sequence comparisons show that the 'pseudoproteases' encoded by poxviruses and certain retroviruses belong to the deoxyuridine triphosphatase family

Duncan J. McGeoch
MRC Virology Unit, Institute of Virology, University of Glasgow, Church Street, Glasgow G11 5JR, UK

Received May 7, 1990; Accepted June 12, 1990

ABSTRACT

Amino acid sequence comparisons show extensive similarities among the deoxyuridine triphosphatases (dUTPases) of Escherichia coli and of herpesviruses, and the 'protease-like' or 'pseudoprotease' sequences encoded by certain retroviruses in the oncovirus and lentivirus families and by poxviruses. These relationships suggest strongly that the 'pseudoproteases' actually are dUTPases, and have not arisen by duplication of an oncovirus protease gene as had been suggested. The herpesvirus dUTPase sequences differ from the others in that they are longer (about 370 residues, against around 140) and one conserved element ('Motif 3') is displaced relative to its position in the other sequences; a model involving internal duplication of the herpesvirus gene can account effectively for these observations. Sequences closely similar to Motif 3 are also found in phosphofructokinases, where they form part of the active site and fructose phosphate binding structure; thus these sequences may represent a class of structural element generally involved in phosphate transfer to and from glycosides.

INTRODUCTION

During a comparative analysis of amino acid sequences encoded by retroviruses McClure et al. (1) noticed a class of related sequences of around 140 residues which are specified by some viruses in the oncovirus and lentivirus groups, but not by all retroviruses. In the oncoviruses the novel coding sequence is part of the gag gene, adjacent to protease coding sequences, whereas in the lentiviruses it is located at a distal position within the pol gene.
The function of this polypeptide was unknown. However, on the basis of a low level similarity with the retroviral proteases, it was proposed that the unknown gene had evolved by duplication of an oncovirus protease gene and subsequent divergence. The polypeptides were then termed 'protease-like' domains (1) and later 'pseudoproteases' (2); the latter term is used in this paper, as a convenient label only. A model was proposed by which the pseudoprotease coding sequence could have been transferred from the oncovirus lineage to the lentivirus lineage (1). Subsequently, clearly related genes were discovered in two poxviruses, namely vaccinia virus and orf virus (3,2). The poxvirus genes each consist of an independent open reading frame with appropriate transcriptional control signals, and the vaccinia virus gene was shown to be transcribed early in infection (3). I have now found that the amino acid sequences of pseudoproteases are characteristically similar to those of deoxyuridine triphosphatase (dUTPase) enzymes encoded by herpesviruses and also by Escherichia coli; this discovery was made as part of a programme pursuing herpesvirus gene functions and evolutionary relationships. In this paper I describe the sequence relationships among pseudoproteases and dUTPases, and outline some implications of these findings. METHODS Amino acid sequence data were examined using the GCG program set (16) running under VAX/VMS. Several other sequence comparison programs were also used, including those of Pearson & Lipman (17), Gribskov et al. (18) and Argos (19). Database searches used Swissprot release 13. RESULTS Amino acid sequence comparisons with pseudoproteases and dUTPases Amino acid sequences inferred from the gene sequences are known for three herpesviral dUTPases, from herpes simplex virus type 1 (HSV-1; ref. 4) varicella-zoster virus (VZV; ref. 5) and Epstein-Barr virus (EBV; ref. 6). 
The herpesviral dUTPase genes are HSV-1 UL50, VZV gene 8 and EBV BLLF3 (residues 88474 to 87641 in the DNA sequence: originally named BLLF2; ref. 6). The functions of the VZV and EBV proteins were assumed from comparison with the HSV-1 sequence whose function had been established by biochemical and genetic analyses (7,8). The EBV sequence exhibits a large internal deletion relative to the other two and is also divergent from the other sequences. These aspects lessen its usefulness for sequence comparisons, and it is dealt with only at a later point in this paper. The sequences of HSV-1, VZV and EBV dUTPases contain 371, 396 and 278 amino acids respectively. The only other dUTPase sequence known is for the enzyme of E. coli; this contains 151 amino acids (9).

[Figure 1: multiple alignment of the SRV1, MMTV, visna virus, EIAV, orf virus, vaccinia virus, HSV-1, VZV and E. coli sequences, with consensus lines (Onco, Lenti, Pox, 4/6, 5/6, 6/6 and Herpes) and Motifs 1 to 5 marked.]

Figure 1. Alignments of amino acid sequences of pseudoproteases and dUTPases. The sequences shown for SRV1, MMTV, visna virus and EIAV are for the pseudoprotease domain as defined by McClure et al. (1). See refs 1 and 2 for original retrovirus sequence references. The orf virus, vaccinia virus and E. coli sequences are shown starting with their translational initiators. The HSV-1 sequence is for residues 193 to the C-terminus at 371. The VZV sequence is for residues 212-385. Internal padding characters are indicated by dots. Leading and trailing dots indicate that the protein sequence extends further than shown. The locations of five conserved motifs in the retrovirus plus poxvirus sequences are indicated by double bars, and corresponding regions in the E. coli sequence and the herpesvirus consensus are marked by single over-lines.

I found that certain amino acid motifs conserved among pseudoproteases are present in herpesvirus dUTPases, mostly in the C-terminal halves of their sequences. Subsequently, I realized that the E. coli sequence is also similar. These relationships are illustrated by the sequence alignments shown in Fig. 1. This contains sequences from two oncoviruses [simian retrovirus 1 (SRV1) and mouse mammary tumour virus (MMTV)], two lentiviruses [visna virus and equine infectious anaemia virus (EIAV)], two poxviruses (orf virus and vaccinia virus), two herpesviruses (HSV-1 and VZV), and from E. coli. Four other retrovirus sequences are known which display pseudoprotease domains (1,2): the selection used here was chosen to give a manageable amount of data while displaying a degree of divergence suitable to highlight conserved sequence elements.

Fig. 1 was constructed by first making many pairwise alignments of sequences using the Bestfit program. The overall alignment shown was then produced 'by hand' using the pairwise alignments as guides. Introduction of gaps in sequences was kept to a minimum, and wherever possible gaps are presented in register across the set of sequences. The variability in the sequences is such that this strategy loses some optimality in individual pairwise comparisons. However, I consider that it gives a valid overall view, and the result is certainly adequate to illustrate the major sequence similarities of the set.
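As an illustration of the kind of pairwise comparison that precedes a hand-built alignment, the following is a minimal sketch of a local-alignment score using a plain Smith-Waterman recurrence. It is not the Bestfit program itself: the function name and the match, mismatch and gap values are assumptions chosen for illustration, and a real comparison would use an amino acid substitution matrix.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions, not those of any
    published program. Only the score is computed (no traceback).
    """
    prev = [0] * (len(b) + 1)   # previous row of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Motif 3 nonapeptides quoted in the paper (poxvirus and E. coli dUTPase).
pox_motif3 = "GVIDEDYRG"
ecoli_motif3 = "GLIDSDYQG"
print(smith_waterman_score(pox_motif3, ecoli_motif3))
```

Scores of this kind only rank candidate pairings; the overall alignment in Fig. 1 was assembled by hand from such pairwise guides.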
The gaps introduced are all small, with one exception: near the end of the aligned set a gap equivalent to 36 residues was introduced into all the sequences except those of the two herpesviruses. This is justified by the occurrence in flanking positions of compelling sequence similarities. In addition, this region varies in length between HSV-1 and VZV, and also to some extent in other cases, which suggests that it represents a structural feature not subject to stringent restrictions on chain length.

In order to draw out the conserved aspects of the sequences and at the same time give information on their degree of conservation, a number of consensus sequences are presented. These include separate consensus sequences for the oncoviruses, the lentiviruses and the poxviruses, and consensus sequences representing different degrees of conservation among all six of these sequences. There are five major local conserved regions in these six pseudoprotease sequences, and these are labelled in Fig. 1 as Motifs 1-5. Motif 1 is a region which McClure et al. (1) considered homologous to the aspartate protease catalytic site sequence Asp-Thr-Gly or Asp-Ser-Gly. Motif 2 is a poorly conserved element, which gains in visibility when comparisons are made with the E. coli and herpesviral dUTPase sequences (see below).

Table 1. Comparisons of conserved positions between pseudoproteases and dUTPases

            Onco     Lenti    Pox      Herpes   E. coli
Onco        -        36.5     39.5     28.5     39.0
Lenti       36.5     -        38.5     34.0     32.0
Pox         39.5     38.5     -        25.75    40.0
Herpes      28.5     34.0     25.75    -        26.0
E. coli     39.0     32.0     40.0     26.0     -
Mean        35.87    35.25    35.94    28.56    34.25

For each pair of sequences in Fig. 1, the positions at which identical residues occurred were summed, omitting positions at which any padding character was added. This gave a total of 128 positions considered. Scores averaged for related viruses were then computed, and are presented.

When the pseudoprotease sequences are compared with E.
coli dUTPase, it can be seen that all of the motifs are present in the E. coli sequence. E. coli Motif 1 is most similar to the oncovirus consensus, while E. coli Motif 2 is identical to the poxvirus version. Outside the motif regions the similarities between E. coli dUTPase and the pseudoprotease sequences are less pronounced, but there are many local identities with one or more of the other sequences and alignments of similar amino acid types. In addition, the overall length of E. coli dUTPase is closely similar to those of the pseudoproteases. The two poxvirus genes each have their own translational initiation and termination sites, which the E. coli positions match quite closely.

On comparing the HSV-1 and VZV sequences with all the others, convincing counterparts are seen for Motifs 1, 2, 4 and 5. The herpesvirus Motif 1 is particularly close to the lentivirus version, and the herpesvirus Motif 2 is identical to the oncovirus version. These motif regions represent the majority of the sequences most conserved between HSV-1 and VZV, and outside them the herpesvirus sequences show much lower similarity to the non-herpesvirus sequences. However, in the region corresponding to Motif 3 the herpesviral sequences are not similar to the others. Thus, while the C-terminal regions of the HSV-1 and VZV dUTPases are convincingly related overall to the whole pseudoprotease domain and to E. coli dUTPase, they lack one major conserved element present in all of the non-herpesvirus sequences.

Relationships between the sequences in Fig. 1 were also evaluated by computing for each aligned pair the number of identical residues seen at corresponding positions, excluding all positions at which a padding character had been inserted in any sequence. The four most similar pairs were: SRV1 and MMTV (score of 65 out of 128); visna virus and EIAV (score of 56); vaccinia and orf viruses (score of 73); and HSV-1 and VZV (score of 54). All these pairs are of known related viruses.
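The scoring just described, together with the group averaging used for Table 1, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the four short "toy" sequences are hypothetical and are not taken from Fig. 1; only the `TABLE1` values are copied from the paper.

```python
from itertools import product
from statistics import mean

GAP = "."  # padding character, as in the figures


def ungapped_columns(alignment):
    """Indices of columns in which no sequence carries a padding character."""
    rows = list(alignment.values())
    if len({len(r) for r in rows}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return [i for i in range(len(rows[0])) if all(r[i] != GAP for r in rows)]


def identity_score(alignment, x, y):
    """Number of identical residues between x and y over ungapped columns."""
    return sum(alignment[x][i] == alignment[y][i]
               for i in ungapped_columns(alignment))


def group_mean(alignment, group_a, group_b):
    """Average the pairwise scores between two groups of related sequences."""
    return mean(identity_score(alignment, x, y)
                for x, y in product(group_a, group_b))


# Hypothetical toy alignment, for illustration only.
toy = {
    "onco1":  "AG.DLSGRSS",
    "onco2":  "AGLDLSGRSS",
    "lenti1": "AGYDLIGKSS",
    "lenti2": "AGFDLC.KSS",
}
print(identity_score(toy, "onco1", "onco2"))
print(group_mean(toy, ["onco1", "onco2"], ["lenti1", "lenti2"]))

# Off-diagonal values of Table 1; each group's mean is taken over the
# other four groups.
TABLE1 = {
    "Onco":    {"Lenti": 36.5, "Pox": 39.5, "Herpes": 28.5, "E. coli": 39.0},
    "Lenti":   {"Onco": 36.5, "Pox": 38.5, "Herpes": 34.0, "E. coli": 32.0},
    "Pox":     {"Onco": 39.5, "Lenti": 38.5, "Herpes": 25.75, "E. coli": 40.0},
    "Herpes":  {"Onco": 28.5, "Lenti": 34.0, "Pox": 25.75, "E. coli": 26.0},
    "E. coli": {"Onco": 39.0, "Lenti": 32.0, "Pox": 40.0, "Herpes": 26.0},
}
row_means = {group: mean(scores.values()) for group, scores in TABLE1.items()}
print(row_means)
```

Filtering columns across the whole alignment, rather than per pair, keeps every pairwise score on the same set of positions, which is what makes the 128-position totals comparable between pairs.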
To examine other relationships, data for each of these pairs were averaged, as shown in Table 1. For each of the five groups so defined (oncovirus, lentivirus, poxvirus, herpesvirus and E. coli), means were also calculated for comparisons with all of the other groups. These data indicate that the oncovirus, lentivirus, poxvirus and E. coli sequences are approximately equally related to each other, while the herpesvirus sequences are rather distinct from the others. The lower scores for herpesviruses can mostly be accounted for by their lack of Motif 3. Similar conclusions on relatedness were taken from exercises in constructing similarity trees (not shown). This set of sequence comparisons was completed by an unexpected finding: the absent herpesvirus Motif 3 is present in the N-terminal halves of the herpesvirus dUTPase sequences. The N-terminal portions of the HSV-1, VZV and EBV dUTPases, of some 200 residues, show little overall sequence similarity to each other, with only one convincingly conserved region. An alignment of this region and its surroundings is presented in Fig. 2. As shown in the figure, this conserved region is very closely similar to Motif 3 of the retroviral, poxviral and E. coli sequences, in both the invariant residues and the types of amino acids present at positions which are not completely conserved. The similarities of the pseudoproteases with E. coli dUTPase in the first instance, and secondarily with the herpesviral dUTPases, are compelling. They result in the clear conclusion that the dUTPase and pseudoprotease genes are evolutionarily related, and hence in the proposal that the pseudoproteases may well be dUTPases. This raises questions concerning the structures of the herpesvirus dUTPases and the functionality of the conserved motifs, which are pursued in the following sections. More general issues arising are dealt with in the Discussion. A model for the structure of herpesviral dUTPase Fig. 
3A summarizes the arrangements of all the major similar motifs found. To understand the relationship of the herpesviral enzymes to the other sequences it is necessary to account for the observed difference in ordering of conserved elements of polypeptide sequence. In Fig. 3B I present a model which does this with economy. First, suppose that the active form of the E. coli type of dUTPase is a dimeric molecule, and that the active site (or some other essential functional structure) is composed of sequences contributed by both subunits, including residues of the conserved motifs; there are thus two active sites per dimer. In the particular version shown in Fig. 3B, each active site contains Motif 3 from one subunit, and Motifs 1, 2, 4 and 5 from the other. Next, suppose that the herpesvirus dUTPase represents the product of an intragenic duplication, so that the active enzyme molecule is a monomeric polypeptide chain containing the equivalent of both chains in the E. coli dimer. During evolution one of the active sites is then lost, leaving one active site per large monomer: this loss is equivalent to mutational destruction of Motifs 1, 2, 4 and 5 in the N-terminal half of the chain, and of Motif 3 in the C-terminal half; this is the situation observed. This model was inspired by the example of the genuine retroviral proteases and other aspartyl proteases: the retroviral enzymes are active as dimers, whereas some of their homologues from other sources are double-length monomers whose genes have been internally duplicated (10,11). I consider that there is a lack of significant evidence for the common evolutionary origin of proteases and pseudoproteases as proposed by McClure et al. (1) (see Discussion), so I regard the aspartyl protease structures as providing a valuable paradigm but not direct evidence in support of the dUTPase model.

[Figure 2: alignment of N-terminal segments of the HSV-1, VZV and EBV dUTPases (starting at residues 80, 106 and 59 respectively) with the Motif 3 regions of E. coli dUTPase and of the Onco, Lenti, Pox, 4/6 and 6/6 consensus sequences.]

Figure 2. Location of Motif 3 in the N-terminal region of herpesvirus dUTPases. Sequences extracted from the N-terminal regions of the herpesviral dUTPases are shown aligned, with starting residue numbers indicated, around the counterpart of the Motif 3 of Fig. 1. The Motif 3 sequences and their surroundings for E. coli dUTPase and pseudoprotease consensus sequences (from Fig. 1) are presented for comparison.

[Figure 3: panel A, linear order of Motifs 1 to 5 between the N- and C-termini; panel B, cartoons of the dimeric and monomeric enzyme models.]

Figure 3. Arrangement of motifs and model for dUTPase quaternary structure. A. The linear arrangement of motifs in the E. coli and pseudoprotease sequences is indicated on the left, and the herpesvirus arrangement on the right. B. The left cartoon presents a model for the E. coli type of dUTPase. The active enzyme is shown as a dimer with two active sites, each composed of Motifs 1, 2, 4 and 5 from one monomer, and Motif 3 from the other. The right cartoon represents a herpesvirus dUTPase monomer, with folding corresponding to the E. coli dimer; the N-terminal region is shaded.

[Figure 4: alignment of the 5/9 and 7/9 consensus sequences with the SMRV, HERV, IAPH and EBV sequences, with Motifs 1 to 5 marked.]

Figure 4. Arrangement of motifs in variant pseudoproteases and EBV dUTPase. Consensus sequences derived from all the sequences in Fig. 1 including HSV-1 and VZV are shown aligned with sequences from squirrel monkey retrovirus (SMRV), human endogenous retrovirus (HERV), intracisternal A particle of hamsters (IAPH), and with residues 108-278 of EBV dUTPase. See refs 1 and 2 for retrovirus sequence references. Sequences corresponding to the consensus motifs are overlined.

Some indirect evidence is available in support of the model. It is known that E. coli dUTPase is actually a tetramer, which is consistent with the model (12). Caradonna and Adamkiewicz (13) showed that the HSV-1 enzyme is monomeric, and in the same paper reported that dUTPase from HeLa cells is a dimer, with the monomer having an estimated Mr of 22,500 (the HeLa protein's sequence is not known). Direct evidence would require the demonstration that sequence or structural similarities exist between the N-terminal and C-terminal halves of the herpesvirus dUTPases. I have not been able to detect any convincing overall sequence similarity. This is not surprising, however, when it is considered that since the HSV-1 and VZV lineages diverged their dUTPase genes have mutated to the point that in the present day amino acid sequences of the N-terminal halves little more than Motif 3 is conserved.
I pursued this examination further by comparing in the various sequences the surroundings of Motif 3, in terms of hydrophobicities, predicted probabilities of surface occurrence and predicted secondary structures. General similarities can be discerned between the herpesvirus sequences and the others, extending at least to 30 or 40 residues on each side of the motif; however, I do not consider that such observations provide critical evidence (data not shown). A possible sugar phosphate binding element As was noted above, four retroviral pseudoprotease sequences, and also the EBV dUTPase sequence, were not included in Fig. 1. One of those omitted was for Mason-Pfizer monkey virus, which is almost identical to the SRV1 sequence. The other three retroviral sequences and the EBV sequence are aligned in Fig. 4 with overall consensus sequences derived from Fig. 1. It can be seen that in each case certain of the previously conserved motifs are significantly altered, although all four sequences are nonetheless clearly related to the sequences listed in Fig. 1. Thus, it is to be assumed that the pseudoprotease and dUTPase sequences known at present do not delimit possible variability in Motifs 1 to 5 in this polypeptide family. Extensive searches were made in the Swissprot library (release 13) for protein sequences and for motifs within sequences which might be related to the dUTPase/pseudoprotease family of sequences. These used as probes both complete sequences and individual motif sequences. No proteins emerged as convincing additional members of the family. Searches with individual motifs (and variants of motifs) did not yield anything of visible interest for Motifs 1, 2, 4 and 5. However, an intriguing correlation was found for Motif 3: many of the sequences most similar to this are in library entries for enzymes involved in phosphate transfer to and from glycosides-a category which also includes dUTPase. 
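A motif search of the kind described above can be sketched with a simple pattern scan. The pattern below is a hypothetical reading of the Motif 3 core (glycine, any residue, Ile-Asp, four further residues, glycine) chosen for illustration; it is not the probe set actually used against Swissprot, and the function name is likewise an assumption. The sequences are those quoted in Figs 2 and 5.

```python
import re

# Hypothetical Motif 3 pattern: G, any residue, I, D, four residues, G.
MOTIF3_PATTERN = re.compile(r"G.ID.{4}G")


def find_motif3(sequence):
    """Return (1-based start position, matched nonapeptide) for each hit."""
    return [(m.start() + 1, m.group())
            for m in MOTIF3_PATTERN.finditer(sequence)]


# E. coli dUTPase segment around Motif 3, as quoted in Fig. 2.
ECOLI_SEGMENT = "SGLGHKHGIVLGNLVGLIDSDYQGQLMISVWNRGQDSFT"

# Phosphofructokinase active-site nonapeptides, as quoted in Fig. 5.
PFK = {
    "E. coli": "GTIDNDIKG",
    "B. stear.": "GTIDNDIPG",
    "Mammals": "GSIDNDFCG",
}

print(find_motif3(ECOLI_SEGMENT))
for name, peptide in PFK.items():
    print(name, bool(MOTIF3_PATTERN.fullmatch(peptide)))

# A strict pattern misses divergent family members such as SMRV (Fig. 5),
# which is why variants of each motif were also used as probes.
print(bool(MOTIF3_PATTERN.fullmatch("GILDNDFEG")))
```

A fixed pattern is the crudest form of such a probe; profile-style searches weight each position by its observed variability instead of demanding exact residues.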
The most compelling example was for five prokaryotic and eukaryotic phosphofructokinases, as shown in Fig. 5. Crystallographic structures have been determined for the Bacillus stearothermophilus and E. coli (isozyme 1) phosphofructokinases (14,15). In both these cases the analogue of Motif 3 forms a loop on the protein surface and comprises part of the active site. The aspartate residue equivalent to position 4 in the motif (i.e. the only completely invariant residue in the motif; see Fig. 5) is involved in hydrogen bonding to fructose ring hydroxyl groups, and the aspartate equivalent to position 6 is involved in hydrogen bonding water molecules associated with a phosphate-bound Mg2+ ion. I suggest that Motif 3 may represent a class of functionally related structures commonly employed in glycoside binding and phosphate transfer. In the case of HSV-1 dUTPase, it is known that the Motif 3 locality is functionally important, since its disruption by a small in-frame insertion gives an enzymatically inactive protein (ref. 7; V. G. Preston, personal communication).

                        1 3 5 7 9
dUTPases
  HSV1                  GLIDSGYRG
  VZV                   GVIDAGYRG
  EBV                   GIIDPGYTG
  E. coli               GLIDSDYQG
Pseudoproteases
  SRV1                  GVIDNDYTG
  MMTV                  GVIDSDFQG
  Visna                 GIIDSGYQG
  EIAV                  GIIDEGYTG
  Pox (2)               GVIDEDYRG
  SMRV                  GILDNDFEG
  HERV                  SWDSDYKG
  IAPH                  GIVDPYHKE
Phosphofructokinases
  E. coli               GTIDNDIKG
  B. stear.             GTIDNDIPG
  Mammals (3)           GSIDNDFCG
  Consensus             G-IDND--G
                        1 3 5 7 9

Figure 5. Comparison of Motif 3 with active site sequences of phosphofructokinases. Motif 3 sequences from Figs 1, 2 and 4 are aligned with sequences from five phosphofructokinases (extracted from Swissprot release 13).

DISCUSSION

The primary finding of this study is that the 'pseudoprotease' sequences of retroviruses and poxviruses show extensive similarity to sequences of known dUTPases. This has implications at four levels.
Firstly, it demonstrates that the pseudoprotease and dUTPase genes have a common origin; ideas on the evolution of pseudoprotease coding sequences must take account of this. Secondly, it suggests strongly that the pseudoprotease polypeptides are actually dUTPases; this prediction is open to experimental analysis. (A more circumspect prediction would be that the pseudoproteases are either dUTPases or have a related function such as some other phosphotransferase activity; given that no examples of the latter possibility have emerged from database searches, it seems rather unlikely.) Thirdly, the idea that poxviruses and some retroviruses may encode a dUTPase is to my knowledge new, and needs to be accommodated in a view of the enzyme's possible value to the virus. Lastly, the amino acid sequence similarities observed provide a basis for investigation of the structure and function of dUTPases; aspects of this area have been touched on in the model for herpesvirus dUTPase structure, and in the suggestion that Motif 3 sequences are a part of the active site, with analogous structures existing in other classes of phosphotransferase. The remainder of this Discussion treats two of these four general areas, namely the evolutionary origins of the genes and the functional implications of dUTPase to the virus systems. While the dUTPase activity of pseudoproteases is hypothetical at present, the homologous relationship of the pseudoprotease and dUTPase genes is a firmly established observation. Since examples of this gene family have been observed in three groups of eukaryotic viruses and in a bacterium, the family is evidently widespread in nature and thus ancient. It is probable that dUTPase encoded by eukaryotic cellular genomes will be found also to belong to this family. 
Inasmuch as the three herpesvirus dUTPase genes are distinct from other members of the family, having most likely undergone an internal gene duplication, the herpesviruses must have possessed the gene from a remote epoch preceding divergence of HSV-1, VZV and EBV. Since the poxvirus genes show high sequence similarity to each other, they probably represent corresponding genome segments of the two viruses, and thus it seems likely that the gene has been present in the poxvirus lineage since before divergence of orf and vaccinia viruses.

The situation with the retroviruses is different. Here only some oncoviruses and some lentiviruses possess a member of this gene family, and it is found in two genomic locations. These facts suggest strongly that it was acquired in two separate events late in the evolution of the major types of retroviruses. In both instances, transfer from the cellular genome must stand as the most likely mechanism; there is no reason to invoke a transfer from one retrovirus to another as a primary possibility. Capture of genes from cellular genomes is, of course, a well known occurrence in retrovirus biology.

This view differs greatly from that presented by McClure et al. (1) (see Introduction). The core of their scheme was that the pseudoprotease gene arose in the oncovirus lineage by duplication of the protease gene. With the greater information now available, this proposal can be seen clearly to be unsupportable. Regarding possible relationships between the aspartyl protease family and the pseudoprotease (plus dUTPase) family, I do not consider that there is at present any real evidence to sustain such a connection. The alignment of pseudoprotease and protease amino acid sequences given by McClure et al.
(1) involves extensive introductions of sequence gaps and yields only minimal identity or similarity of aligned residues; it is much weaker than the clear alignment between pseudoproteases and dUTPases. However, because similarity in the three-dimensional structures of divergent proteins may be maintained beyond any recognizable sequence similarity, there is no clear lower boundary for alignments of amino acid sequences which would separate related and unrelated sequences. Database searches using the sensitive profile method (18) with profiles from the pseudoprotease and dUTPase sequences do not pull out aspartyl proteases, and vice versa (details not shown). Turning to the role of dUTPase in virus infected cells: deoxyuridine phosphates are present in cells as precursors of TTP. The accepted function of dUTPase is to keep the dUTP concentration at such a low level that incorporation of dU into DNA is minimised (12). Such incorporation should in itself have no aberrant functional effect or direct mutagenic implications. However, dU residues in DNA also arise by non-enzymic deamination of dC residues in DNA; this process is potentially mutagenic, and dU residues are therefore the targets of a repair process, involving excision of uracil, cutting of the DNA backbone at the resulting apyrimidinic site, resection, filling in and ligation. dU incorporated into DNA from dUTP will also invoke this repair process, which must be relatively hazardous per se since it involves the transient local destruction of one strand of the DNA duplex (consider the possible effect of two dU residues incorporated nearby in each strand of a DNA molecule). Inasmuch as poxviruses have large DNA genomes, replicate in the cytoplasm and specify many other enzymes of nucleotide metabolism, it is eminently reasonable that they should encode a dUTPase. 
In the case of retroviruses, it seems reasonable enough that they might encode their own dUTPase to supplement the cellular enzyme, as do the herpesviruses. An additional possibility is that, since it is specified as part of the gag or pol polyproteins (which are processed into internal components of the virion), the enzyme might well be carried by the virion, and so perhaps could have a role in close association with genomic RNA and reverse transcriptase. What is less clear is why these coding sequences should be present in the genomes of only some retroviruses. This could be rationalised by proposing that only with some variants of viral replication dynamics or in some cellular environments does dUTP incorporation become a significant factor in retrovirus viability.

ACKNOWLEDGEMENTS

Thanks are due to L. Pearl for discussion, to P. Sharp for discussion and running tree-building programs, to J. Subak-Sharpe and N. Stow for critical reading of the paper and to L. Kattenhorn for help in preparing the text.

REFERENCES

1. McClure, M.A., Johnson, M.S. and Doolittle, R.F. (1987) Proc. Natl. Acad. Sci. USA, 84, 2693-2697.
2. Mercer, A.A., Fraser, K.M., Stockwell, P.A. and Robinson, A.J. (1989) Virology, 172, 665-668.
3. Slabaugh, M.B. and Roseman, N.A. (1989) Proc. Natl. Acad. Sci. USA, 86, 4152-4155.
4. McGeoch, D.J., Dalrymple, M.A., Davison, A.J., Dolan, A., Frame, M.C., McNab, D., Perry, L.J., Scott, J.E. and Taylor, P. (1988) J. Gen. Virol., 69, 1531-1574.
5. Davison, A.J. and Scott, J.E. (1986) J. Gen. Virol., 67, 1759-1816.
6. Baer, R., Bankier, A.T., Biggin, M.D., Deininger, P.L., Farrell, P.J., Gibson, T.J., Hatfull, G., Hudson, G.S., Satchwell, S.C., Seguin, C., Tuffnell, P.S. and Barrell, B.G. (1984) Nature, 310, 207-211.
7. Preston, V.G. and Fisher, F.B. (1984) Virology, 138, 58-68.
8. Fisher, F.B. and Preston, V.G. (1986) Virology, 148, 190-197.
9. Lundberg, L.G., Thoresson, H-O., Karlstrom, O.H. and Nyman, P.O. (1983) EMBO J., 2, 967-971.
10. Navia, M.A., Fitzgerald, P.M.D., McKeever, B.M., Leu, C.-L., Heimbach, J.C., Herber, W.K., Sigal, I.S., Darke, P.L. and Springer, J.P. (1989) Nature, 337, 615-620.
11. Lapatto, R., Blundell, T., Hemmings, A., Overington, J., Wilderspin, A., Wood, S., Merson, J.R., Whittle, P.J., Danley, D.E., Geoghegan, K.F., Hawrylik, S.J., Lee, S.E., Scheld, K.G. and Hobart, P.M. (1989) Nature, 342, 299-302.
12. Shlomai, J. and Kornberg, A. (1978) J. Biol. Chem., 253, 3305-3312.
13. Caradonna, S.J. and Adamkiewicz, D.M. (1984) J. Biol. Chem., 259, 5459-5464.
14. Evans, P.R. and Hudson, P.J. (1979) Nature, 279, 500-504.
15. Shirakihara, Y. and Evans, P.R. (1988) J. Mol. Biol., 204, 973-994.
16. Devereux, J., Haeberli, P. and Smithies, O. (1984) Nucleic Acids Res., 57, 1023-1036.
17. Pearson, W.R. and Lipman, D.J. (1988) Proc. Natl. Acad. Sci. USA, 85, 2444-2448.
18. Gribskov, M., McLachlan, A.D. and Eisenberg, D. (1987) Proc. Natl. Acad. Sci. USA, 84, 4355-4358.
19. Argos, P. (1987) J. Mol. Biol., 193, 385-396.