Nucleic Acids Research, Vol. 18, No. 14 4105
© 1990 Oxford University Press
Protein sequence comparisons show that the
'pseudoproteases' encoded by poxviruses and certain
retroviruses belong to the deoxyuridine triphosphatase
family
Duncan J. McGeoch
MRC Virology Unit, Institute of Virology, University of Glasgow, Church Street, Glasgow G11 5JR, UK
Received May 7, 1990; Accepted June 12, 1990
ABSTRACT
Amino acid sequence comparisons show extensive
similarities among the deoxyuridine triphosphatases
(dUTPases) of Escherichia coli and of herpesviruses,
and the 'protease-like' or 'pseudoprotease' sequences
encoded by certain retroviruses in the oncovirus and
lentivirus families and by poxviruses. These relationships
suggest strongly that the
'pseudoproteases' actually are dUTPases, and have not
arisen by duplication of an oncovirus protease gene as
had been suggested. The herpesvirus dUTPase
sequences differ from the others in that they are longer
(about 370 residues, against around 140) and one
conserved element ('Motif 3') is displaced relative to
its position in the other sequences; a model involving
internal duplication of the herpesvirus gene can
account effectively for these observations. Sequences
closely similar to Motif 3 are also found in
phosphofructokinases, where they form part of the
active site and fructose phosphate binding structure;
thus these sequences may represent a class of
structural element generally involved in phosphate
transfer to and from glycosides.
INTRODUCTION
During a comparative analysis of amino acid sequences encoded
by retroviruses McClure et al. (1) noticed a class of related
sequences of around 140 residues which are specified by some
viruses in the oncovirus and lentivirus groups, but not by all
retroviruses. In the oncoviruses the novel coding sequence is part
of the gag gene, adjacent to protease coding sequences, whereas
in the lentiviruses it is located at a distal position within the pol
gene. The function of this polypeptide was unknown. However,
on the basis of a low level similarity with the retroviral proteases,
it was proposed that the unknown gene had evolved by duplication
of an oncovirus protease gene and subsequent divergence. The
polypeptides were then termed 'protease-like' domains (1) and
later 'pseudoproteases' (2); the latter term is used in this paper,
as a convenient label only. A model was proposed by which the
pseudoprotease coding sequence could have been transferred from
the oncovirus lineage to the lentivirus lineage (1).
Subsequently, clearly related genes were discovered in two
poxviruses, namely vaccinia virus and orf virus (3,2). The
poxvirus genes each consist of an independent open reading frame
with appropriate transcriptional control signals, and the vaccinia
virus gene was shown to be transcribed early in infection (3).
I have now found that the amino acid sequences of
pseudoproteases are characteristically similar to those of
deoxyuridine triphosphatase (dUTPase) enzymes encoded by
herpesviruses and also by Escherichia coli; this discovery was
made as part of a programme pursuing herpesvirus gene functions
and evolutionary relationships. In this paper I describe the
sequence relationships among pseudoproteases and dUTPases,
and outline some implications of these findings.
METHODS
Amino acid sequence data were examined using the GCG
program set (16) running under VAX/VMS. Several other
sequence comparison programs were also used, including those
of Pearson & Lipman (17), Gribskov et al. (18) and Argos (19).
Database searches used Swissprot release 13.
RESULTS
Amino acid sequence comparisons with pseudoproteases and
dUTPases
Amino acid sequences inferred from the gene sequences are
known for three herpesviral dUTPases, from herpes simplex virus
type 1 (HSV-1; ref. 4), varicella-zoster virus (VZV; ref. 5) and
Epstein-Barr virus (EBV; ref. 6). The herpesviral dUTPase genes
are HSV-1 UL50, VZV gene 8 and EBV BLLF3 (residues 88474
to 87641 in the DNA sequence: originally named BLLF2; ref.
6). The functions of the VZV and EBV proteins were assumed
from comparison with the HSV-1 sequence whose function had
been established by biochemical and genetic analyses (7,8). The
EBV sequence exhibits a large internal deletion relative to the
other two and is also divergent from the other sequences. These
aspects lessen its usefulness for sequence comparisons, and it
is dealt with only at a later point in this paper. The sequences
of HSV-1, VZV and EBV dUTPases contain 371, 396 and 278
amino acids respectively. The only other dUTPase sequence
known is for the enzyme of E. coli; this contains 151 amino acids
(9).
[Figure 1: aligned amino acid sequences of SRV1, MMTV, visna virus, EIAV, orf virus, vaccinia virus, HSV-1, VZV and E. coli, with consensus rows (Onco Con, Lenti Con, Pox Con, 4/6 Con, 5/6 Con, 6/6 Con, Herpes Con) and the positions of Motifs 1 to 5 marked.]
Figure 1. Alignments of amino acid sequences of pseudoproteases and dUTPases. The sequences shown for SRV1, MMTV, visna virus and EIAV are for the
pseudoprotease domain as defined by McClure et al. (1). See refs 1 and 2 for original retrovirus sequence references. The orf virus, vaccinia virus and E. coli
sequences are shown starting with their translational initiators. The HSV-1 sequence is for residues 193 to the C-terminus at 371. The VZV sequence is for residues
212 to 385. Internal padding characters are indicated by dots. Leading and trailing dots indicate that the protein sequence extends further than shown. The locations
of five conserved motifs in the retrovirus plus poxvirus sequences are indicated by double bars, and corresponding regions in the E. coli sequence and the herpesvirus
consensus are marked by single over-lines.
I found that certain amino acid motifs conserved among
pseudoproteases are present in herpesvirus dUTPases, mostly in
the C-terminal halves of their sequences. Subsequently, I realized
that the E. coli sequence is also similar. These relationships are
illustrated by the sequence alignments shown in Fig. 1. This
contains sequences from two oncoviruses [simian retrovirus 1
(SRV1) and mouse mammary tumour virus (MMTV)], two
lentiviruses [visna virus and equine infectious anaemia virus
(EIAV)], two poxviruses (orf virus and vaccinia virus), two
herpesviruses (HSV-1 and VZV), and from E. coli. Four other
retrovirus sequences are known which display pseudoprotease
domains (1,2): the selection used here was chosen to give a
manageable amount of data while displaying a degree of
divergence suitable to highlight conserved sequence elements.
Fig. 1 was constructed by first making many pairwise
alignments of sequences using the Bestfit program. The overall
alignment shown was then produced 'by hand' using the pairwise
alignments as guides. Introduction of gaps in sequences was kept
to a minimum, and wherever possible gaps are presented in
register across the set of sequences. The variability in the
sequences is such that this strategy loses some optimality in
individual pairwise comparisons. However, I consider that it gives
a valid overall view, and the result is certainly adequate to
illustrate the major sequence similarities of the set. The gaps
introduced are all small, with one exception: near the end of the
aligned set a gap equivalent to 36 residues was introduced into
all the sequences except those of the two herpesviruses. This is
justified by the occurrence in flanking positions of compelling
sequence similarities. In addition, this region varies in length
between HSV-1 and VZV, and also to some extent in other cases,
which suggests that it represents a structural feature not subject
to stringent restrictions on chain length.
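The pairwise step can be illustrated with a minimal local-alignment sketch. Bestfit belongs to the Smith-Waterman family of local alignment methods; the code below is not the GCG implementation, and the match, mismatch and gap values are illustrative assumptions rather than the parameters used to build Fig. 1. The two test strings are the Motif 3 residues of E. coli and vaccinia virus as they appear in Fig. 1.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b with a linear gap penalty.

    Scoring values are illustrative assumptions, not those used in the paper.
    Returns (score, aligned_a, aligned_b); '.' is the padding character.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append(".")
            i -= 1
        else:
            out_a.append("."); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Motif 3 of E. coli dUTPase against Motif 3 of vaccinia virus (from Fig. 1).
score, top, bottom = smith_waterman("GLIDSDYQG", "GVIDEDYRG")
```

With these assumed scores the two motifs align without gaps over all nine positions (six identities, three mismatches).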
In order to draw out the conserved aspects of the sequences
and at the same time give information on their degree of
conservation, a number of consensus sequences are presented.
These include separate consensus sequences for the oncoviruses, the
lentiviruses and the poxviruses, and consensus sequences
representing different degrees of conservation among all six of
these sequences. There are five major local conserved regions
in these six pseudoprotease sequences, and these are labelled in
Fig. 1 as Motifs 1 to 5. Motif 1 is a region which McClure et
al. (1) considered homologous to the aspartate protease catalytic
site sequence Asp-Thr-Gly or Asp-Ser-Gly. Motif 2 is a poorly
conserved element, which gains in visibility when comparisons
are made with the E. coli and herpesviral dUTPase sequences
(see below).
Table 1. Comparisons of conserved positions between pseudoproteinases and dUTPases
          Onco    Lenti   Pox     Herpes  E. coli
Onco      -       36.5    39.5    28.5    39.0
Lenti     36.5    -       38.5    34.0    32.0
Pox       39.5    38.5    -       25.75   40.0
Herpes    28.5    34.0    25.75   -       26.0
E. coli   39.0    32.0    40.0    26.0    -
Mean      35.87   35.25   35.94   28.56   34.25
For each pair of sequences in Fig. 1, the positions at which identical residues occurred were
summed, omitting positions at which any padding character was added. This gave a total
of 128 positions considered. Scores averaged for related viruses were then computed, and
are presented.
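The graded consensus rows (4/6, 5/6 and 6/6) can be reproduced with a short sketch. It assumes that a consensus position shows a residue when at least the stated number of the six aligned sequences agree, and a dash otherwise; the function name is hypothetical. The input is the nine-residue Motif 3 stretch of the six pseudoprotease sequences as given in Fig. 1.

```python
from collections import Counter


def consensus(aligned, min_count, pad="."):
    """Column-wise consensus: keep a residue shared by >= min_count rows."""
    out = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        keep = count >= min_count and residue != pad
        out.append(residue if keep else "-")
    return "".join(out)


# Motif 3 residues of the six pseudoprotease sequences in Fig. 1.
motif3 = [
    "GVIDNDYTG",  # SRV1
    "GVIDSDFQG",  # MMTV
    "GIIDSGYQG",  # visna virus
    "GIIDEGYTG",  # EIAV
    "GVIDEDYRG",  # orf virus
    "GVIDEDYRG",  # vaccinia virus
]

con_4_of_6 = consensus(motif3, 4)
con_5_of_6 = consensus(motif3, 5)
con_6_of_6 = consensus(motif3, 6)
```

Raising the threshold strips the consensus down to the positions that are conserved in every sequence, which is how the 6/6 row isolates the core of each motif.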
When the pseudoprotease sequences are compared with E. coli
dUTPase, it can be seen that all of the motifs are present in the
E. coli sequence. E. coli Motif 1 is most similar to the oncovirus
consensus, while E. coli Motif 2 is identical to the poxvirus
version. Outside the motif regions the similarities between E.
coli dUTPase and the pseudoprotease sequences are less
pronounced, but there are many local identities with one or more
of the other sequences and alignments of similar amino acid types.
In addition, the overall length of E. coli dUTPase is closely
similar to those of the pseudoproteases. The two poxvirus genes
each have their own translational initiation and termination sites,
which the E. coli positions match quite closely.
On comparing the HSV-1 and VZV sequences with all the
others, convincing counterparts are seen for Motifs 1, 2, 4 and
5. The herpesvirus Motif 1 is particularly close to the lentivirus
version, and the herpesvirus Motif 2 is identical to the oncovirus
version. These motif regions represent the majority of the
sequences most conserved between HSV-1 and VZV, and outside
them the herpesvirus sequences show much lower similarity to
the non-herpesvirus sequences. However, in the region
corresponding to Motif 3 the herpesviral sequences are not similar
to the others. Thus, while the C-terminal regions of the HSV-1
and VZV dUTPases are convincingly related overall to the whole
pseudoprotease domain and to E. coli dUTPase, they lack one
major conserved element present in all of the non-herpesvirus
sequences.
Relationships between the sequences in Fig. 1 were also
evaluated by computing for each aligned pair the number of
identical residues seen at corresponding positions, excluding all
positions at which a padding character had been inserted in any
sequence. The four most similar pairs were: SRV1 and MMTV
(score of 65 out of 128); visna virus and EIAV (score of 56);
vaccinia and orf viruses (score of 73); and HSV-1 and VZV
(score of 54). All these pairs are of known related viruses. To
examine other relationships, data for each of these pairs were
averaged, as shown in Table 1. For each of the five groups so
defined (oncovirus, lentivirus, poxvirus, herpesvirus and E. coli),
means were also calculated for comparisons with all of the other
groups. These data indicate that the oncovirus, lentivirus,
poxvirus and E. coli sequences are approximately equally related
to each other, while the herpesvirus sequences are rather distinct
from the others. The lower scores for herpesviruses can mostly
be accounted for by their lack of Motif 3. Similar conclusions
on relatedness were taken from exercises in constructing similarity
trees (not shown).
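The scoring scheme described above can be sketched as follows. The function counts identities only at columns where no sequence in the set carries a padding character, and the group means are recomputed from the pairwise values in Table 1. The four rows are transcribed from the Motif 2 region of Fig. 1, and their column registration is assumed from the extracted text, so the counts are illustrative and are not the 128-position scores of the paper. Function names are hypothetical.

```python
def identity_score(seq_a, seq_b, all_seqs, pad="."):
    """Identities between two rows over columns that are pad-free in every row."""
    usable = [i for i, col in enumerate(zip(*all_seqs)) if pad not in col]
    identical = sum(1 for i in usable if seq_a[i] == seq_b[i])
    return identical, len(usable)


# Motif 2 region, transcribed from Fig. 1 (alignment assumed as extracted).
rows = {
    "SRV1":     "LILGRSSITIK.GLQVYP",
    "MMTV":     "LIIGRSSNYKK.GLEVLP",
    "Orf":      "RIAPRSG.AVKHFIDVGA",
    "Vaccinia": "RIAPRSGLSLK.GIDIGG",
}
all_rows = list(rows.values())
onco_pair = identity_score(rows["SRV1"], rows["MMTV"], all_rows)
pox_pair = identity_score(rows["Orf"], rows["Vaccinia"], all_rows)

# Averaged between-group scores from Table 1.
table1 = {
    ("Onco", "Lenti"): 36.5, ("Onco", "Pox"): 39.5,
    ("Onco", "Herpes"): 28.5, ("Onco", "E. coli"): 39.0,
    ("Lenti", "Pox"): 38.5, ("Lenti", "Herpes"): 34.0,
    ("Lenti", "E. coli"): 32.0, ("Pox", "Herpes"): 25.75,
    ("Pox", "E. coli"): 40.0, ("Herpes", "E. coli"): 26.0,
}


def group_mean(group):
    """Mean of all Table 1 scores that involve the given group."""
    values = [v for pair, v in table1.items() if group in pair]
    return sum(values) / len(values)


means = {g: group_mean(g) for g in ("Onco", "Lenti", "Pox", "Herpes", "E. coli")}
```

The recomputed means agree with the Mean row of Table 1 to within rounding, with the herpesvirus group clearly lowest.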
This set of sequence comparisons was completed by an
unexpected finding: the absent herpesvirus Motif 3 is present in
the N-terminal halves of the herpesvirus dUTPase sequences. The
N-terminal portions of the HSV-1, VZV and EBV dUTPases,
of some 200 residues, show little overall sequence similarity to
each other, with only one convincingly conserved region. An
alignment of this region and its surroundings is presented in Fig.
2. As shown in the figure, this conserved region is very closely
similar to Motif 3 of the retroviral, poxviral and E. coli
sequences, in both the invariant residues and the types of amino
acids present at positions which are not completely conserved.
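Locating the displaced motif can be illustrated by turning a consensus into a search pattern. The sketch below is not the procedure used in the paper: it simply rewrites the herpesvirus 3/3 consensus of Fig. 2 (G-ID-GY-G) as a regular expression, with each dash standing for any residue, and scans the N-terminal fragments shown in Fig. 2. The looser pattern used for E. coli corresponds to the 5/6 pseudoprotease consensus for Motif 3. The function name is hypothetical.

```python
import re

# Herpesvirus 3/3 consensus for Motif 3; each '.' matches any residue.
HERPES_MOTIF3 = re.compile(r"G.ID.GY.G")
# Looser pattern corresponding to the 5/6 pseudoprotease consensus.
GENERAL_MOTIF3 = re.compile(r"G.ID..Y.G")

# Fragments as shown in Fig. 2 ('.' is the padding character).
fragments = {
    "HSV1": "HAPALASPGHHVIL.GLIDSGYRGTVMAVWAPKR.TRE",
    "VZV": "KDTALADEDNFFVANGVIDAGYRGVISALLYYRPGVT.V",
    "EBV": "MLWGSTSRPVTSHV.GIIDPGYTGELRLILQNQRRYNST",
    "E. coli": "SGLGHKHGIVLGNLVGLIDSDYQGQLMISVWNRGQDSFT",
}


def find_motif(fragment, pattern, pad="."):
    """Return (offset in the unpadded fragment, matched residues) or None."""
    ungapped = fragment.replace(pad, "")
    hit = pattern.search(ungapped)
    return (hit.start(), hit.group()) if hit else None


herpes_hits = {name: find_motif(seq, HERPES_MOTIF3)
               for name, seq in fragments.items()}
ecoli_general = find_motif(fragments["E. coli"], GENERAL_MOTIF3)
```

The strict herpesvirus pattern finds one hit in each herpesvirus fragment and none in E. coli, whose Motif 3 has Asp rather than Gly at position 6; the looser pattern recovers the E. coli motif.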
The similarities of the pseudoproteases with E. coli dUTPase
in the first instance, and secondarily with the herpesviral
dUTPases, are compelling. They result in the clear conclusion
that the dUTPase and pseudoprotease genes are evolutionarily
related, and hence in the proposal that the pseudoproteases may
well be dUTPases. This raises questions concerning the structures
of the herpesvirus dUTPases and the functionality of the
conserved motifs, which are pursued in the following sections.
More general issues arising are dealt with in the Discussion.
A model for the structure of herpesviral dUTPase
Fig. 3A summarizes the arrangements of all the major similar
motifs found. To understand the relationship of the herpesviral
enzymes to the other sequences it is necessary to account for the
observed difference in ordering of conserved elements of
polypeptide sequence. In Fig. 3B I present a model which does
this with economy. First, suppose that the active form of the E.
coli type of dUTPase is a dimeric molecule, and that the active
site (or some other essential functional structure) is composed
of sequences contributed by both subunits, including residues of
the conserved motifs; there are thus two active sites per dimer.
In the particular version shown in Fig. 3B, each active site
contains Motif 3 from one subunit, and Motifs 1, 2, 4 and 5 from
the other. Next, suppose that the herpesvirus dUTPase represents
the product of an intragenic duplication, so that the active enzyme
molecule is a monomeric polypeptide chain containing the
equivalent of both chains in the E. coli dimer. During evolution
one of the active sites is then lost, leaving one active site per
large monomer: this loss is equivalent to mutational destruction
of Motifs 1, 2, 4 and 5 in the N-terminal half of the chain, and
of Motif 3 in the C-terminal half; this is the situation observed.
This model was inspired by the example of the genuine
retroviral proteases and other aspartyl proteases: the retroviral
enzymes are active as dimers, whereas some of their homologues
from other sources are double-length monomers whose genes
have been internally duplicated (10,11). I consider that there is
a lack of significant evidence for the common evolutionary origin
of proteases and pseudoproteases as proposed by McClure et al.
(1) (see Discussion), so I regard the aspartyl protease structures
as providing a valuable paradigm but not direct evidence in
support of the dUTPase model.
HSV1        80  HAPALASPGHHVIL.GLIDSGYRGTVMAVWAPKR.TRE
VZV        106  KDTALADEDNFFVANGVIDAGYRGVISALLYYRPGVT.V
EBV         59  MLWGSTSRPVTSHV.GIIDPGYTGELRLILQNQRRYNST
Herpes Con 3/3                 G-ID-GY-G
E. coli         SGLGHKHGIVLGNLVGLIDSDYQGQLMISVWNRGQDSFT
                               =========
                               Motif 3
[Remaining rows of Figure 2: Herpes Con 2/3, Onco Con, Lenti Con, Pox Con, 4/6 Con and 6/6 Con.]
Figure 2. Location of Motif 3 in the N-terminal region of herpesvirus dUTPases.
Sequences extracted from the N-terminal regions of the herpesviral dUTPases
are shown aligned, with starting residue numbers indicated, around the counterpart
of the Motif 3 of Fig. 1. The Motif 3 sequences and their surroundings for E.
coli dUTPase and pseudoprotease consensus sequences (from Fig. 1) are presented
for comparison.
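The duplication-and-loss model of this section can be written out as a small bookkeeping exercise. The sketch below is only a restatement of the model in code, with hypothetical names: motifs are numbered 1 to 5, the duplicated gene carries an N-terminal and a C-terminal copy of each, and the motifs of one active site are then removed.

```python
# Motif order in the E. coli type of dUTPase and in the pseudoproteases.
ANCESTRAL_ORDER = [1, 2, 3, 4, 5]


def duplicate(motifs):
    """Intragenic duplication: an N-terminal and a C-terminal copy of each motif."""
    return [("N", m) for m in motifs] + [("C", m) for m in motifs]


def lose_one_active_site(duplicated):
    """Remove Motifs 1, 2, 4 and 5 from the N half and Motif 3 from the C half."""
    lost = {("N", 1), ("N", 2), ("N", 4), ("N", 5), ("C", 3)}
    return [copy for copy in duplicated if copy not in lost]


herpes_like = lose_one_active_site(duplicate(ANCESTRAL_ORDER))
herpes_order = [motif for _, motif in herpes_like]
```

The surviving motifs read 3, 1, 2, 4, 5 along the chain, with Motif 3 in the N-terminal half and the rest in the C-terminal half, which is the arrangement observed for HSV-1 and VZV.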
Some indirect evidence is available in support of the model.
It is known that E. coli dUTPase is actually a tetramer, which
is consistent with the model (12). Caradonna and Adamkiewicz
(13) showed that the HSV-1 enzyme is monomeric, and in the
same paper reported that dUTPase from HeLa cells is a dimer,
with the monomer having an estimated Mr of 22,500 (the HeLa
protein's sequence is not known).
Figure 3. Arrangement of motifs and model for dUTPase quaternary structure. A. The linear arrangement of motifs in the E. coli and pseudoprotease sequences
is indicated on the left, and the herpesvirus arrangement on the right. B. The left cartoon presents a model for the E. coli type of dUTPase. The active enzyme
is shown as a dimer with two active sites, each composed of Motifs 1, 2, 4 and 5 from one monomer, and Motif 3 from the other. The right cartoon represents
a herpesvirus dUTPase monomer, with folding corresponding to the E. coli dimer; the N-terminal region is shaded.
[Figure 4: alignment of the 5/9 Con and 7/9 Con consensus sequences with the SMRV, HERV, IAPH and EBV sequences, with the positions of Motifs 1 to 5 marked.]
Figure 4. Arrangement of motifs in variant pseudoproteases and EBV dUTPase. Consensus sequences derived from all the sequences in Fig. 1 including HSV-1
and VZV are shown aligned with sequences from squirrel monkey retrovirus (SMRV), human endogenous retrovirus (HERV), intracisternal A particle of hamsters
(IAPH), and with residues 108-278 of EBV dUTPase. See refs 1 and 2 for retrovirus sequence references. Sequences corresponding to the consensus motifs are overlined.
Direct evidence would require the demonstration that sequence
or structural similarities exist between the N-terminal and C-terminal halves of the herpesvirus dUTPases. I have not been
able to detect any convincing overall sequence similarity. This
is not surprising, however, when it is considered that since the
HSV-1 and VZV lineages diverged their dUTPase genes have
mutated to the point that in the present day amino acid sequences
of the N-terminal halves little more than Motif 3 is conserved.
I pursued this examination further by comparing in the various
sequences the surroundings of Motif 3, in terms of
hydrophobicities, predicted probabilities of surface occurrence
and predicted secondary structures. General similarities can be
discerned between the herpesvirus sequences and the others,
extending at least to 30 or 40 residues on each side of the motif;
however, I do not consider that such observations provide critical
evidence (data not shown).
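A hydrophobicity comparison of this kind can be sketched with a sliding-window profile. The paper does not state which scale or window was used; the Kyte-Doolittle hydropathy scale and the nine-residue window below are assumptions chosen for illustration. The example sequence is the E. coli stretch around Motif 3 shown in Fig. 2.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Mean hydropathy of each full window along an unpadded sequence."""
    values = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# E. coli dUTPase around Motif 3 (from Fig. 2); the motif starts at offset 15.
ecoli_region = "SGLGHKHGIVLGNLVGLIDSDYQGQLMISVWNRGQDSFT"
profile = hydropathy_profile(ecoli_region, window=9)
motif3_window = profile[15]
```

Profiles computed this way for each sequence can be laid side by side, anchored on Motif 3, to look for the kind of general similarity in the flanking regions that is described above.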
A possible sugar phosphate binding element
As was noted above, four retroviral pseudoprotease sequences,
and also the EBV dUTPase sequence, were not included in Fig.
1. One of those omitted was for Mason-Pfizer monkey virus,
which is almost identical to the SRV1 sequence. The other three
retroviral sequences and the EBV sequence are aligned in Fig.
4 with overall consensus sequences derived from Fig. 1. It can
be seen that in each case certain of the previously conserved
motifs are significantly altered, although all four sequences are
nonetheless clearly related to the sequences listed in Fig. 1. Thus,
it is to be assumed that the pseudoprotease and dUTPase
sequences known at present do not delimit possible variability
in Motifs 1 to 5 in this polypeptide family.
Extensive searches were made in the Swissprot library (release
13) for protein sequences and for motifs within sequences which
might be related to the dUTPase/pseudoprotease family of
sequences. These used as probes both complete sequences and
individual motif sequences. No proteins emerged as convincing
additional members of the family.
Searches with individual motifs (and variants of motifs) did
not yield anything of visible interest for Motifs 1, 2, 4 and 5.
However, an intriguing correlation was found for Motif 3: many
of the sequences most similar to this are in library entries for
enzymes involved in phosphate transfer to and from glycosides, a
category which also includes dUTPase. The most compelling
example was for five prokaryotic and eukaryotic phosphofructokinases, as shown in Fig. 5. Crystallographic structures have been
determined for the Bacillus stearothermophilus and E. coli
(isozyme 1) phosphofructokinases (14,15). In both these cases
the analogue of Motif 3 forms a loop on the protein surface and
comprises part of the active site. The aspartate residue equivalent
to position 4 in the motif (i.e. the only completely invariant
residue in the motif; see Fig. 5) is involved in hydrogen bonding
to fructose ring hydroxyl groups, and the aspartate equivalent
to position 6 is involved in hydrogen bonding water molecules
associated with a phosphate-bound Mg2+ ion. I suggest that
Motif 3 may represent a class of functionally related structures
commonly employed in glycoside binding and phosphate transfer.
In the case of HSV-1 dUTPase, it is known that the Motif 3
locality is functionally important, since its disruption by a small
in-frame insertion gives an enzymatically inactive protein (ref.
7; V. G. Preston, personal communication).
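The match between Motif 3 and the phosphofructokinase sequences can be checked position by position. The sketch below derives a strict consensus from the three distinct phosphofructokinase rows of Fig. 5 and counts, for several dUTPase and pseudoprotease Motif 3 sequences, how many of the six defined consensus positions they share. Function names are hypothetical, and only a selection of the Fig. 5 rows is included.

```python
def strict_consensus(seqs):
    """Residue where every row agrees, '-' elsewhere."""
    return "".join(col[0] if len(set(col)) == 1 else "-" for col in zip(*seqs))


def matches_to_consensus(seq, cons):
    """Number of defined consensus positions at which seq carries that residue."""
    return sum(1 for residue, c in zip(seq, cons) if c != "-" and residue == c)


# Phosphofructokinase rows of Fig. 5 (the mammalian row stands for three entries).
pfk_rows = ["GTIDNDIKG", "GTIDNDIPG", "GSIDNDFCG"]
pfk_consensus = strict_consensus(pfk_rows)

# Motif 3 of selected dUTPases and pseudoproteases, from Fig. 5.
motif3 = {
    "HSV1": "GLIDSGYRG",
    "E. coli": "GLIDSDYQG",
    "SRV1": "GVIDNDYTG",
    "MMTV": "GVIDSDFQG",
    "Pox": "GVIDEDYRG",
    "IAPH": "GIVDPYHKE",
}
scores = {name: matches_to_consensus(seq, pfk_consensus)
          for name, seq in motif3.items()}
```

On this count the SRV1 motif matches the phosphofructokinase consensus at all six defined positions, while the divergent IAPH sequence retains only two.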
                       1 3 5 7 9
dUTPases
  HSV1                 GLIDSGYRG
  VZV                  GVIDAGYRG
  EBV                  GIIDPGYTG
  E. coli              GLIDSDYQG
Pseudoproteases
  SRV1                 GVIDNDYTG
  MMTV                 GVIDSDFQG
  Visna                GIIDSGYQG
  EIAV                 GIIDEGYTG
  Pox (2)              GVIDEDYRG
  SMRV                 GILDNDFEG
  HERV                 SWDSDYKG
  IAPH                 GIVDPYHKE
Phosphofructokinases
  E. coli              GTIDNDIKG
  B. stear.            GTIDNDIPG
  Mammals (3)          GSIDNDFCG
  Consensus            G-IDND--G
                       1 3 5 7 9
Figure 5. Comparison of Motif 3 with active site sequences of
phosphofructokinases. Motif 3 sequences from Figs 1, 2 and 4 are aligned with
sequences from five phosphofructokinases (extracted from Swissprot release 13).
DISCUSSION
The primary finding of this study is that the 'pseudoprotease'
sequences of retroviruses and poxviruses show extensive
similarity to sequences of known dUTPases. This has implications
at four levels. Firstly, it demonstrates that the pseudoprotease
and dUTPase genes have a common origin; ideas on the evolution
of pseudoprotease coding sequences must take account of this.
Secondly, it suggests strongly that the pseudoprotease
polypeptides are actually dUTPases; this prediction is open to
experimental analysis. (A more circumspect prediction would be
that the pseudoproteases are either dUTPases or have a related
function such as some other phosphotransferase activity; given
that no examples of the latter possibility have emerged from
database searches, it seems rather unlikely.) Thirdly, the idea
that poxviruses and some retroviruses may encode a dUTPase
is to my knowledge new, and needs to be accommodated in a
view of the enzyme's possible value to the virus. Lastly, the
amino acid sequence similarities observed provide a basis for
investigation of the structure and function of dUTPases; aspects
of this area have been touched on in the model for herpesvirus
dUTPase structure, and in the suggestion that Motif 3 sequences
are a part of the active site, with analogous structures existing
in other classes of phosphotransferase. The remainder of this
Discussion treats two of these four general areas, namely the
evolutionary origins of the genes and the functional implications
of dUTPase to the virus systems.
While the dUTPase activity of pseudoproteases is hypothetical
at present, the homologous relationship of the pseudoprotease
and dUTPase genes is a firmly established observation. Since
examples of this gene family have been observed in three groups
of eukaryotic viruses and in a bacterium, the family is evidently
widespread in nature and thus ancient. It is probable that dUTPase
encoded by eukaryotic cellular genomes will be found also to
belong to this family. Inasmuch as the three herpesvirus dUTPase
genes are distinct from other members of the family, having most
likely undergone an internal gene duplication, the herpesviruses
must have possessed the gene from a remote epoch preceding
divergence of HSV-1, VZV and EBV. Since the poxvirus genes
show high sequence similarity to each other, they probably
represent corresponding genome segments of the two viruses,
and thus it seems likely that the gene has been present in the
poxvirus lineage since before divergence of orf and vaccinia
viruses.
The situation with the retroviruses is different. Here only some
oncoviruses and some lentiviruses possess a member of this gene
family, and it is found in two genomic locations. These facts
suggest strongly that it was acquired in two separate events late
in the evolution of the major types of retroviruses. In both
instances, transfer from the cellular genome must stand as the
most likely mechanism; there is no reason to invoke a transfer
from one retrovirus to another as a primary possibility. Capture
of genes from cellular genomes is, of course, a well known
occurrence in retrovirus biology.
This view differs greatly from that presented by McClure et
al. (1) (see Introduction). The core of their scheme was that the
pseudoprotease gene arose in the oncovirus lineage by duplication
of the protease gene. With the greater information now available,
this proposal can be seen clearly to be unsupportable. Regarding
possible relationships between the aspartyl protease family and
the pseudoprotease (plus dUTPase) family, I do not consider that
there is at present any real evidence to sustain such a connection.
The alignment of pseudoprotease and protease amino acid
sequences given by McClure et al. (1) involves extensive
introductions of sequence gaps and yields only minimal identity
or similarity of aligned residues; it is much weaker than the clear
alignment between pseudoproteases and dUTPases. However,
because similarity in the three-dimensional structures of divergent
proteins may be maintained beyond any recognizable sequence
similarity, there is no clear lower boundary for alignments of
amino acid sequences which would separate related and unrelated
sequences. Database searches using the sensitive profile method
(18) with profiles from the pseudoprotease and dUTPase
sequences do not pull out aspartyl proteases, and vice versa
(details not shown).
Turning to the role of dUTPase in virus infected cells:
deoxyuridine phosphates are present in cells as precursors of TTP.
The accepted function of dUTPase is to keep the dUTP
concentration at such a low level that incorporation of dU into
DNA is minimised (12). Such incorporation should in itself have
no aberrant functional effect or direct mutagenic implications.
However, dU residues in DNA also arise by non-enzymic
deamination of dC residues in DNA; this process is potentially
mutagenic, and dU residues are therefore the targets of a repair
process, involving excision of uracil, cutting of the DNA
backbone at the resulting apyrimidinic site, resection, filling in
and ligation. dU incorporated into DNA from dUTP will also
invoke this repair process, which must be relatively hazardous
per se since it involves the transient local destruction of one strand
of the DNA duplex (consider the possible effect of two dU
residues incorporated nearby in each strand of a DNA molecule).
Inasmuch as poxviruses have large DNA genomes, replicate
in the cytoplasm and specify many other enzymes of nucleotide
metabolism, it is eminently reasonable that they should encode
a dUTPase. In the case of retroviruses, it seems reasonable
enough that they might encode their own dUTPase to supplement
the cellular enzyme, as do the herpesviruses. An additional
possibility is that, since it is specified as part of the gag or pol
polyproteins (which are processed into internal components of
the virion), the enzyme might well be carried by the virion, and
so perhaps could have a role in close association with genomic
RNA and reverse transcriptase. What is less clear is why these
coding sequences should be present in the genomes of only some
retroviruses. This could be rationalised by proposing that only
with some variants of viral replication dynamics or in some
cellular environments does dUTP incorporation become a
significant factor in retrovirus viability.
ACKNOWLEDGEMENTS
Thanks are due to L. Pearl for discussion, to P. Sharp for
discussion and running tree-building programs, to J. Subak-Sharpe and N. Stow for critical reading of the paper and to L.
Kattenhorn for help in preparing the text.
REFERENCES
1. McClure, M.A., Johnson, M.S. and Doolittle, R.F. (1987) Proc. Natl. Acad. Sci. USA, 84, 2693-2697.
2. Mercer, A.A., Fraser, K.M., Stockwell, P.A. and Robinson, A.J. (1989) Virology, 172, 665-668.
3. Slabaugh, M.B. and Roseman, N.A. (1989) Proc. Natl. Acad. Sci. USA, 86, 4152-4155.
4. McGeoch, D.J., Dalrymple, M.A., Davison, A.J., Dolan, A., Frame, M.C., McNab, D., Perry, L.J., Scott, J.E. and Taylor, P. (1988) J. Gen. Virol., 69, 1531-1574.
5. Davison, A.J. and Scott, J.E. (1986) J. Gen. Virol., 67, 1759-1816.
6. Baer, R., Bankier, A.T., Biggin, M.D., Deininger, P.L., Farrell, P.J., Gibson, T.J., Hatfull, G., Hudson, G.S., Satchwell, S.C., Seguin, C., Tuffnell, P.S. and Barrell, B.G. (1984) Nature, 310, 207-211.
7. Preston, V.G. and Fisher, F.B. (1984) Virology, 138, 58-68.
8. Fisher, F.B. and Preston, V.G. (1986) Virology, 148, 190-197.
9. Lundberg, L.G., Thoresson, H-O., Karlstrom, O.H. and Nyman, P.O. (1983) EMBO J., 2, 967-971.
10. Navia, M.A., Fitzgerald, P.M.D., McKeever, B.M., Leu, C.-L., Heimbach, J.C., Herber, W.K., Sigal, I.S., Darke, P.L. and Springer, J.P. (1989) Nature, 337, 615-620.
11. Lapatto, R., Blundell, T., Hemmings, A., Overington, J., Wilderspin, A., Wood, S., Merson, J.R., Whittle, P.J., Danley, D.E., Geoghegan, K.F., Hawrylik, S.J., Lee, S.E., Scheld, K.G. and Hobart, P.M. (1989) Nature, 342, 299-302.
12. Shlomai, J. and Kornberg, A. (1978) J. Biol. Chem., 253, 3305-3312.
13. Caradonna, S.J. and Adamkiewicz, D.M. (1984) J. Biol. Chem., 259, 5459-5464.
14. Evans, P.R. and Hudson, P.J. (1979) Nature, 279, 500-504.
15. Shirakihara, Y. and Evans, P.R. (1988) J. Mol. Biol., 204, 973-994.
16. Devereux, J., Haeberli, P. and Smithies, O. (1984) Nucleic Acids Res., 57, 1023-1036.
17. Pearson, W.R. and Lipman, D.J. (1988) Proc. Natl. Acad. Sci. USA, 85, 2444-2448.
18. Gribskov, M., McLachlan, A.D. and Eisenberg, D. (1987) Proc. Natl. Acad. Sci. USA, 84, 4355-4358.
19. Argos, P. (1987) J. Mol. Biol., 193, 385-396.