Journal of General Virology (1991), 72, 3057-3075. Printed in Great Britain
Comparative sequence analysis of the long repeat regions and adjoining
parts of the long unique regions in the genomes of herpes simplex viruses
types 1 and 2
Duncan J. McGeoch*, Charles Cunningham, Graham McIntyre and Aidan Dolan
MRC Virology Unit, Institute of Virology, University of Glasgow, Church Street, Glasgow G11 5JR, U.K.
We report the determination of the DNA sequence of the long repeat (RL) region and adjacent parts of the long unique (UL) region in the genome of herpes simplex virus type 2 (HSV-2) strain HG52. The DNA sequences and genetic content of the extremities of HSV-2 UL were found to be closely similar to those determined previously for HSV-1. The 5658 bp sequenced at the left end of HSV-2 UL contained coding regions for genes UL1 to UL4 plus part of UL5. The 4355 bp sequenced at the right end of UL contained coding regions for part of gene UL53, and the whole of genes UL54 to UL56. Comparison of the HSV-1 and HSV-2 UL56 sequences led to a correction in the published HSV-1 UL56 reading frame. The HSV-2 RL region, including one copy of the a sequence, was determined to be 9263 bp, with a base composition of 75.4% G+C and with many repetitive sequence elements. In HSV-2 RL, sequences were identified corresponding to HSV-1 genes encoding the immediate early IE110 (ICP0) transcriptional regulator and the ICP34.5 neurovirulence factor; the former HSV-2 gene was proposed to contain two introns, and the latter one intron. Downstream of the HSV-2 immediate early gene, the RL sequence encoding the latency-associated transcripts (LATs) was found to be dissimilar to that in HSV-1; the probable LAT promoter regions, however, showed similarities to HSV-1. Properties of the LAT sequences in both HSV-1 and HSV-2 were consistent with LATs being generated as an intron excised from a longer transcript.
Introduction
We have previously described the genomic DNA sequence of herpes simplex virus type 1 (HSV-1) strain 17 (McGeoch et al., 1985, 1986, 1988; Perry & McGeoch, 1988). The 152 kbp sequence was interpreted as containing 70 distinct genes of which two, located in the major repeat elements of the genome, were present in two copies each. Many of the proposed genes had no significant previous characterization and further studies will be necessary to authenticate them fully. Nevertheless, we considered the interpretation to be generally convincing, and in most of the genome to leave little room for additional genes or other functional sequences.
The major exception to this evaluation was the long repeat region (RL; Perry & McGeoch, 1988). This 9 kbp element is present in two copies (TRL and IRL) which flank the long unique region (UL) in opposing orientations (see Fig. 1). One well characterized gene was recognized in RL, encoding the immediate early transcriptional regulatory protein IE110 or ICP0; this gene is flanked by substantial sequences the roles of which were less well defined. Downstream of the IE110 gene is a region of some 3500 bp which has not been assigned any protein coding function but which is the major locus of transcription in neurons latently infected with HSV-1, giving rise to RNA species termed latency-associated transcripts (LATs) (Stevens et al., 1987; Rock et al., 1987; Spivack & Fraser, 1987; Wagner et al., 1988a, b). The function of these RNAs remains obscure, although it has been observed that some HSV-1 variants defective in LAT expression show impaired reactivation from latency in animal models (Leib et al., 1989; Steiner et al., 1989). On the other side of the IE110 gene is a region in which Chou & Roizman (1986) and Ackermann et al. (1986) have mapped sequences encoding a protein termed ICP34.5 in HSV-1 strain F, and identified a candidate open reading frame (ORF) for ICP34.5. In the HSV-1 strain 17 sequence, however, this ORF was not conserved and there was no satisfactory alternative (Perry & McGeoch, 1988). This conflict has recently been resolved by revision of both the strain F and strain 17 sequences (Chou & Roizman, 1990; Chou et al., 1990; A. Dolan, E. McKie, A. R. MacLean & D. J. McGeoch, unpublished data), adding a further gene to the complement recognized for strain 17. ICP34.5 appears to be an important determinant of viral neurovirulence in HSV-1 (Chou et al., 1990), and in HSV-2 a neurovirulence determinant maps to an equivalent region of the genome (Taha et al., 1989a, b, 1990).
The present paper describes the determination of the DNA sequence for the RL element of HSV-2 strain HG52, together with adjacent parts of UL. The major objective of this work was to compare the HSV-1 and HSV-2 sequences in order to gain better insight into the functional roles of RL; the HSV-2 sequence analysis also yielded detailed information on the counterparts of the IE110 gene, the ICP34.5 gene and nine genes in UL.
The nucleotide sequence data reported in this paper will appear in the EMBL, DDBJ and GenBank nucleotide sequence databases under the accession numbers D01127 and D01128.
0001-0427 © 1991 SGM
Downloaded from www.microbiologyresearch.org by
IP: 88.99.165.207
On: Wed, 10 May 2017 20:36:01
Methods
DNA sequence determination. Four plasmid-cloned fragments of HSV-2 strain HG52 DNA were used for sequence determination: BamHI f (cloned in pAT153; Whitton et al., 1983), BamHI g (cloned in pAT153, from A. J. Davison) and BamHI p and BamHI c (cloned in pUC18 for this study). HSV-2 inserts were recovered by BamHI digestion and agarose gel electrophoresis, fragmented by sonication and cloned into the SmaI site of M13mp8.
Sequences of the M13 clones were generated by chain termination methods (Bankier & Barrell, 1989). 7-Deaza-2'-deoxyguanosine 5'-triphosphate was generally substituted for dGTP (Mizusawa et al., 1986). Sequences were compiled using the program set of Staden (1982). Regions presenting problem sequences were resolved using electrophoresis in a 6% polyacrylamide gel containing 9 M-urea, with a water jacket maintained at approximately 80 to 85 °C. Some use was also made of Taq DNA polymerase (Promega) and Bst DNA polymerase (Bio-Rad) for elongation reactions at 70 °C. For more than 95% of each sequence, data were obtained for both strands.
Sequences across BamHI sites representing boundaries between adjacent plasmid-cloned fragments were obtained using the polymerase chain reaction (PCR) with Taq DNA polymerase (Saiki et al., 1988) and Vent DNA polymerase (New England Biolabs). Using genomic DNA as template with suitable oligonucleotide primers, DNA fragments across the BamHI sites were amplified, cloned into M13mp9 and sequenced.
DNA sequence interpretation. The GCG program set was used for analysis of sequences (Devereux et al., 1984). The program PTrans (Taylor, 1986) was used to prepare the listings shown in Fig. 2, 3 and 7.
Numbering of DNA sequences. The HSV-1 strain 17 sequence in Fig. 4, 6 and 7 is numbered according to McGeoch et al. (1988), with changes imposed by corrections at two loci. First, the coding region of the ICP34.5 gene in RL was corrected by deletion of residues 823 and 824 in TRL, and the corresponding residues in IRL (125547 and 125548) (A. Dolan, E. McKie, A. R. MacLean & D. J. McGeoch, unpublished data). Second, results in this paper correct the UL56 coding sequence by addition of two residues after residue 116343. The net effect of these changes is that the numbering for the region of HSV-1 RL in Fig. 6 remains unchanged, and in Fig. 7 changes at residue 125547.
Results
Nomenclature
We previously introduced a general nomenclature for HSV-1 genes in the short unique region (US) (McGeoch et al., 1985) and UL (McGeoch et al., 1988) by numbering genes in each region (US1 to US12, UL1 to UL56); this freed gene names from reference to proposed functions, Mr values of encoded proteins and expression characteristics. We believe that it is now useful to have a corresponding nomenclature for genes in RL: we refer to the HSV-1 gene encoding ICP34.5 (Chou & Roizman, 1990) as RL1, and to the immediate early gene encoding IE110 (Vmw110, ICP0) as RL2. In this paper we use these names for HSV-1 genes and their HSV-2 equivalents. If the LAT transcription unit (see below) is ever assigned a protein-coding function, it could be regarded as RL3.
Determination of the sequences of HSV-2 RL and adjoining regions
We have determined the sequences of two sections of HSV-2 strain HG52 DNA: a region containing the left end of UL and the TRL/UL junction (listed in Fig. 2), and a region containing the right end of UL, the whole of IRL and part of IRS (Fig. 3).
As shown in Fig. 1, the region of HSV-2 DNA represented by BamHI fragments f, p and g encompasses the whole internal copy of RL (IRL), together with adjacent parts of UL and IRS. Analysis by PCR of the sequences across the BamHI sites between f and p, and between p and g, showed that fragments p and g were contiguous in the genome but that a previously unsuspected 9 nucleotide sequence lay between the BamHI sites at the neighbouring termini of f and p; this latter result was obtained with seven M13 clones made in two separate PCR amplification experiments. The whole sequence determined comprised 16465 nucleotides of composition 74.1% G+C. Comparison with the sequence data of Davison & Wilkie (1981) for the HSV-2 'joint' region (the junction of IRL and IRS) showed that 2847 bp of BamHI g lay in IRS; this part is not dealt with in the present paper. The left 13618 bp of the sequence represent the right extremity of UL and the whole of IRL, and this part is listed in Fig. 3.
Fig. 1. Organization of the HSV genome around the long repeat elements. The top part of the figure shows an outline arrangement of the major elements in the genomes of HSV-1 and HSV-2. UL and US are bounded by pairs of inverted repeats (TRL and IRL; IRS and TRS). There is a terminally redundant element (the a sequence); at least one copy of this is present in inverted orientation at the IRL/IRS boundary (a'). The lower sections of the figure indicate locations of genes of HSV-2 adjacent to the ends of UL and in RL, as listed in Fig. 2 and 3. Coding regions of genes are indicated by solid arrows, with introns shown in genes RL1 and RL2 (see text). The location of the LAT is shown by a dashed arrow, with introns or possible introns not marked (see text). The gene arrangements are also valid for HSV-1 (McGeoch et al., 1988) with the exception that the intron shown in RL1 is specific to HSV-2. The locations of HSV-2 BamHI fragments f, p and g are indicated at the bottom of the figure. The 1 kbp scale marker applies to both expanded sections.
We also sequenced part of the BamHI c sequence, which runs from TRL into the left end of UL, and this is listed in Fig. 2. Not all of the TRL part of BamHI c was determined. Comparison of the sequences in Fig. 2 and 3 located the TRL/UL and UL/IRL boundaries: the left end of UL is at residue 172 in Fig. 2 and the right end is at residue 4355 in Fig. 3. The version of IRL listed in Fig. 3, including one copy of the a sequence (see Fig. 1), contains 9263 bp of composition 75.4% G+C. The sequences determined at the left and right ends of UL contained 5658 and 4355 bp, of composition 63.1% and 66.2% G+C respectively.
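The base compositions quoted here are simple G+C percentages. As a minimal sketch (the function names `gc_percent` and `combined_gc` are illustrative and not from the paper, and the short test string is arbitrary), the calculation and a length-weighted combination of the two segments listed in Fig. 3 could be written as:

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C residues in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def combined_gc(parts):
    """Length-weighted G+C percentage for (length_bp, gc_percent) pairs."""
    total = sum(length for length, _ in parts)
    return sum(length * pct for length, pct in parts) / total


# Arbitrary toy sequence: 6 of its 8 residues are G or C.
toy = gc_percent("GGCCGATC")

# Right end of UL (4355 bp, 66.2%) plus IRL (9263 bp, 75.4%), as quoted in the text.
fig3_gc = combined_gc([(4355, 66.2), (9263, 75.4)])
```

With the rounded figures from the text, the 13618 bp listed in Fig. 3 would come out at roughly 72.5% G+C. This is lower than the 74.1% quoted for the full 16465 bp, which would be consistent with the excluded IRS portion being itself rich in G+C.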
Organization of the HSV-2 genome adjacent to the ends of UL and in RL
Fig. 4 illustrates the overall relationship between these
HSV-2 DNA sequences and their counterparts in HSV-1
in four two-dimensional comparative plots, and shows
that the two parts of HSV-2 UL sequenced are closely
similar to and collinear with the corresponding parts of
HSV-1 UL, whereas the RL sequences are generally more
divergent. Alignments using the GCG program Bestfit
showed 80.2% and 75.5% identity between the HSV-1
and HSV-2 sequences for the left and right portions of
UL, respectively.
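Bestfit's alignment algorithm is not reproduced here. The sketch below only illustrates how a percentage identity can be computed once two sequences have already been aligned. The function name `percent_identity`, the gap character `-` and the choice to divide by the number of aligned columns are assumptions made for illustration; other conventions (for example, excluding gapped columns) would give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = len(aligned_a)
    if columns == 0:
        raise ValueError("empty alignment")
    # A column counts as a match only if the residues agree and are not gaps.
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != gap
    )
    return 100.0 * matches / columns


# Invented 10-column alignment: one gapped column and one mismatch.
example = percent_identity("ACGT-ACGTA", "ACGTTACCTA")
```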
The HSV-2 UL sequences contain ORFs which
correspond closely to the gene organization proposed for
HSV-1 (Perry & McGeoch, 1988; McGeoch et al., 1988).
The left part of HSV-2 UL contains equivalents of HSV-1
genes UL1 to UL4 and part of UL5; the right part of UL
contains equivalents of UL54 to UL56 and part of UL53
(see Fig. 1). Tables 1 and 2 summarize information on the
locations of genes in the HSV-2 sequences, and on
properties of the encoded proteins, respectively; data on
the two protein coding genes recognized in RL are also
included (see below).
RL contains a number of sets of short, tandemly
reiterated elements and other 'simple' sequences. Prominent tandem repeat sets are indicated in Fig. 2 and 3 as
families 1 to 7. The junctions between TRL and UL, and
between UL and IRL are defined by the occurrence of
family 1 at the extremity of RL. In the case of the IRL
clone sequenced (Fig. 3), this arrangement is a little
obscured by the fact that the repeat family occurs in a
minimal version of one complete plus one partial copy;
however, in the TRL clone there are three complete
copies and one partial copy (Fig. 2). At each UL/RL
junction, on the UL side of the junction there is a TATA
box-like sequence. The organization of the HSV-2
UL/RL junctions is thus very similar to that found
previously for HSV-1 (Perry & McGeoch, 1988), for
which it was suggested that the TATA box sequences
were perhaps functional in expressing genes UL1 and
UL56.
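The junction pattern described above, a run of tandem copies of a repeat unit ending in a partial copy, can be illustrated with a small search routine. This is a sketch under stated assumptions: `longest_tandem_run` is a hypothetical helper, the repeat unit must be supplied by the caller, and the test string is an invented toy sequence rather than HSV-2 data.

```python
def longest_tandem_run(seq: str, unit: str):
    """Find the longest run of back-to-back copies of `unit` in `seq`.

    Returns (start, complete_copies, partial_length) with a 0-based start.
    partial_length is the number of leading residues of `unit` that directly
    follow the last complete copy. Returns (-1, 0, 0) if `unit` is absent.
    """
    if not unit:
        raise ValueError("unit must be non-empty")
    best = (-1, 0, 0)
    start = seq.find(unit)
    while start != -1:
        pos = start
        copies = 0
        # Count complete copies lying end to end.
        while seq.startswith(unit, pos):
            copies += 1
            pos += len(unit)
        # Count residues of a trailing partial copy.
        partial = 0
        while (partial < len(unit) and pos + partial < len(seq)
               and seq[pos + partial] == unit[partial]):
            partial += 1
        if (copies, partial) > (best[1], best[2]):
            best = (start, copies, partial)
        start = seq.find(unit, start + 1)
    return best


# Toy junction: three complete copies of ACG, a partial copy (AC), then a TATA-like tail.
toy = "TTT" + "ACG" * 3 + "AC" + "TATATAAAG"
run = longest_tandem_run(toy, "ACG")
```

The routine reports complete and partial copies separately, matching the way the text describes the repeat family ("three complete copies and one partial copy").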
Aspects of sequenced genes adjacent to the left end of
HSV-2 UL
The amino acid sequences encoded by the nine HSV-2
UL genes wholly or partly sequenced are closely similar to
their HSV-1 counterparts (see Table 2), and we shall not
discuss them in detail. This and the following section
treat some points on the structure and function of the UL
genes which arose from evaluation of the HSV-2 data.
[Fig. 2 sequence listing not reproduced.]
Fig. 2. HSV-2 DNA sequence of the left end of UL. The rightward 5' to 3' strand is shown for the partial sequence determined for the BamHI c fragment. Proposed encoded amino acid sequences are indicated in the single-letter code; rightward and leftward translated amino acid sequences are shown above and below the DNA sequence, respectively. Gene names are at the left of the first line showing the amino acid sequence, regardless of orientation. Prominent sets of short, tandemly reiterated sequences are marked as \ ..... /. Putative TATA boxes and polyadenylation-associated sequences are underlined or overlined.
In HSV-2 there are two ATG sequences (residues 198 to 200 and 258 to 260 in Fig. 2) upstream of the UL1 ORF which are not conserved in HSV-1. These are out-of-frame with UL1 and were not assigned a coding function. HSV-1 gene UL1 may be a locus at which mutations can give rise to a syncytial plaque phenotype (Little & Schaffer, 1981; Perry & McGeoch, 1988). A possible N-terminal signal sequence for translation on membrane-bound ribosomes seen in HSV-1 UL1 is conserved in HSV-2. In gene UL3 a possible near N-terminal signal sequence is present in both HSV-1 and HSV-2.
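Whether an upstream ATG is in frame with an ORF depends only on whether the distance between the two positions is a multiple of three. A minimal sketch follows; `atg_positions` and `is_in_frame` are hypothetical helpers, coordinates are 1-based as in the figure listings, and the toy sequence and ORF start are invented for illustration rather than taken from Fig. 2.

```python
def atg_positions(seq: str):
    """1-based start positions of every ATG triplet in `seq`."""
    seq = seq.upper()
    return [i + 1 for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


def is_in_frame(position: int, orf_start: int) -> bool:
    """True if a codon starting at `position` shares the reading frame of `orf_start`."""
    return (orf_start - position) % 3 == 0


# Invented sequence with ATGs at positions 2, 7 and 12; the ORF is taken to start at 12.
toy = "CATGCCATGAAATGGCT"
orf_start = 12
frames = {pos: is_in_frame(pos, orf_start) for pos in atg_positions(toy)}
```

In this toy case the two upstream ATGs are both out of frame with the ORF, which is the situation the text describes for the two non-conserved ATGs upstream of UL1.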
Gene UL2 encodes the DNA repair enzyme uracil-DNA glycosylase in HSV-1 (Mullaney et al., 1989) and HSV-2 (Worrad & Caradonna, 1988). A cDNA clone corresponding to UL2 of HSV-2 strain 333 was sequenced by Worrad & Caradonna (1988), and it was found that most of the sequence was clearly similar to that for the HSV-1 UL2 region, but that the HSV-2 sequence was dissimilar in its first 390 nucleotides, across the proposed start of the ORF encoding uracil-DNA glycosylase. However, we have found that the similarity between the HSV-1 sequence and our HSV-2 sequence does continue across the 5' regions of the genes, and that residues 1 to 390 in the sequence of Worrad & Caradonna originate from another HSV-2 gene, US2 (McGeoch et al., 1987); their cDNA clone was thus probably formed artefactually in vitro.
Previously, we put forward two possible candidates as ATG translation initiation codons for HSV-1 UL2, the upstream of which lay within the UL1 coding region (Perry & McGeoch, 1988). In our HSV-2 sequence, the first possible ATG initiation codon of HSV-1 is conserved whereas the second possible ATG of HSV-1 is not conserved. The HSV-2 sequence possesses an additional ATG, lying between the locations of the two HSV-1 ATG sequences, in the correct reading frame and not overlapping the UL1 ORF. We consider that this locus must be regarded as the primary candidate for the translation initiation site of HSV-2 UL2, as shown in Fig. 2. The HSV-1 sequence aligned with this site is ACG (residues 10123 to 10125 as numbered by McGeoch et al., 1988). ACG is known as an alternative initiator of translation (see for instance Curran & Kolakofsky, 1988; Gupta & Patwardhan, 1988) so it is possible that this codon is a translation initiator for HSV-1 UL2.
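The reasoning above amounts to listing potential initiation codons that lie in the reading frame of a known coding region, upstream of it and not separated from it by a stop codon. The sketch below does this for ATG and, optionally, ACG. It is an illustration only: `candidate_starts` is a hypothetical helper, `anchor` is any 1-based codon start assumed to be in frame, and the toy sequence is invented rather than taken from the HSV-1 or HSV-2 data.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def candidate_starts(seq: str, anchor: int, initiators=("ATG", "ACG")):
    """List in-frame initiator codons at or upstream of `anchor`.

    Walks upstream from `anchor` one codon at a time and stops at the first
    in-frame stop codon or at the start of the sequence. Returns a list of
    (position, codon) pairs with 1-based positions, most upstream first.
    """
    seq = seq.upper()
    found = []
    pos = anchor
    while pos >= 1:
        codon = seq[pos - 1:pos + 2]
        if codon in STOP_CODONS:
            break
        if codon in initiators:
            found.append((pos, codon))
        pos -= 3
    return sorted(found)


# Invented frame: TAA GCC ACG GGT ATG AAA CCC, with the anchor at the last codon.
toy = "TAAGCCACGGGTATGAAACCC"
with_acg = candidate_starts(toy, anchor=19)
atg_only = candidate_starts(toy, anchor=19, initiators=("ATG",))
```

Allowing ACG in the initiator set adds a second, more upstream candidate in this toy frame, which is the kind of alternative the text raises for HSV-1 UL2.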
Aspects of sequenced genes adjacent to the right end of
HSV-2 UL
The HSV-2 sequence determined at the right end of UL
includes the whole of the 545 bp non-coding region
old3
12o
BamHI
480
GGGCCTACGCACCCTCGCACGTCGCATGCAAATTAAAATCGTGCACAGAGCCGATCCGGCCTCGGGTCTGCTTGCCCCTCCCC
CAC CCCAC CCTACCGCGTGC
CGGCCCAGCACAGGCAGGCTCGTCCGACTTCCGCATA
600
TTCCGCACCCCCGCCTACGCGTGTACGCGAAGGCGGACCCAGACCTGCCGTATGCTAATTAAATACATAAAACCCACCCTCGGTGTCCGATTGGTTTCTG
GGGACGGCGGGGGCGGGGGCGGTGACGCC
CGACGGGGAGGGACAAGGAGGAGTTTCGGAAAGC
720
CGGCCCCGGTCGTGCGGGTATAAGGGCAGCCACCGGCCCACTGGGCGCTGTGTGCTG
840
UL54
CCGTGTGCCGACCCCGGTTGCGCGTCGGTGCCGCTCCTCGATTCGGACCCGGC
M
1
960
P
R
E
P
H
G
CCCCCGGGAACCGCACGGG
121
1320
CACTCTCTTCCGACACGCGCCCCCTCGGAGGACACCCGCCATCCCAGCCCCGGCGACCTACAACATG
E
A
S
T
P
R
P
A
A
R
R
G
A
D
D
P
P
P
A
T
T
G
V
W
S
R
L
G
T
R
GAAGCCTCGACGCCTCGCCCGGCAGCGCGGCGGGGAGCCGACGATCCGCCACCCGCGACCACCGGCGTGTGGTCGCGCCTCGGGACCAGGCGGTCGGCTTC
R
S
A
S
A
D
T
I
D
P
A
V
R
A
V
L
R
S
I
S
E
R
A
A
V
E
R
I
S
E
S
F
G
R
S
A
L
V
M
Q
G CCGACACCATCGACCCCGCCGTTCGGGCGGTTCTGCGATCCATATCCGAGCGCGCGGCGGTCGAGCGCATCAGCGAAAGCTTTGGACGCAGTGCCCTGGTCATGCAAGACCCCTTTGGC
D
P
F
G
281
1800
L
K
A
R
G
L
C
G
L
D
D
L
C
S
R
R
R
L
S
D
I
K
D
I
A
S
F
V
L
V
I
L
A
R
L
A
C TGAAGGCCCGAGGCCTGTGCGGGCTGGACGACCTGTGCTCGC•GCGACGCCTGTCGGACATTAAGGATATTGcCTCCTTTGTGTTGGTCATCCTGGCcCGCCTCGCCAACCGcGTCGAG
N
R
V
E
441
2280
R
G
V
S
E
I
D
Y
T
T
V
G
V
G
A
G
E
T
M
H
F
Y
I
P
G
A
C
M
A
G
L
I
E
I
L
D
CGCGGCGTGTCGGAGATCGACTACACGACCGTGGGGGTTGGGGCCGGCGAGACGATGCACTTTTACATCCCGGGGGCCTGCATGGCGGGTCTCATTGAAATACTGGACACGcACCGCCAG
T
H
R
Q
481
2400
E
C
S
S
R
V
C
E
L
T
A
S
H
T
I
A
P
L
Y
V
H
G
K
Y
F
Y
C
N
S
L
F
GAGTGTTCCAGTCGCGTGTGCGAGCTGACGGCCAGTCACACTATCGCCCCCTTATATGTGCACGGCAAATACTTCTACTGCAACTCCCTATTTTAGGCAAGAATAAAcATATTGACGTCA
512
2520
ACCCAAGTGGTTCCGTGTGATGTTCTTGGCGCGCGCGGCGGGTGGGGCGGAGACTCCGGGGCGATGCCGGCGTGCGCGTGGGAGGAGGGCGATGACCCACCGGATAAATGTGGGGCCCCG
2640
2 6o
T
S
G
P
I
H
C
F
F
F
A
V
Y
K
D
S
Q
H
S
L
P
L
V
T
E
L
R
N
F
A
D
L
V
N
H
GACCAGCGGCCCCATCCACTGTTTTTTCTTTGCGGTGTACAAGGACTCGCAGCACTCCCTTCCGCTGGTTACCGAGCTCCGCAACTTCGCGGACCTGGTCAACCACCCGCCCGTCTTGCG
E
L
E
D
K
R
G
G
R
CGAACTAGAGGATAAGCGTGGGGGGCGGC
R
97
3000
L
R
C
T
G
P
F
S
C
G
T
I
K
D
V
S
G
A
S
P
A
G
E
Y
T
I
N
G
I
V
Y
T G C G G T G C A C G G G C C C A T T C A G C T G C G G A A C C A T C A A G G A C G TC TC C G G T G C A T C C C C C G C G G G G G A A T A C A C G
ATAAACGGTATCGTGTA
137
3120
H
C
H
C
R
Y
P
F
S
K
T
C
W
L
G
A
S
A
A
L
Q
H
L
R
S
I
S
S
S
G
T
A
C CACTGTCACTG TCGG TATC CGTTCTCCAAAAC C TGC TGGCTCGGGGCATC CGCGGCC CTACAAC ACC TTCGCTCTATAAGCTCAAGCGGCACGGCCGC
H
K
I
K
I
K
I
K
V
CCACA/L%ATCAAAATCAAAATCAAGGTATAACC
TAGGAACCCGGTAAATAC
CACGCGACGAACCAGCATG
TGTGTTAACGCAACTTTTATTCGTTGTATCGCGGGAGGGGGGAAGCTTAC
......
AC CGCGACCACCCCAAAAACCGCATGACGACACGTCC
V
A
V
V
G
F
V
A
H
R
C
T
G
C G C C A C AC C A C C C T G G G G C T T G G G G C G T G T C G G A G C
G
C
W
G
P
A
Q
P
T
D
S
TTGACAAGCGGGGGTCGCCACGTGCGCGAGCTTTGCACGCGGGGTTGGTCGGC
N
V
L
P
P
R
W
T
R
S
S
Q
V
R
P
Q
G
G
A
R
V
L
TGTGAGTTTGTGGGTTA
CGCCAAAGGAAGGC CAAGATGATAACGACGAC
C
R
W
L
F
A
L
I
I V
V
V
TCGACGCACAGCGGGC CGCGCGTTGGGCC
S
A
C
R
A
A
R
Q
A
177
3240
186
3360
3480
225
CG G T A C A G C T C T C G C G A A
R
Y
L
E
R
S
3600
185
CGGCC C CACGGACC CGCC CGG TGGC TCGGTCG GACATGCGGCCATGACCATGGCGTAGGTGGGGGGG
P
G
V
S
G
G
P
P
E
T
P
C
A
A
M
V
M
A
Y
T
P
P
3720
145
S
C C G A C G G G A G G T C G C C T C C C A C GC C A G G G T G G G C C C C A A T C A T A G T T T C C G G T A G A A A C A G G G G G G T C
S
P
L
D
G
G
~
G
P
H
A
G
I
M
T
E
P
L
F
L
P
T
GGGCCAAAGC TCCGGCGC CGCGC CCGTCGTTCGGCGCGGCGCC
P
G
F
S
R
R
R
A
R
R
E
A
R
R
GGTGAAACAAGCCCAACCGGCGACGTCC
P
S
V
L
G
V
P
S
T
D
P
R
A
A
E
Q
R
R
TCGCGCG GCAGAACAGCGACG
CACCC C CTTCC CTC CGAGTC CGTATGCAAC C TCATTAATAAAGAGTGAGAACCAACCAAAACAGACGCGG
UL56
CGATCCGAGGTCGC CTCTGCGTAAGTAGGGAGGC
R
D
S
T
A
E
A
Y
T
P
L
A
P
TCCACAAACAACCCCCC
T
E
V
F
L
G
G
3840
105
TGGCGCGCCG AGCGGC C CGC CAGGCGGCGCG GCGCGAG CGGCCACGC TCACACACC TCGC CGTCACCGGAAGAAGC C
A
R
R
A
A
R
W
A
A
R
R
S
R
G
R
E
C
V
E
G
D
G
S
S
A
3960
65
CTGC AGAGTACGGTGGAGGCGAGTC
CG TGGGGGTGTCGATATCAATAACGACAAACTG
A
S
Y
P
P
P
S
D
T
P
T
D
I
D
I
V
V
F
GGCGGGGCGTC~ATCACGCTATCATC
TCCGTCATCCCTGCATGCGTGGGCATGCC
P
P
A
D
I
V
S
D
D
G
D
O
R
C
A
H
A
H
G
Q
CAGCC C C CAACGC CATGGTGGGGATTCGCGGC
A
G
L
A
M
Downloaded from www.microbiologyresearch.org by
IP: 88.99.165.207
On: Wed, 10 May 2017 20:36:01
GCC CGCGCTCGCGC CGGC CACACTC TCGTATGGG
G
A
S
A
G
A
V
S
E
Y
P
TC A G A A G C C T G C A T G T C G T G
TGG TCGGTCGTAG
4080
25
4200
1
Sequence ofthe HSV-2 ~ng repeat
3063
TCCAACGTGCCTCCCCCACCCACCACACAGCCGGTCCCCACGCCGACCACTAGACCGCAGACGTCGCCCAACCGAGGTCCCCGTGCACAGACCGCGCCTTTTATAGCCCCAGGGGTTGCT
4320
End of UL ---><--- Start of IRL
AATTAACGCACGCATGCAGACGCAATTTATTTTGCTCCCCCGCGTCCTCCCCTCCCCTGCGCACACGTGATAGGTCTTGGGAACCCGAGGGGCGACGCGGGGAAAGCGCGCCCCCGCcCG
4440
.......
\ ...............
/\
....
\ ..............
Family 1
Family 2
GCCGCcGCGcGCcCCCGCCCGGCCGCCGCGCGCCCCCGCCCGGCCGCCGCGCGCCCCCGCCCGGCCGCCGCGCGCCCCcGCCCGGCCGCCGCGCGCCCCCGCCCGGCCGCCGCGCGCCCC
.....
I\
...................
/\
...................
I\
...................
I\
...................
I\
...................
I\
4560
........
CGCCCGGCCGCCCGCGT~GCGCCGG~GCCCC~CCGGCGCTTcCGGG~TCTTTCCTTCCTTCCCCGCCGCGA~CCCGACC~CGCC~CACCG~CCCGCCCGGCAGGGGGGCCCCGG~GCC
4680
........... /
RL2
GCGCAGAACACAcAGACGAACACACGGTGGCGATCTTTTCTTTACTTCGGCGGACCAGCGAGCCCCGGCCCCGGCCCGCGCCCCGCCGCCACACCCACGGCACCCCCCCCCGCCGCCCAC
4800
CCCGGGGTCCACACAGGAGCGCGCGGGCGGCAGAAACGCGGGCGCGGCGGCGGTCGGGGTGGGAGTGGTGGTGGGGGACACGAAAACACACCCACGACACTCTCCCCCCACCCCGACCGC
4920
CGCCGCGCCCCACCGGCGGGATCGCGGCGAGACGCAGCCGGGCCCCCCCCCACCACCCGCCCACCCACCTACCCCGCGCCCGCAGCCTCCGGCAGCACGCCGACCACCGCCGCCACCCCC
5040
CAAACAGCCAAGGCGCGGTGGGGGGCGTGGTGGTGAACGATGGGGGGAACACGGGGGGGAGGGGTCCGGGGCGAGGCGGGCGGGCGAAGGAAGGGGGGGTGGTGGCGGC•GCGGTGGAAA
5160
GCGGAAAAACGGAGGATGGAAGGGcAGAAGATGGGGAGTCCCGATCCTCCTCCTGCATCCCCTCGCCTTCCATTCTCCGGCCCTCCGCGAGTCCCGACGCCCCCCCCCCGcCGCCCGACG
5280
AAGGAGACCCAAGCAcCGCAGCCGGAGAGGCCGAGCGGGGAGTGGGCGGCCGGGCGGGAGGATGGCGGAGAGAGAGAGAGAGAGAGAGAGAGGGGGGGGGGGGGAGAGGGAAAGCAAcGG
5400
GAAAGAGAGGCGCGcGGAAAAGCAGCAAGAGGGGGGAcGGGGCGAGCCGGGCAGAGTGCGGAGCCCCCGGAGCCCGCGGcCGCAGCCGAGCAGCGCCGcGGGCTCCGGGGCCGGGCCGGG
5520
CCGGCAACGCCCCGCGCCGGCCGCGGCGGAGAGAACCCCTGTGTCATTGTTTACGTGGCCGCGGGCCAGCAGACGGGCCGCGGGCCAGCAGACGGGCCGCGGCGCCAGCGGCCCACGCCT
5640
CCCGCCGCATTAGGCCCCCGCGGGCATCCGGCGGCCGGCCCCACGCCCTTCCATTAAACACTCCCACGTTGGGGGGGGGCGCGCCAGCTGAGTGCTCTGCGGT~CGGGCGC~GTGCCCG
5760
GAGATCcATTAAGCCGCcGGAGAGCCCGAGCCCcGCCCGCGTGTTGCTGTGGGCATTTCTGCTGCGTCATCCCTGTCTTTATAAAACCGGGGGCGCGGCAGcAACGAACGcAGGGGCcCG
5880
CCGCCGATCGAGAGGGACTCCGGAGAAGGAAGGCTGCTCCGCGCACCGGCGCGCCCTTCTCCTCTCCCCTCCCTACCTCCCCCTCTCTTCCCCCTTTTTTCCCCCGCCTCCCGTCTTCTT
6000
CCGCGCcTCCGAGGGTCCGCCTCTTGCCTCGGGGACCCCCGGGCGGGCCGGGGCTTGGCCGCCGAGGTGCGCCCCGGCCGGAGGGGCCCCCGCACCTCGGCGGCCGCCCCCTCCGG•GCC
6120
GCGCGTTCGCGAAAGGCGCGAAAGGGGCCCCCGGAGGCTTTTTTCGATTCCCGGCCGGGGGTCCCGGGTAGCCGCCCGGCGCCGGGCGGAAGGCGTCCCCCGCCCGGCGGTCCGGCCCGG
6240
GCCCCCGGCGGAGCGCGGGGGCCCCGGGGCCCCGGGCCGCGCCGGCGGCGTTTCCGCGTTCCGTTTCTTCTCCCTCCCGGGCCGCCCCGCTCCCGGGCCCGACCCTCGCCCCTTCCCTTC
6360
TCCTCGTCTTCCCCCGTCCCGCCGCGCCCCTTCCCTCTTCCTTCTCTCTCTCTGTCTCGCTCTCCTCACATTTCCCCCCCCCCCCCCCGCCGCCGCCGCCCTTTGCCCGCGTCCCACCGA
6480
Proposed LAT splice donor site /
GAcGCCGCGCCGcGTGAGCCGTCCGCCGGGGGACCCAGGCTCCGGGGGGGGGGGGCGCCTGCGTGTGTCTCGTGTGAGAGAGCGCGCCCCTCGAACGCCGCGCGTTCTCGCAGGTAGGTT
6600
TAGGGTCGTACAGGTGAGCTTCTGCTGAGGCGGCGGGGAGAGGGGGGGGGGGCGGGCGGAAGAGAGAAGAGAGCAGGGGTTGGGGGAGAACTGTTCTTCCTCcCCcTTTCAAGAAACACG
6720
AGGCGGGGGTcCCAGAAAGGGCAGGCAGGTCAGCCGCACCGCCCGCGAGCCAACCCGTATCCTTTTTTTCTAGGTGTTTTTGTTTTTGTTTCTGTTTTTGTTTGTTTTGTTATTATTTTC
6840
GCGGATCCGGCGTGTTCGGATCCACCCCCCCTTTCTCCTTCCTCTTCCCTTCCACCCACCCCCGTTTCCCCCCCCCCCGTCGTCGTTCCCGGGGGGGCAGGCGCGGGTCGGGCCCGTACG
BamHI
BamHI
6960
CCCAcCGCCCCCACGcGCCGGTCACCCCCCCCCAACAACCCCAAAGGCGCGTGCCCGGCCACAGCCGTGGGTGTGGCGCCCGTCCCCTTCCTCTACCGcGTGGGCGCGGGCGGGGGGGTG
7080
G~GTAGTGGTGGCGGAAGGAAACGGGCCGGGGGCCGGGGCCGCTAGGGAAAGGTAGGCACGCGCGCGGTGTGTCGACTTGCATGCCCCGCAAAACGCGTCGTGTCGTGTTGTGTCGTGG
7200
TGGGCCGTGTTGTGGTGGGCCGTGTGGTGTGGTGTGGTGTTGCGAACGCGCGAGCCCCCTCGCCCCGATGGGAGTCTCCCCGCAGCCAGGGTAAGGAGGGGCGGGCGTGGCGGGCAGGTG
7320
TGCGGGCGGGGTGGGGTGAGTGCGGTTGCATGCCTCGGGTCTCCTCTTCCTGCTCCTCCTCCTTTCTCCCAGCCAGGGTGAGGAGGGGCGGGCGTGGCGGGCAGGTGTGCGGGCGGGGTG
7440
GGCGCCGGGGCGGGG~TGGGCACGGGCGTAAGTGCGGGTGCATGCCTCGGGTCTTCTCTTCTCCCTCCTCCTTCCTCCCACCCGTCCCCGGGGGCAGAGGGCGTGCATGCGT~TGATTC
7560
AACCGCCCTCGCCCCCGCCCCACTTTCCCCCCTCTCTATCAAAGTTCCCTGGCCCCTGGCTTCGCGCCGGTGGTGCGGCTGACCCCCCCCCTCCTCCCTCCCCGAGCCAGGCGCCCTCCC
7680
ACTCCTGCCCACCACCCCCAGGGTCTGGCCGGCCAGACGTGCGTGCTCTGCACGATCGGGCCCCCCTCCCTGTCAACACGGACACACTCTTTTTTTACCCGCCAGCCAGCCCGCCCACCC
7800
ACCAAGACAGGGAGCCAGAACGAGGCCGGGCCCCGGCTCTGTTCTATGATAAAGACCAACAGGCCTCGGGGGTGGGGGCGGCTTCTCGTGCCCGCCCCCCCTCCTCCTCCTCCCTTCCCC
7920
•CCATCCCCGGCCCCCCTGCGCGGGGGAGCTGCATCAAAGGCCAACAACAAAGTGTGTCAAAAGCATCACAAAACTTTATTGTAAAATTTTTATAAATATAAAGTTTTTTTTTTCCTCAA
8040
GTTTTCAACAAGGCCAGAAAGTCCATAACAAAATGCTGGTGTGTGTTGCTGTTCGGGGCCGTGTCCGTCCCCCCCCCCCACTCCCACC•CCACTTCCTGTCTCCTCCCCGTCTTTCCCCC
8160
CCCCCACCTC~CCCTGCCCCCGAGGCGCCTCGGCCGGTGGTCCGGTGGGGGGCGGCTTCCTTCGGGCAGCAAGCCGAGTGTTAGCTCCCCCTACTCCCCGTG~CCGCGGGGGCGTCGCC
E G H G A P A D G
8280
817
GGCCGGCGCGGGCGCGCCCTGCTCCCGAGACCACGGGTGGCGCGACCGGAGGCCGTGGAAGTCCAGCGCGCCCACCAGGGTGCCCTGGTCAAAGAGCATGT~CCCACCGGGGTCA~CA
A P A P A G Q E R S W P H R S R L G H F D L A G V L T G Q D F L M N G V P T M W
8400
777
GAGGCTGTTCCACTCCGACGCGGGGGGCGTCGGGTAGTCGGGGGGCCTCACGCAGTTGCGCGCGTGCTCGGGGAGCAGGGTGCGGCGGCTCCACGCGGGGGCCGCGGCCCGCAGCAGGTC
L S N W E S A P P T P Y D P P R V C N R A H E P L L T R R S W A P A A A R L L D
8520
737
CGCCACGT~CCCGTCTGGTCCACGAGGACCACGTAGGCCCCTATGTGGCCCGTCTCCATGTCCAGGACGGGCAGGCAGTCCCCCGTGACCGTCTTGT~ACGTAAGGCGCCAGGGCCAC
A V N G T Q D V L V V Y A G I H G T E M D L V P L C D G T V T K N V Y P A L A V
8640
697
GACGCTcGAGACCCCCGCGATGGGCAGGTAGCGCGTGAGGCCGGGCGCCGGGTCGCGGGCCCCGGGCTCGGGGCCGCCCTCCGCGTGGCGCGTCTTCCTGGCACACT~CTCGGCCCCCG
V S S V G A I P L Y R T L G P A P D R A G P E P G G E A H R T K R A C K R P G R
8760
657
Family 3
Proposed LAT splice acceptor site \
......... \/ ............. \/ .....
CGGCGCAGCAGcGCGGGGGCCGAGGGAGGTTTCTCGTCTCTCCCCAGCGCCGGACGCGGACGCGACGCTCCCACCAGCCCCGCCCGCAGAGGAAGAGGCGGAGGAGGAGGAGGCGGAGGA
P A A A R P G L S T E R R E G A G S A S A V S G G A G G A S S S A S S S S A S S
8880
617
........
\/
.............
\/
.............
\/
.............
\/
.............
\/
.............
\
GGAGGAGGCGGAGGAGGAGGAGGCGGAGGAGGAGGAGGcGGAGGAGGAGGAGGCGGAGGAGGAGGAGGCGGAGGAGGAGGAGGcGGCGGCGACCGCGGCC~GGACGACGGAGACGCCGA
S
S
A
S
S
S
S
A
S
S
S
S
A
S
S
S
S
A
S
S
S
S
A
S
S
S
S
A
A
A
V
A
A
Q
S
S
P
S
A
S
9000
577
CGGGGGCGCGGCGCcCGCGGACGCCGGGGcGAGCG~CCGTGGCc~CGGTCGCCCGAG~CGAG~CGGGGCCCGGCGCGGCGCCGCCCTCTTGGCCCCcACCC~CTGGGGGGCGAGGGG
P P A A G A S A P A L P G H G R D G S D S D P A R R P A A R K A G V G Q P A L P
9120
537
CGAGCGCGGGGCGGCGGAGGAAGAGGCGGAGGACGAGGCCGCGGGGCCCGAGTCCGACCCGCGCCTCTTCCGGGGGCGGGCCGCCGCCCCCTCCGCGGCGTGGGGGGCGGCACCGGGGGT
S R P A A S S S A S S S A A P G S D S G R R R R P R A A A G E A A H P A A G P T
9240
497
GTTGGTGCCGCGGGGGACCCCGGG~CTCCC~CGC~CCGGCCC~CCGACCCGCGCGCGTCGGTCGCGCCTGCCCGGCCCAGACTCTGTGCTTGGGTGTCGGTCTGAGCCTGGGTCAT
N T G R F V G P G G E A G P G G S G R A D T A G A R G L S Q A Q T D T Q A Q T M
9360
457
GCGCGACCGGGGCGCGCGGTGCGCGTCCACCGGCACGGcGGGCGGCGCGGGCCCGGCCGCGTCCGCGCTCGCAGACACCACGGGGGCGGCGGCGGCGCGGGGcGGACTCCGGACGCGCGG
R S R P A R H A D V P V A P P A P G A A D A S A S V V P A A A A R P
P
S R V R P
9480
417
GGCGACGGCCGCGCGGGGGCGCGCGGCGCGCCCCGACGACTGTGGCAGACCTCCCCCCCCGGGGCCCGAGGACACCTGTGCGGAGGAGGAGGAGACAAAGGAGAGCGGCCCGGGGCCCGC
A V A A R P R A A R G S
S Q P L G G G G P G S
S V Q A S
S S S V F S L P G P
9600
377
GGGGCGGCGCGGAGACGGCGGGGGAGAGTCGCTGATGACTATGGGGGGCTCCTGGGCCGCGCGGGGCTGTCTCGCGGGGGGCGTCC
F R R F S P P P S D S
I V I
P P E Q A A R P Q R A P P T
R
G
E
G
A
TGCCCTCCGCCGCCGCGGCGTCTTCGCCCACCCG
A A A A D E G V R
cCGCGCCTGCGCGCGCCCCCCGCCGGcCGCAGGGGGAAGAGAGGCCACTCTCGGCACGACGGCCGCGACGGCAGGGCCGCCCCCAGACCCAGATCCCACCCCCGCCCGCAACGGGGCGCC
R A Q A R G G G A A P P L
S A V R P V V A A V A P G G G
SG
S G V G A R L
P
GCCGCTGCTGCTGCTCCGCGGGGCGCCAGGGGGCGCCGGTCGGGTCGCGGCGGGCTGGGAGGTTCCGCGGGTCGCCCcCGCACCGCCGCcCCCGCGcCGGGGCGCTCTTCGGGGGGCGGG
GS
S
S
S R P A G P P A P
R T A A P
Q S T G R T A G A G G G G R R P A R R P A
P
A
9720
337
9840
297
G
9960
257
Start of exon 3
\/
End of intron 2
CGG••CGTAGTCCACTGCAGAGGGAGACAGA•AC•GGAGCCCCCGGTTAGT•CCC•ACCCCCGCCCGACCCCC•CCC•ACCCCCGCCCGACCCCCGCCCGACCCCCGCCCGACCCCCGCC
P
V
Y
D
V
\ . . . . . . . . . ]\ . . . . . . . . . /\ . . . . . . . . . ]\ . . . . . . . . . /\ . . . . . . . . . /\ . . . . . . . . . /\..
Family 4
Start of intron 2
\/
End of exon 2
CGACCCCCGCCCGACCC••GCCCGACCCcCGCCC•A•CCCCGCCC•CCCCCC•CCCGACCCCCGCCCGCCCTCACCGTC•GcCAGGTCATCGTCCTCGTC•TCC•TGCC•GGCCACGGGG
. . . . . . . /\ . . . . . . . . . /\ . . . . . . . . . /\ . . . . . . . . . /\ . . . . . . . . /\ . . . . . . . . . /\ . . . . . . .
D
A
L
D
D
D
E
D
D
T
G
GGG TGGGCGACAGGGCGCGGAC
C G T G TG T C C C C C C A G C G A C A G G G A G C G C G G G G C C G T C C G C G G G T T G C C C G TC C A G A T A A A G T C C A C G GC C G T G C C G G C C C G C A C G G C
T F S L A R V T H G G L
SL
S R P
A T R P N G T W I
F D V A T G A R V A A E
CCACGCGGGTCCGGGGGTCGTTCACTATCGGGATGGTGCTGAACGACCCGCTGGCGGTCACGCCC
V R T R
P
D N V
I
P
I
T
S
FS
S
S
A
T V G V
TCCAGGTCTTCATGC
W T K M C
ACGG GATGCAGAAGGGGTGCAGGC
P
I
C
F P H L
AGGGAAAACTCTGGC
C P
F
S
QC
P
W
P
G
BamHI
CTTCCTCCTCACCCACGGGCCCACCCCCACAGGATCCCTGCGCGTCGGCGGGCGTGGGGCTGCCCTGGCGCTCGGCCGGGGGCCGGGCCGGGGGCGTGGCCGCGTCCATCAGGCCCGCCT
E E E G V P G G G C S G Q A D A P T P
S G Q R
E A P P R A P P T A A D M L G
CGAACATCTCCG TGTCCGTGCTGCCCGCCTCGGAGGTGGAGTCGCGGTGAAGGTCGTCGTCAGAGATTCCCAC
F M E T D T S G A E S
T S D R H L D D D S
I
G
V
E
P
I
TGC TTTTTGTTCGGAAGGGGGGGAGAAAGGGGTC
CG T A A C C A A A G G T G G T C T G C G
TCCTCC CTGCC TCCCTCGCCCCCCCAGAGGGTCGGGGGGCGGCGCACGGCCCACGGGGGTCCCCCGACCGCT
10560
i17
A
E
10680
77
TGCATGTCGT
D N
10800
37
CTC GGTCTCCTCCTCCGAGTCGCTGCTGGCGAGCCAC
T E E E S D S
S A L W Q
ACCGCCCGCGAC
CACCCC CAAC CCGCAGC CGGGTG GTC CGGG GAAAAGGGGGG
M
TAAGCGGGCCGGGGGTCGGCCC
10920
26
CCCCCCTGTCCCCCGCTCTCG
GGC
CGTCAAGCGTCCCCGCCCCCGAGCCC
Start of intron 1
\/
End of exon 1
GC C T G A G A C C C G G G G G T C G C C C TC TC AC C G T G C C G G G G G T C T G C C G C G G C G G C C G C
T
G
P
T
Q
R
P
P
R
TCGGGGCCGGG
E
P
G
P
11040
11160
11280
13
GTCCGCCCGGGAGCTCGTGCCGGGCCGGGGTTCCATGAGCCGGGGTAGGGTAGACTCGAGACGGCGGCCCGCGGTCTCTCTCTTGCCGGGTTTTAGTCTCTGTCTCTCCGGGTCTCCTCC
D
A
R
S
S
T
G
P
R
P
E
M
11400
1
TCCCGCCGGGCCGCCGCTCCGTCGCTCGCAGTGCCGGGGTGCGAATGCGGCCCGACCGTCACACGGGGCTGCCTTATACCCGGCGCCTATCCACTCCCCCAAAGGGGCGGCATTTACGAT
11520
TCCCCCAATAGCCGCGCGCCC
CGGCGGGGGCGGAGGGAGGGAATCCCCCCCTCTCGGGGCGGCCCCGTCCCCGGGGACCAACCGGGTGTACTCCAAGAACCCCATTAGCATGCGCCGCCC
C CCGC CGACGCAGATGGGAGTCC
CC C C G G C G C C C C G C C G G C G C G G C C C T G A G T G G T G C
AGCCCACCCACCCGGCGGCGCGCGAG
TTAC CATAAGCGGGAATGGCGGC
RL1
TC C T T T G G A T T C C G A C C C C T C G T C T C
10440
157
CTCCCGCTTCCG
G A E A
Start of exon 2
\/
End of intron 1
TGAGCATCCCCCAGGCGTGCGGGGCGGCGGGCTGCTTGACAAAGCAACGGGGGGGATTTAGAGGGCGCGGGGCGTGAGGCGGGACCCCCGCGCCGTGTCCCCCGTGTCCCTCCCTCACCC
L
M
G
W
A
H
P
A
A
P
Q
CGGCCCCCCGCCCGC
10200
237
CGC CTCGGCC T 10320
A E
197
ACTATCAGGTACGCCACCGGGGTGTTGCACAGGGGACACGTGTTGCGCAACGGAA
I
L Y A V F
T N C L
P
C
T N R L
A G C G C A G G G G C G G G GC G A T C T C G T C C G T G C A C A C G G C A C A C A C G T C G C C C C C C C
R L
P
P A
I
E
D
T C V A C V D
G G
P
10080
252
CCGCCCCCGGGGAAAAATTCATTAGCATAC
G C T C T G C G T G T TC T G C C A A G A A A G T A A T C A G C A T A A C C
C CGTTAAAAGC
TGCTAAT TAC C GCGAGCGGGAACGC
11640
TAG GAAGC CCAG GGGAC CAATAGGGGC CGATC
CGGAACCC CGAGGGAGTAATTACGCGGGGAGCGAGGGGCCGTC
CGGC CCATTAAAAGTTGCTAATTACCATGCGCGGGGATGGCGGC
CGAACGTTTTTAA
CGGGACCGCCTATTAAA
11760
11880
12000
AGTTTCTAATTACCATAcCGGGAAGCCGGCGcGGGGCGGTcGCcGGGGCGGAGTCCGGGCccGcGCGGCGGCGCGCGGTTGGCCGGCGcCGCCCCCTGGGGcGGGCGGAGcGG•GGGGcG
12120
GCGCCGGGCCcTCGcGGATATATACGCGGGGCTCCCATcGTcTCTTCGGAGAGCGGCCTCGCGCAGAcCTTCGGAGcTCCGGGGCTCCGCCGGCCGAGGcCGCCCTCGCCGGTTCAAcCC
12240
TAGACCGCCcGAcGGCCcGGGCCCGCGGCGGCGGAGGAcCCGcGCGCCGCCGCCGCCGCCTCCTCCTCCTCCGCGGGTcCGCCGTCTTCGTGGGCCCGGGCTCGGGcTCGGGccCGAGCT
V A R
R
G
P
G A A A
S
S
G
R A A A A A E
E
E
E
A P
G
G
D
E
H
A R
A R A R
CGGGCCTCGGGCTCCAGGCACGGTCCGATGACCGCCTcGGCCGCcGcCACGCGGCGCCGGAAccGGTCGCGGTCGGCcCGCTCGCGCGCCCAGGACCCCCGTCGGGCCAGGCGCGCGGCC
R
A E P E L C
P G
I V A E A A A V R
R
R
F R D R
D A R
E R A W S G R R A L R
GTC ~CC CAGGCCACCAGATGGCGC
T
E
W
A
V
L
H
R
CGTCAGGGGGTCGGAGGG
\ .................
GGGCCGCCGC
A R A
A
12360
223
12480
183
A
Start of exon 2
\/
End of intron
AC C T G C A C G C G C G G C G A G A A G C A C A C C T G C G G G C G G G G A G A C A C G G G G G
TCGGAGGGGCGTCAGGGGG
TCGGAGGGGCG TCAGGGGGTCGGAGGGG
12600
V
Q
V
R
P
S
F
C
V
. . . . . . . . . . . . . /\ . . . . . . . . . . . . . . . . .
/\ . . . . . . . . . . . . . . . . .
/166
Family 5
Start of intron
\/
End of exon 1
GCGTCAGGGG GTCGGAGGGGCGTCAGGGGGTCG
G A G G G G C G T C A G G G G G T C G G A G G G G A G G C G T A C C T T C C C G C G C G G C G C G TC C G C G G G C G G G G A C G C G G G
/\ . . . . . . . . . . . . . . . . . /\ . . . . . . . . . . . . . . . . .
/\ . . . . . . . . . . . . . . . . . /
K
G
R
P
A
D
A
P
P
S
A
P
CGGCGCAGGC TCAGGCGCGC
CAGG TAC TCCGTCGTGGTGCGCAGC
CGTAGCGCCAG
GTGGGGCGGAAGGGG
GCGCTGCGGCCCGCGCTC
CTTGCGCGGCGGCGGCGGGGG
P R R R R L S L R A L Y E T T T R L R L A L H P F L P R Q P G R E K R P P P P P
GCAGGCGGCGGCAGGCGCGGCGTGCGGGGC
C
A
A
A
P
A
A
H
P
A
C T C C G G C G C C T T C C C C C C G C C C T C G C T C G G G G G G C T G T T C G C C C A C TC T G C G T C G T C G T T G C C G G C G T A G T C
CGCGTCGTCGCTG
E
P
A
K
G
G
G
E
S
P
P
S
N
A
W
E
A
D
D
N
G
A
Y
D
A
D
D
S
CGC CTGGGGCAC CAGCAGC C AGCGC CGCAGGAGCGAG
A Q P V L L W R R L L S
G A C G C G GC C G G C G C G C T C T C G A C
S A A P A S E V A
C GCGGT TCC CGAGTCGTACGCAGGGAC
T G S D Y A P V M Q S
TCGTC
D
D
CATTTGGGAGTCTGCGGTTGGGAGCGCGCCGGG
D A T P L A G P
12720
154
12840
114
12960
74
13080
34
~ii~ ;
GCGCGGCACGGCTGGAGCGCCGGGGCGCGGCACGGCTGGAGCGCCGGGGCGCGGCCGGCGCCGGGGACCCCGGCGGCGGGGACCcCGGCGGCGGGACATGGCGGGCGGCTGGGCTCGGCG
R
P
V
A
P
A
G
P
R
P
V
A
P
A
G
P
R
P
R
R
R
P
G
R
R
R
P
G
R
R
R
S
M
.....
I\ ......................
I\ ......................
I ........
Family
TAGGC CCGGAGCCGGAGCGCG
TCGGGGCGGGAGAGTTCACTCGGCACGCATGC
I\ .............
/\ .............
13200
1
I
7
A C G T G T A A C C G C C A G T C C G T G C T T G C C T A G C G A A C T C A C C C G TC C C G G C T G G C G T G C G C A G C C C G G G
13320
<--- Start of "a" sequence
CCGTGTTGCGGGCCCTCTTAAGGGGCGGCGGCAGGACGGGGACTCCCGCCCCGCCTCTTTTCC~CCGGGGAGTCAACCCCCGGGGGGGGTGTTTTTTGGGGGGGGGCGCGAAGGCGGGCG
13440
GCGGCGGCGGGCGGGCGGCAGGGCAGCCCCGCGCGcCCCCTTCCCCGTCCCTCCCCCGGAGCCGGCCGCTCCCCCGCGGGCGCCGCCCCTCCCCCCGCGCGCCGCGGGGCTGCCTTCCCG
13560
End of "a" sequence --->
~GG~c~ccccGcGcGGcTTTTTTccc~cGcc~cGc~cA~GAcG~G~AcTAGcAG~TGTGccGcA~AccAccAcAcAcTcccAAGcTc~ccG~c~AA~A~AGT
13680
Fig. 3. HSV-2 DNA sequence of the right end of UL and the whole of IRL. The sequence is shown for the rightward 5' to 3' strand, from
the BamHI site at the left end of BamHI f to the right end of the internal copy of the a sequence. Conventions are as for Fig. 2. Putative
splice donor and acceptor sequences for LAT, RL2 and RL1 are labelled.
Table 1. Location of coding regions and transcripts of HSV-2 genes

                     Translation†               Transcript‡   Transcript§
Gene*                Start     Stop             start         AATAAA
UL1                  301       972              ~200?         1893 or 2674
UL2                  1085      1849                           1893 or 2674
UL3                  1907      2605                           2674
UL4                  3411      2809     C                     2720
UL5‖                 (5829)    3481     C                     2720
UL53‖                (2)       406                            458?
UL54                 958       2493             830           2502
UL55                 2711      3268             ~2640?        3310
UL56                 4151      3447     C       ~4300?        3418
RL2   Exon 3         9974      8254     C       ~11450        8001
      Exon 2         10834     10156    C
      Exon 1         11316     11242    C
RL1   Exon 2         12530     12243    C       ~13320        11881
      Exon 1         13179     12685    C

* Sequence numbers for UL1 to UL5 refer to Fig. 2, and for UL53 to
UL56, RL2 and RL1 refer to Fig. 3.
† The locations of proposed protein coding regions are given, from
the first residue in the translation initiation codon to the last residue
preceding the stop codon at the end of the exon. Leftward oriented
genes are marked C.
‡ The 5' terminus of UL54 mRNA is from Whitton et al. (1983).
Other figures are tentative, based on features of the DNA sequence or
on HSV-1 data.
§ The location of the polyadenylation-associated sequence
(AATAAA or ATTAAA) proposed for each transcript is indicated by
the position of the 5' residue in the sequence; the actual 3' terminus of
the transcript would then be 20 to 30 nucleotides downstream.
‖ The 5'-terminal regions of the UL5 and UL53 ORFs lie outside the
determined sequences.

Table 2. Properties of HSV-2 encoded proteins

          No. of    Protein   Identity to HSV-1
Gene      codons    Mr*       protein (%)†        Protein: function or properties
UL1       224       25191     65.2                Probable syn-associated; hydrophobic N terminus
UL2       255       28478     85.1                Uracil-DNA glycosylase
UL3       233       25647     74.7                Unknown; hydrophobic N terminus
UL4       201       21805     75.9                Unknown
UL5‡      (783)               (90.4)              Component of DNA helicase-primase complex
UL53‡     (136)               (88.9)              Syn-associated membrane glycoprotein (gK)
UL54      512       54955     79.3                Transcriptional regulator (Vmw63, ICP27)
UL55      186       20440     86.4                Unknown
UL56      235       24713     62.8                Unknown; hydrophobic C terminus
RL2       825       81981     61.5                Transcriptional regulator (Vmw118, ICP0)
RL1       261       27906     62.7                Neurovirulence factor (ICP34.5)

* Mr for unprocessed polypeptide chain.
† Represents percentage of identical aligned residues after alignment
with the corresponding HSV-1 sequence using the Gap program.
‡ Incomplete sequences for UL5 and UL53.

between genes UL53 and UL54, which contains regulatory sites for transcription of the immediate early gene
UL54 (Whitton et al., 1983). Part of this section, together
with the non-coding sequence at the 3' end of gene UL54,
has been sequenced previously by Whitton et al. (1983)
for strain HG52. There are some small differences in the
two versions.
Comparison of the HSV-2 UL56 sequence with its
HSV-1 counterpart revealed an apparent frameshift
adjacent to the 3' end of the HSV-1 ORF. Additional
analyses of both sequences confirmed that the HSV-2
sequence was correct, but that in determining the HSV-1
sequence a compression had been incorrectly resolved, so
that the published version of the HSV-1 UL56 sequence
contains an error (Perry & McGeoch, 1988; McGeoch et
al., 1988). To correct this, 'CG' should be inserted after
residue 116343 of the complete HSV-1 sequence as
numbered by McGeoch et al. (1988): that is, residues
116337 to 116351, AGCCGCGCCGCGCGT, become
AGCCGCGCGCCGCGCGT.
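The correction just described can be checked mechanically. The following is a minimal sketch, not part of the original analysis: the helper name `insert_residues` is hypothetical, and only the 15-residue window and the coordinates quoted in the text are used.

```python
# Coordinates and residues are those quoted in the text; the helper is illustrative.
WINDOW_START = 116337            # genome coordinate of the first quoted residue
ORIGINAL = "AGCCGCGCCGCGCGT"     # residues 116337 to 116351 as published
INSERT_AFTER = 116343            # 'CG' is inserted immediately after this residue


def insert_residues(window, window_start, after, inserted):
    """Insert `inserted` after genome coordinate `after` within `window`."""
    kept = after - window_start + 1   # residues of the window preceding the insertion
    if not 0 <= kept <= len(window):
        raise ValueError("insertion point lies outside the window")
    return window[:kept] + inserted + window[kept:]


corrected = insert_residues(ORIGINAL, WINDOW_START, INSERT_AFTER, "CG")
print(corrected)
```

Because two residues are added, every codon downstream of the insertion is read in a different frame, which is why the predicted C terminus changes.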
The effect of this change on the amino acid sequence
predicted from the HSV-1 UL56 gene is to remove two
amino acids from the C terminus and add 40, to yield a
total length of 234 amino acids. Both the revised HSV-1
sequence and the HSV-2 sequence possess an uncharged
and highly hydrophobic section of 18 amino acids
immediately adjacent to the C-terminal Arg residue (see
[Fig. 4: four dot-plot panels, (a) to (d); horizontal axes HSV-1 (kb), with the HSV-1 ORFs UL1 to UL5, UL53 to UL56, RL2 and RL1 marked beneath the panels.]
Fig. 4. Comparison of HSV-1 and HSV-2 sequences at the extremities of UL and in IRL. The four panels were produced using GCG
programs Compare and Dotplot. Parameters for Compare were: window, 35; stringency, 22. Locations of HSV-1 ORFs are shown
under each panel. (a) HSV-2 TRL/UL sequence (Fig. 2) compared with HSV-1 residues 9001 to 14386. (b) HSV-2 UL/IRL sequence (Fig.
3) residues 1 to 5000 compared with HSV-1 residues 112782 to 117781. (c) HSV-2 UL/IRL residues 4501 to 9500 compared with HSV-1
residues 117282 to 122281 (i.e. overlaps b by 500 residues in both dimensions). (d) HSV-2 UL/IRL residues 9001 to 14000 compared with
HSV-1 residues 121782 to 126781 (i.e. overlaps c by 500 residues).
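The window and stringency parameters quoted in the caption can be illustrated with a small sketch. It assumes that a dot is recorded whenever two windows share at least `stringency` identical positions; the GCG Compare program may score and position windows differently, and the function name `dotplot_points` is hypothetical.

```python
def dotplot_points(seq_a, seq_b, window=35, stringency=22):
    """Return (i, j) pairs where the windows starting at seq_a[i] and seq_b[j]
    share at least `stringency` identical positions out of `window`."""
    points = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Count identical residues at matching offsets within the two windows.
            matches = sum(1 for k in range(window) if seq_a[i + k] == seq_b[j + k])
            if matches >= stringency:
                points.append((i, j))
    return points


# Toy example with a short window; real comparisons would use window=35, stringency=22.
demo = dotplot_points("ACGTTGCA", "ACGTTGCA", window=4, stringency=4)
print(demo)
```

This brute-force form is quadratic in sequence length and is meant only to show what the two parameters control: a lower stringency relative to the window admits more background dots, as seen for G+C-rich, repetitive DNA.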
Table 3. Proposed splice donor and acceptor sequences in HSV-2 RL*

Donor sequences
Consensus                       b a g | GT r a g t
LAT (6591-6599)                 C A G | GT A G G T
RL2 intron 2 (10158-10150 C)    A C G | GT G A G G
RL2 intron 1 (11244-11236 C)    A C G | GT G A G A
RL1 (12687-12679 C)             A A G | GT A C G C

Acceptor sequences
Consensus                       y y y y y y y y y y y n y A G | g
LAT (8793-8808)                 C T C G T C T C T C C C C A G | C
RL2 intron 2 (9974-9959 C)      T G T C T C C C T C T G C A G | T
RL2 intron 1 (10834-10819 C)    C G T T G C T T T G T C A A G | C
RL1 (12530-12515 C)             T C T C C C C G C C C G C A G | G

* Splice donor and acceptor consensus sequences are from Mount (1982). Partially conserved sites
are in lower case; 'y' and 'r' represent pyrimidine and purine nucleotides respectively; 'b' represents
'c' or 'a'. Splice positions are marked |. Numbers for HSV-2 sequences are as in Fig. 3, with leftward
5' to 3' sequences indicated by C.
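As an illustration of how the proposed donor sites compare with the consensus, the sketch below counts consensus-compatible positions for each 9-residue donor sequence in Table 3. The scoring rule (one point per compatible position) and the names `CODES` and `consensus_matches` are assumptions made for this example, not a method used in the paper.

```python
# Ambiguity codes as defined in the Table 3 footnote ('b' here means c or a).
CODES = {"y": "CT", "r": "AG", "b": "CA"}

DONOR_CONSENSUS = "bagGTragt"

# Donor 9-mers as read from Table 3 (three exon residues, then six intron residues).
DONORS = {
    "LAT": "CAGGTAGGT",
    "RL2 intron 2": "ACGGTGAGG",
    "RL2 intron 1": "ACGGTGAGA",
    "RL1": "AAGGTACGC",
}


def consensus_matches(site, consensus):
    """Count positions of `site` that are compatible with `consensus`."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    hits = 0
    for base, code in zip(site.upper(), consensus):
        allowed = CODES.get(code, code.upper())  # plain letters must match exactly
        if base in allowed:
            hits += 1
    return hits


scores = {name: consensus_matches(seq, DONOR_CONSENSUS) for name, seq in DONORS.items()}
for name, score in scores.items():
    print(f"{name}: {score}/{len(DONOR_CONSENSUS)}")
```

All four sites carry the invariant GT; the differences lie in the partially conserved lower-case positions.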
Fig. 3); this structure could well constitute a transmembrane anchor domain, although there is no indication of a corresponding N-terminal hydrophobic signal
sequence in either instance. In other respects the HSV-2
data support the interpretation of the HSV-1 UL56 gene,
which was previously considered somewhat tentative.
Two possible ATG codons upstream of the assigned
HSV-1 UL56 ORF and out-of-frame with it are not
conserved in HSV-2. HSV-2 possesses an in-frame ATG
10 codons upstream of the start site shown in Fig. 3,
which could form an alternative translational start.
Organization of the HSV-2 immediate early gene in RL
In analysing the sequence of HSV-2 RL there are three
topics to be addressed: function of the region between
UL and the RL2 gene, part of which is transcribed into
LAT species; organization of the immediate early gene
(RL2) encoding IE118 (counterpart of HSV-1 IE110);
and function of the region between the RL2 gene and the
a sequence. Since the RL2 gene is the best characterized
entity in HSV-1 RL, its HSV-2 counterpart is dealt with
first.
The HSV-1 immediate early gene, encoding the
transcriptional modulator IE110 or ICP0, is considered
to possess three exons, all containing coding sequences,
and to have an extensive upstream transcriptional
regulatory region (Perry et al., 1986; Mackem &
Roizman, 1982). DNA sequences in HSV-2 RL related to
the HSV-1 IE110 coding sequences were readily located
(see Fig. 4c and d). Appropriately positioned counterparts for transcriptional regulatory and polyadenylation-associated elements also exist. The HSV-2 regions
corresponding to the two HSV-1 introns were not
conserved in size or sequence, but were bounded by
appropriately located potential splice donor and splice
acceptor sequences, as shown in Table 3. It is therefore
reasonable to propose that these regions (residues 11241
to 10835 and 10155 to 9975 in Fig. 3) are introns in the
HSV-2 RL2 gene; the lack of sequence conservation of
the proposed introns with their HSV-1 counterparts is in
accord with properties of introns in general. The
sequence data are thus thoroughly consistent with the
HSV-2 RL2 gene having an organization closely similar
to that of the HSV-1 gene, as shown in Fig. 3.
Nonetheless, authentication by direct transcript
mapping should be undertaken. Two sets of tandemly
reiterated sequences occur within the HSV-2 gene.
Family 4 almost fills the downstream intron, whereas
family 3 occurs in exon 3 and is thought to encode protein
(see below).
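The proposed exon structure can be cross-checked arithmetically against the coordinates in Table 1 and the codon count in Table 2. The sketch below is illustrative only; the variable names are hypothetical, and the coordinates are the translated exon segments listed for the leftward-oriented RL2 gene.

```python
# Translated segments of RL2 (start, stop) in transcription order, from Table 1.
# The gene is leftward oriented, so start > stop.
RL2_EXONS = [(11316, 11242), (10834, 10156), (9974, 8254)]


def span_length(start, stop):
    """Number of residues in an inclusive coordinate range, in either orientation."""
    return abs(start - stop) + 1


coding_residues = sum(span_length(a, b) for a, b in RL2_EXONS)
codons = coding_residues // 3

# Each proposed intron lies between the end of one exon and the start of the next.
introns = [
    (RL2_EXONS[i][1] - 1, RL2_EXONS[i + 1][0] + 1)
    for i in range(len(RL2_EXONS) - 1)
]
intron_lengths = [span_length(a, b) for a, b in introns]

print(coding_residues, codons, introns, intron_lengths)
```

The coding total is a multiple of three and gives the 825 codons listed for RL2, and the derived intron boundaries coincide with the residue ranges quoted in the text.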
The predicted amino acid sequences for HSV-1 IE110
and HSV-2 IE118 are aligned in Fig. 5, and present some
interesting features. In the sequence encoded by the
second exon, the region of conserved Cys and His
residues originally noted for the homologous proteins of
HSV-1 and varicella-zoster virus (Perry et al., 1986) is
present also in HSV-2 (residues 126 to 166). Characteristically similar sequences have also been noted recently
in certain non-herpesviral proteins (Freemont et al.,
1991).
The central part of the HSV-2 IE118 sequence
(representing approximately the first 394 amino acids
encoded by exon 3; residues 252 to 645) is relatively
poorly conserved in comparison to the regions encoded
by exon 2 and the distal part of exon 3. The sequences in
this central region are notably hydrophilic. At residues
589 to 627 in the HSV-2 protein there is a set of seven
copies plus one partial copy of the sequence Ala-(Ser)4;
this is encoded by repeat family 3 (see Fig. 3). The
<
Exon
1
>
<
Start
of
exon
2
MEPRPGASTRRPEG..RPQRE
...... PAPDVWVFPCDRDLPDSSDSEAETEVGGRGDADHHDDDSASEADSTDTELFETGLLGPQGVDG..GAVSGGSP
MEPRpGTSSRAD•GPERPPRQTPGTQPAA•HAWGMLNDMQWLASSDSEEETEVGISDDDLHR••DSTSEAG•TDTEMFEAGLMDAATPPARPPAE•QGSP
MEpRPG-S-R---G--RP-~
........
~P--W .... D ..... SSDSR-ETEVG---D--H---DS-SEA-STDTR-FE-GL
...........
<
Exon
1
><
Start
of
exon
90
98
A---GSP
2
PREED•GSCGGAP•RED..GGSDEGDVCAvCTDEIAPHLRCDTFPcMHRFCIPCMKTWMQLRNTCPLCNAKLVYLIVGVT•SGSFSTIPIvNDPQTRMEA
188
TPADAQGSCGGGPVGEEEAEAGGGGDVCAVCTDEIAPPLRCQSFPcLHPFCIPCMKTWIPLRNTCPLCNTPVAYLIVGVTASGSFSTIPIVNDPRTRVEA
...... GSCGG-P--E
........
GDVCAVCTDEIAP-LRC--FPC-H-FCIPCMKTW--LRNTCPLCN
.... YLIVGVT-SGSFSTIPIVNDP-TR-EA
198
End
of exon
2
><
Start
of exon
3
EEAVR•GTAVDFIWTGNQRFAPRYLTLGGHTVRALSPTHPEPTTDEDDDDLDDADYVPPAPRRTPRAPPRRGAAAPPVTGGASHAAPQPAAARTAPPSAP
EAAVRAGTAVDFIWTGNPRTAPRSLSLGGHTVRALSPTPPWPGTDDEDDDLADVDYVPPAPRR
.... APRRGGGG
.... AGATRGTSQPAATRPAPPGAP
E-AVRAGTAVDFIWTGN-R-APR-L-LGGHTVRALSPT-P-P-TD--DDDL-D-DYVPPAPRR
..... PRRG ........
GA ..... QPAA-R-APP-AP
End
of
exon
2
><
Start
of
exon
288
290
3
IGPHGSSNTNTTTNSSGGGGSRQSRAAAPRGASGPSGGVGVG•GV..VEAEAGRPRGRTGPLVNRPAPLANNRDPIVISDSPPASPHRPPAAPMPGSAPR
386
RSSSSGGAPLRAGvGSGSGGGPAvAAVV•RvASLPPAAGGGRAQARRVGEDAAAAEGRTPPA...RQPRAAQEPPIvISDSPPPSPRRPAGPGPLSFVSS
...............
SG-GG ..... A--PR-AS-P
.... G .......
V---A .... GRT-P ...... P-A .... PIVISDSPP-SP-RP
PGPPASAAASG
.........
PARPRAAVAPCVRAPP
.....
387
...........
PGPGPRAPAPGAEPAARPADARRVPQSHSSLAQAANQEQSLCRARATVARGSGGPGVEG
472
SSAQVSSGPGGGGLPQSSGRAARPRAAVAPRVRS•pRAAAAPwSASADAAGPAPPAVPVDAHRAPRSRMTQAQTDTQAQSLGRAGATDARGSGGPGAEG
..... S .... G ..........
ARPRAAVAP-VR-PP
....
P ..... A-A-G--P-A-P-DA-R-P-S
.... AQ---Q-QSL-RA-AT-ARGSGGPG-EG
GSGPSRGAAPSGAAPLPSAASVEQEAAVRPRKRRGSGQE
GPGVPRGTNTPGAAPHAA
G-G--RG
.... GAAP
.......
......
NPSPQSTRPPLAP..AGAKRAATHPPSDSGPGGRGQG
..... EGAAARPRKRRGSDSGPAASSSASSSAAPRSPLAPQGVGAKRAAPRRAPDSDSGDRGHGPLAPASAGAAPPSASPSS
..........
AA-RPRKRRGS
..........
S .... R-PLAP---GAKRAA
..... DS--G-RG-G
SAASASSSSASSSSAPTPAGAASSA..AGAASSSASASSGGAVGAL
.......
GPGTPLTS...
.......
G---P--E---
GGLTRYLPISGVSSVvALSPYVNKTITGDCLPILDMETGNIGAYVvLvDQTGNMATRLRAAVPGWSRRTLLPETAGNHVMPPEYPTAPASEWNSLWMTPv
PGLTRYLPIAGVSSvVALAPYVNKTVTGDCLPVLDMETGHIGAYVvLVDQTGNVADLLRAAAPAWSRRTLLPEHARNcVRPPDYPTPPASEWNSLWMTPv
554
582
..... GGRQEETSLGPRAASGPRGPRKCARKTRHAETSGAV
QAAVAAASSSSASSSSASSSSASSSSASSSSASSSSA•SSSASSSAGGAGGSVASASGAGERRETSLGPRAA•APRGPRKCARKTRHAEGGPEPGARDPA
.......
S--SASSSSASSSSA
.......
SSA .... A-SSSAS-S-GGA-G
........
G---ETSLGPRAA--PRGPRKCARKTRHAE
< ....
Reiterated
sequence
in HSV-2
--->
487
.... PA
636
681
.........
PA
736
781
-GLTRYL•I-GVSSwAL-PYVNKT-TGDCLP-LDMETG-IGAYVVLVDQTGN-A--LRAA-P-WSRRTLLPE-A-N-V-PP-Y•T-PASEWNSLWMTPV
GNMLFDQGTLVGALDFRSLRSRHPWSGEQGASTRDEGKQ
775
GNMLFDQGTLVGALDFHGLRSRHPWSREQGAPAPAGDAPAGHGE
GNMLFDQGTLVGALDF--LRSRHPWS-EQGA
. . . . . . . . . . . . .
825
Fig. 5. Alignment of the HSV-1 IE110 and HSV-2 IE118 amino acid sequences. The amino acid sequences were aligned and displayed
using GCG programs Bestfit and Pretty. The HSV-1 IE110 sequence is shown above the HSV-2 IE118 sequence. The location of the
HSV-2 repeated amino acid sequence Ala-(Ser)4 is indicated.
aligned HSV-1 sequence is also serine-rich but is not
perfectly reiterated. A basic region (residues 511 to 516)
is conserved, which is proposed to form a nuclear
localization signal (Everett, 1988). To summarize these
comparisons, the two proteins have regions of about 200
residues adjacent to both their N and C termini which
are well conserved in length and in identity of aligned
residues; these are separated by a poorly conserved,
hydrophilic region. These features could suggest a
structure in which the N- and C-terminal regions form
separate functional domains and are linked by an
extended hydrophilic structure of mostly less critical
functional importance. This view is broadly compatible
with analyses of HSV-1 IE110 function (reviewed by
Everett et al., 1991).
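The reiterated Ala-(Ser)4 element mentioned above lends itself to a simple tandem-repeat scan. The sketch below is a hedged illustration: the function `longest_tandem_run` is hypothetical, and the test string is synthetic, not the actual IE118 sequence. Only the residue range 589 to 627 and the description "seven copies plus one partial copy" are taken from the text.

```python
import re


def longest_tandem_run(protein, unit):
    """Return (start_index, copies) for the longest run of back-to-back copies of `unit`.

    Returns (-1, 0) when `unit` does not occur at all.
    """
    best = (-1, 0)
    for match in re.finditer(f"(?:{re.escape(unit)})+", protein):
        copies = len(match.group(0)) // len(unit)
        if copies > best[1]:
            best = (match.start(), copies)
    return best


UNIT = "ASSSS"  # Ala-(Ser)4 in one-letter code

# Synthetic example: arbitrary flanking residues around seven full copies and a partial one.
toy = "PRA" + UNIT * 7 + "ASS" + "GGA"
run = longest_tandem_run(toy, UNIT)
print(run)

# Consistency check on the quoted range: residues 589 to 627 span 39 positions,
# i.e. seven full five-residue copies plus four further residues.
repeat_span = 627 - 589 + 1
```

The same scan applied to a less regular serine-rich stretch would report a shorter run, which is one way to express the difference between a perfectly reiterated sequence and one that is merely serine-rich.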
Examination of the sequence of RL lying between the
immediate early gene and UL
Between the downstream ends of the HSV-1 and HSV-2
RL2 genes and the UL/RL boundaries there are regions of
3.7 kbp and 3.9 kbp, respectively. In both HSV-1 and
HSV-2, the 5' portions of LAT species are transcribed
from within this location and overlap the RL2 genes
(Wagner et al., 1988a; Mitchell et al., 1990a; see Fig. 1).
A major aim of our HSV-2 sequence analysis was to use
comparisons of DNA sequence in this region to gain
insight into LAT organization and function.
Fig. 4(c) shows in an overview manner that the HSV-1
and HSV-2 sequences in this region are significantly
divergent. Some similar sequences do exist towards the
UL end of the region, and effort was put into constructing
an alignment of these, which proved a non-trivial task.
The DNAs are G+C-rich and also contain many
repetitive and simple elements; these factors give rise to
the high background seen for the region in Fig. 4, and
similarly they obscure attempts to discern genuinely
homologous parts of the two sequences. Another factor is
that convincingly homologous regions may be separated
by other regions the sizes of which differ widely between
the two DNAs. An alignment was produced for part of
the region, as shown in Fig. 6. This was generated by first
using plots such as those in Fig. 4 and making local
alignments with the GCG Bestfit program to identify
HSV-1 and HSV-2 sequences judged to be unambiguous,
genuine homologues; these loci are indicated in Fig. 6.
Alignments of flanking regions were then made with the
Bestfit program. No convincing alignment was made
between the UL/RL boundary, at residue 4356, and the
start of the sequences in Fig. 6, at residue 4984. Between
the main alignment in Fig. 6 and the downstream ends of
the RL2 genes only one additional feature was judged
worth registering: as shown separately at the foot of Fig.
6, this encompasses the mapped 5' terminus of HSV-1
LAT (Wagner et al., 1988a).
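The local alignments described above rest on the Smith-Waterman local homology approach, which is the basis of the Bestfit program. The following is a minimal sketch of the scoring recursion only, under simplifying assumptions: a single linear gap penalty and arbitrary match and mismatch values, whereas Bestfit uses separate gap and gap-length weights. The function name and parameter values are hypothetical.

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between a and b (linear gap penalty)."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never fall below zero.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


# Toy sequences sharing the core ACGTACG within unrelated flanks.
score = local_alignment_score("TTTACGTACGTTT", "GGGACGTACGGGG")
print(score)
```

Because the score is reset to zero rather than allowed to go negative, a conserved locus is found even when it is flanked by unalignable sequence of very different length in the two genomes, which is the situation described for this region.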
Thus, although similarities exist upstream of and at
CCCACCCACCCCACGCCCCCACTGAGCCCGGTCGATCGACGAGCACCCCCGCCCACGCCCCCGCCCC
..... TGCCCCGGCGACCCCCGGCCCGCACGAT
CCCACCTACCCCGCGCCCGCA..GCCTCCGGCAGCACGCCGACCACCGCCGCCACCCCCCAAACAGCCAAGGCGCGGTGGGGGGCGTGGTGGTGAACGAT
CCCACC-ACCCC-CGCCC-CA--G---CCGG--G--CG-CGA-CACC-CCGCC--C-CCC---C--C
...... GC---GG-G--C---G
CCCGACAACA
....................
AAGGACGGGA.AGTGGAAGTCCTGATACCCATCCTACACCCCCCTGCCTTCCACCCTCCGGCCCCCCGCGAGTCC
5081
118332
5179
.... AACGG-GGATGG
< ...........
...........
ACCCGCCGGCCGGC
AAGGGCAGAAGATGGGGAGTCCCGATCCTCCTCCTGCATCCCCTCGCCTTCCATTCTCCGGCCCTCCGCGAGTCCCGACGCCCCCCCCCCGCCGCCCGAC
AAGG-C-G-A-A--GG-AGTCC-GAT-C-C-TCCT-CA-CCCC--GCCTTCCA--CTCCGGCCC-CCGCGAGTCC
............
Conserved
......................
118252
.... G-ACGAT
ATAACAACCCCAACGGAAAGCGGCGGGGTGTTGGGGGAGGCGAGGAACAACCGAGGGGAACGGGGGATGG
GGGGGGAACACGGGGGGGAGGGGTCCGGGGCGAGGCGGGCGGGCGAAGGAAGGGGGGGTGGTGGCGGCGGCGGTGGAAAGCGGAAA..AACGGAGGATGG
---G--AACA
......................
A ...... C---CG-A
.... GG-GGGGTG-TGG-GG-GGCG--G-A-A-C-GA
Locus
1
.................................
118428
5279
CCCGCCG-CCG-C
>
< ............
TACCGAGACCGAA.CACGGCGGCCGCCGCAGCCG
..................................................................
GAAGGAGACCCAAGCACCGCAGCCGGAGAGGCCGAGCGGGGAGTGGGCGGCCGGGCGGGAGGATGGCGGAGAGAGAGAGAGAGAGAGAGAGAGGGGGGGG
-A--GAGACC-AA-CAC-GC-GCCG--G--GCCG
- Conserved
Locus 2 .............
118453
5379
..................................................................
>
..............................................................................................
CCGCAG
GGGGGAGAGGGAAAGCAACGGGAAAGAGAGGcGCGCGGAAAAGCAGCAAGAGGGGGGACGGGGCGAGCCGGGCAGAGTGCGGAGCCCCCGGAGCCCGCGG
..............................................................................................
CCGC-G
CCGCCGCCGACACCGCAGAGCCGGCGCGCGCACTCACAAGC..GGCAGAGGCAGAAAGGCCCAGA
CCGCAGCCGA
CCGC-GCCGA
.... GCAGCGC
.... GCAG-GC
Conserved
Locus
...........................
GTCATTGT
..... CGCGGGCTCCGGGGCCGGGCCGGGCCGGCAACGCCCCGCGCCGGCCGCGGCGGAGAGAACCCCTGTGTCATTGT
..... CGCG--CTC
.... GC--GGC-G-G-C-G-AA-GCCC-G
............................
3
..........
118459
5479
TTATGTGGCCGCGGGCCAGCAGACGGCCCGCG
............................
ACACCCCCCCCCCGCCCGTGTG
TTACGTGGCCGCGGGCCAGCAGACGGGCCGCGGGCCAGCAGACGGGCCGCGGCGCCAGCGGCCCACGCCTCCCGCCGCATTAGGCCCCCGCGGGCATCCG
TTA-GTGGCCGCGGGCCAGCAGACGG-CCGCG
.............................
C-C-C-CC-CCCGCC
....
--
3069
118530
5570
GTCATTGT
< .......
..........
GGTATCCG
T ...........
118592
5670
GG-ATCCG
>
< ....
GCCCCCCGCCCCGCGCCGGTCCATTAAGGGCGCGCGTGCCCGCGAGATATCAATCCGTTAAGTGCTCTGCAGACAGGGGCACCGCGCCCGGAAATCCATT
GCGGCCGGCCCCACGCCCTTCCATTAAACACTCCCACGTTGGGGGGGGGCGCGCCAGCTGAGTGCTCTGCGGTTGCGGGCGCCGTGCCCGGAGATCCATT
GC--CC-GCCCC-CGCC--TCCATTAA---C-C-C--G---G-G-G
........
C-G-T-AGTGCTCTGC-G
.... GGGC-CCG-GCCCGGA-ATCCATT
- Conserved
Locus 4 ...... >
< ................
Conserved
Locus 5
118692
5770
....
AGGCCGCAGACGAGGAAAATAAAATTACATCACCTACCCACGTGGTGCTGTGGCCTGTTTTTGCTGCGTCATCTCAGCCTTTATAAAAGCGGGGGCGCGG
118792
AAGCCGCCGGAGAGCCCGAGC
...........
A-GCCGC-G--GAG
.... A .............
.............
CCGT
CCCGCCCGCGTGTTGCTGTGGGC.ATTTCTGCTGCGTCATCCCTGTCTTTATAAAACCGGGGGCGCGG
CC--CCC-CGTG-TGCTGTGG-C--TTT-TGCTGCGTCATC-C-G-CTTTATAAAA-CGGGGGCGCGG
>
....................
Conserved
< .......................
Locus
6
5858
.....................
GCCGATCGCGGGTGGTGCGAAAGACTTTCCGGGCGCGTCCGGGTGCCGCGGCTCTCCGGGCCCCCCTGCAGCCGGG
118872
CAGCAACGAACGCAGGGGCCCGCCGCCGATCGAGAGGGACTCCGGAGAAGGAAGGCTGCTCCGCGCACCGGCGCGCC•TTCTCCTCTcCCCTCCCTACCT
C-G .....................
>
GCCGATCG-G-G-G---C---AGA
......
G ........
CG .........
5958
GC-CT-C
GCGGCCAAGGGGCGTCGGCGACATCCTCCCCcTAAGCGCCGGCCGGCCGCTGGTCTGTTTTTTCGTTTTCCCCGTTTCGGGGGTGGTGGGGGTTGCGGTT
CCCCCTCTCTTCCCCCTTTTTTCCCCCGCCTCCCGTCTTCTTCCGCGCCTCCGAGGGTCCGCCTCTTGCCTCGGGGACCCCCGGGCGGGCCGGGGCTTGG
-C--C .......
C--C ........
CC--CC-C
.... C--C--CCG--C
.... G---GT
.......
TT--C-C-G---C
.... C-CCC--C
.......
118972
6058
.... G-G--GG--G--GC
TCTGTTTCTTTAACCCGTCTGGGGTGTTTTTCGTTCCGTCGCCGGAATGTTTCGTTCGTCTGTCCCCTCACGGGGCGAAGGcCGCGTACGGCCCGGGACG
CCGCCGAG
.............................................................................
....
119072
6081
GTGCGCCCCGGCCGG
-C ...................................................................................
GT-CG-CCCGG---G
< ........
AGGGGCCCCCG•ACCGCGGCGGTCCGGGCCCCGTCCGGACCCGCTCGCCGGCACGCGACGCGAAAAAGGCCCCCCGGAGGCTTTTCCGGGTTCCCGGCCC
119171
AGGGGCCCCCGCACCTCGGCGGCc...GCCcCCTCCGGCGCCGCGCGTTCGCGAAAGGCGCGAAAGGGGCCCCC.GGAGGCTTTTTTCGATTCCCGGCCG
AGGGGCCCCCG-ACC-CGGCGG-C---GCCCC-TCCGG--CCGC-CG---GC
.... G-CGCGAAA--GGCCCCC-GGAGGCTTTT---G-TTCCCGGCCConserved
Locus 7 --->
< ..............
Conserved
Locus
6177
8
........
GGGGCCTGAGATGAACACTCGGGGTTACCGCCAACGGCCGGCCCCCGTGGCGGCCCGGCCCGGGGCCCCGGCGGACCCAAGGGGcCCC..GGCCCGGGGC
119269
GGGGTCCCGGGTAGCCGCCCGGCGCCGGGCGGAAGGCGTCCCCCGCCCGGCGGTCCGGCCCGGGCCCCCGGCGGAGCGCGGGGGCCCCGGGGCCCCGGGC
GGGG-C---G-T---C-C-CGG--G
........
AA-G ..... CCC-C--GGCGG-CCGGCCCGGG-CCCCGGCGGA-C---GGGGCCCC--GGCCC-GGGC
--->
< .................
Conserved Locus 9 ...............
6277
CCCACAACGGCCCGGCGCATGCGCTGTGGTTTTTTTTTCCTCGGTGTTCTGCCGGGCTCCATCGCCTTTCCTGTTCTCGCTTCTCCCC..CCCCCCTTCT
CGCGC..CGG..CGGCGTTTCCGCGTTCCGTTTCTTCTCCCTCCCGGGCCGCCCCGCTCCCGGGCCCGACC
C-C-C--CGG--CGGCG--T-CGC--T---TTT-TT-TCC
..... G--C-GCC--GCTCC---GCC---CC
>
I[
TCACCCCCAGTACCCTCCTCCCTCCCT
TCCCCCGTCCCGCCGCGCCCCTTCCCT
TC-CCC
......
CC---C-CC-TCCCT
119394
6396
[I
II
II
HSV-I
119447
6575
119367
.... CTCGCCCCTTCCCTTCTCCTCGTCT
.... CTCGC--CT-CCC--C-CC-C-TCT
LAT
5'end
0 .........
ACGCCGCG
.... TTTCCAGGTAGGTT.AG
ACGCCGCGCGTTCTCGCAGGTAGGTTTAG
ACGCCGCG
..... T--CAGGTAGGTT-AG
6369
>
119470
6603
Fig. 6. Alignment of HSV-1 and HSV-2 DNA sequences in the UL proximal part of IRL. The sequence alignment was produced as
described in the text. Regions judged to be genuinely homologous are indicated as loci 1 to 9. The proposed LAT TATA box sequence is
overlined. At the bottom of the figure is an alignment of the mapped Y-terminal position of HSV-1 LAT (now considered to represent a
splice donor position) with the apparently corresponding HSV-2 sequence. See note on HSV-1 DNA numbering in Methods. The
HSV-2 numbering is as in Fig. 3,
the mapped 5' end of HSV-1 LAT, no similarity is seen
for HSV-1 and HSV-2 DNAs within the portions of the
LAT coding sequences which do not overlap the RL2
genes. The 5' end of HSV-2 LAT has not been mapped
precisely, but the HSV-2 sequence contains an appropriately located sequence which is closely similar to that
around the mapped 5' end of HSV-1 LAT. However, it
has been suggested that transcription of HSV-1 LAT is
initiated some 700 nucleotides 5' of this site, perhaps
with the sequence TATAAAA (residues 5840 to 5846)
acting as a TATA box (Wechsler et al., 1988a, b; Dobson
et al., 1989; Batchelor & O'Hare, 1990; Zwaagstra et al.,
1990). If the LAT promoter is in this locality, then the
conserved elements seen in HSV-2 in loci 4, 5 and 6 of
Fig. 6 presumably include important parts of the
transcriptional regulatory signals.
We have examined the LAT region of HSV-2 DNA
for signs of protein coding function. In summary, our
results were negative. For the HSV-2 LAT region outside
the RL2 gene sequence, ORFs do exist, but are not
similar to ORFs in the HSV-1 LAT region. The HSV-2
LAT region outside the RL2 gene sequence, like that of
HSV-1, does not show signs of three nucleotide based
bias in nucleotide composition, the presence of which is
characteristic of most HSV-1 coding sequences (Perry &
McGeoch, 1988). These outcomes, together with the
dissimilarity of the HSV-1 and HSV-2 sequences, are all
consistent with there being an absence of extensive
protein coding sequences in either LAT region outside of
the RL2 gene.
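The three-nucleotide-based compositional bias mentioned above can be examined with a simple positional count. The following is a minimal sketch, not the procedure of Perry & McGeoch (1988): the function name and the toy sequence are assumptions, and a real analysis would apply the count over sliding windows in each reading frame.

```python
def gc_by_codon_position(seq):
    """Return the G+C fraction at positions 1, 2 and 3 of successive triplets.

    The sequence is read in the frame starting at its first base; a trailing
    partial triplet is ignored.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    fractions = []
    for offset in range(3):
        bases = seq[offset:usable:3]  # every third base, starting at this offset
        gc = sum(base in "GC" for base in bases)
        fractions.append(gc / len(bases) if bases else 0.0)
    return fractions


if __name__ == "__main__":
    # Toy sequence (hypothetical): strong positional bias, as expected of a
    # caricatured coding region; non-coding DNA would give three similar values.
    print(gc_by_codon_position("GCAGCAGCA"))
```

A region in which the three values differ markedly and consistently along its length shows the kind of periodicity associated with coding sequences; closely similar values in all frames are the negative result reported for the LAT region.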
In the region where the LAT transcripts overlap the
RL2 coding sequence (about 530 bp), the HSV-1 and
HSV-2 DNA sequences are highly similar. We believe
that this represents primarily the coding requirements of
the RL2 genes. Putative encoded amino acid sequences
in all three reading frames in the LAT orientation also
show considerable similarities between HSV-1 and
HSV-2, but the distribution of stop codons differs. A
region of the HSV-1 DNA which has been discussed as a
possible LAT protein coding sequence is the so-called
ORF2 (Wagner et al., 1988a), which lies across the 3' end
of the RL2 coding sequence. In HSV-2 the counterpart of
ORF2 is disrupted by two stop codons within the region
overlapping the RL2 coding sequence. We interpret this
observation as indicating that ORF2 is not, at least in its
entirety, a real functional entity.
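Comparing the distribution of stop codons between two sequences, as done above for ORF2 and its HSV-2 counterpart, amounts to listing stop codon positions in each reading frame. A minimal sketch follows; the function name and the toy sequence are assumptions for illustration, and only the given strand is scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stops_by_frame(seq):
    """Map each reading frame (0, 1, 2) to the start positions of its stop codons."""
    seq = seq.upper()
    result = {0: [], 1: [], 2: []}
    for frame in range(3):
        # Step through complete codons in this frame only.
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                result[frame].append(pos)
    return result


if __name__ == "__main__":
    # Toy sequence (hypothetical): two stops in frame 0, none in frames 1 and 2.
    print(stops_by_frame("ATGTAAGGCTGAC"))
```

An open reading frame present in one virus but interrupted in the aligned region of the other would appear as a frame with an empty list in the first sequence and one or more positions in the second.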
Recently, Doerig et al. (1991) have expressed the
3'-terminal 112 codons of HSV-1 ORF2 (which overlap
the RL2 coding sequence) in Escherichia coli as a trpE
fusion protein, raised antiserum to the product and
shown that a protein reacting with the antiserum was
detectable in neurons latently infected with HSV-1.
These observations are not readily reconcilable with our
sequence interpretations. One possibility is that, if part
of ORF2 is indeed translated in neurons, the mRNA
involved may be an as yet uncharacterized species.
Dobson et al. (1989) suggested that HSV-1 LAT might
in fact be an intron transcript of unusual stability excised
from a larger transcript. This proposal is compatible with
a number of properties of the LAT, namely its nuclear
location, lack of polyadenylation, apparent lack of
protein coding function and the fact that the mapped 5'
terminus is located in a sequence which conforms
excellently to the splice donor consensus (Fig. 6 and
Table 3). In a recent paper, Farrell et al. (1991) tested this
proposal by identifying a potential splice acceptor site in
the locality of the 3' terminus of the LAT, transferring a
copy of LAT DNA including putative splice sites into a
plasmid-borne E. coli lacZ gene, and expressing this in
tissue culture cells. They observed that the LAT
sequence was indeed spliced out of the lacZ transcript.
Features of the HSV-2 RL sequence are consistent with
LAT being an intron sequence. First, there is an
appropriately located candidate for a splice donor
sequence at the 5' end of LAT (around residues 6591 to
6599 in Fig. 3 and 6; see Table 3). Second, the HSV-2
sequence within exon 3 of the IE118 gene contains an
appropriately located candidate for a splice acceptor
sequence (around residues 8790 to 8808 in Fig. 3; see
Table 3), which corresponds in its location to the HSV-1
acceptor sequence identified by Farrell et al. (1991).
Third, as outlined above, the HSV-2 LAT region that
does not overlap the RL2 gene does not exhibit
characteristics of protein coding DNA. Last, in the same
region the HSV-1 and HSV-2 sequences are markedly
divergent, as has been seen for the introns in the US1 and
US12 genes of HSV-1 and HSV-2 (Whitton & Clements,
1984), and for the HSV-1 introns and their corresponding
HSV-2 sequences in the RL2 genes as described above.
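The search for candidate splice donor sequences can be sketched as a count of matches to a consensus written in IUPAC ambiguity codes. The consensus used here, MAG|GTRAGT, is the commonly quoted 5' splice site consensus and is an assumption of this sketch, as are the function names, the match threshold and the test sequences; the values in Table 3 are not reproduced.

```python
# IUPAC codes used by the consensus (M = A or C, R = A or G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG"}
DONOR_CONSENSUS = "MAGGTRAGT"  # exon MAG | GTRAGT intron (assumed consensus)


def donor_matches(site, consensus=DONOR_CONSENSUS):
    """Count positions at which a 9 nt candidate agrees with the consensus."""
    if len(site) != len(consensus):
        raise ValueError("candidate and consensus must have the same length")
    return sum(base in IUPAC[sym] for base, sym in zip(site.upper(), consensus))


def scan_donors(seq, min_matches=7):
    """Return (start, matches) for windows with the GT dinucleotide in place."""
    seq = seq.upper()
    width = len(DONOR_CONSENSUS)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if window[3:5] != "GT":  # require the GT at the exon/intron junction
            continue
        matches = donor_matches(window)
        if matches >= min_matches:
            hits.append((start, matches))
    return hits


if __name__ == "__main__":
    # Hypothetical test sequence containing one good candidate.
    print(scan_donors("TTCGCAGGTAGGTTTAG"))
```

A match count alone does not establish that a site is used; as stated in the text, candidates were also required to be appropriately located relative to the mapped LAT termini.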
If the LAT sequence is generated as an intron, then a
transcript containing the flanking exons must also exist.
We have evaluated the protein coding potential of such a
transcript in the sequences 5' and 3' to the LAT intron. In
the region between the proposed TATA box (locus 6 in
Fig. 6) and the splice donor site, the HSV-1 sequence
contains three potential ATG translation initiation
codons, but HSV-2 possesses none, and reading frames
are not conserved between the sequences. It is thus
unlikely that this region encodes protein.
The proposed LAT splice acceptor sites lie within the
exon 3 protein coding sequences of the RL2 genes. In
both cases the acceptors are just 5' (in the LAT
orientation) of the sequences which in the opposing, RL2
gene orientation encode the serine-rich parts and the
remainder of the poorly conserved central regions of the
IE110 and IE118 polypeptides. The first potential ATG
codons after the proposed LAT splice lie 653 and 552
nucleotides downstream in HSV-1 and HSV-2 respectively. We have not succeeded in identifying reading
frames which in our view are likely to be genuine. Little
is known about the structure of the proposed 'exon
transcripts' of the LAT transcription unit; they could, of
course, be subject to additional splicing. From the
available data it is possible that in HSV-1 the 3' terminus
of this transcription unit is adjacent to a polyadenylation-associated sequence, AATAAA, downstream of the
IE175 gene within Rs (Mitchell et al., 1990b; Zwaagstra
et al., 1990; see Fig. 1). Our HSV-2 sequence data for the
Rs part of BamHI g indicate that this sequence is also
present in HSV-2 (not shown).
Organization of the HSV-2 equivalent of the HSV-1 RL1
gene encoding ICP34.5
Fig. 7. Alignment of the HSV-1 and HSV-2 RL1 genes. The HSV-1 and HSV-2 DNA sequences are shown, starting with proposed RL1
TATA boxes and including the whole of the RL1 coding regions. The upper line shows the HSV-1 DNA sequences. The sequences are
for the leftward 5' to 3' strands in IRL, but preserve the numbering used for the rightward strands. See note on HSV-1 numbering in
Methods. HSV-2 numbering follows that in Fig. 3. Proposed encoded amino acid sequences are shown in the single-letter code. The
proposed HSV-2 intron is indicated.
Chou & Roizman (1986) and Ackermann et al. (1986)
have produced evidence that in HSV-1 strain F a gene,
encoding a protein termed ICP34.5, is located between
the RL2 gene and the a sequence in the same orientation
as the RL2 gene. An ORF of 358 codons is proposed to
encode ICP34.5. In the HSV-1 strain 17 sequence,
however, there were 20 frameshifts within the bounds of
this ORF, and no satisfactory alternative reading frame
could be proposed, although the DNA sequence did
display some characteristics of protein coding DNA
(Perry & McGeoch, 1988). This was part of the
background with which we undertook a comparative
analysis of the HSV-2 RL sequence. However, the
conflict has meanwhile been resolved. First, Chou &
Roizman (1990) revised the HSV-1 strain F sequence to
correct 19 of the 20 frameshifting differences from
HSV-1 strain 17, and proposed as the ICP34.5 coding
sequence an ORF of 263 codons (which was partially
coincident with the previous candidate ORF). Second,
re-examination of strain 17 showed that the sequence at
the remaining frameshifting difference from strain F was
correct in the plasmid clone employed, but that this clone
was atypical and the sequences of other clones were
compatible with the strain F version (A. Dolan,
E. McKie, A. R. MacLean & D. J. McGeoch,
unpublished data). Our HSV-1/HSV-2 comparisons for
this region have therefore focused on evaluating whether
HSV-2 possesses a counterpart of the HSV-1 ICP34.5
coding region.
As can be seen from Fig. 4(d), there are similarities
between the two sequences in the locality of the ICP34.5
ORF. Aligning the sequences was complicated by the
occurrence of tandem reiterations, G + C-rich sequences
and a number of addition/deletion differences. The
sequences were aligned in the way described above for
the UL-proximal part of RL. From these exercises it was
possible to propose a coding sequence for an HSV-2
counterpart of ICP34.5. The coding regions of the
HSV-1 and HSV-2 RL1 genes are aligned in Fig. 7,
together with the proposed encoded amino acid
sequences. As is generally the case in comparisons of
substantially divergent DNA sequences, there exist
several near equivalent variations of the alignment: the
version shown in Fig. 7 is one which is compatible also
with optimizing the alignment of the proposed encoded
amino acid sequences, and which places padding
insertions at codon boundaries.
The HSV-2 RL1 coding sequence starts at an ATG
(residues 13179 to 13177 in Fig. 3 and 7) aligned with the
HSV-1 initiator ATG proposed by Chou & Roizman
(1990). Both HSV-1 and HSV-2 possess an upstream
ATG; in HSV-2 this is at residues 13251 to 13249 (Fig. 3
and 7) and is blocked by a stop codon after four codons.
The HSV-2 coding sequences are interrupted by a set of
repeated sequences (family 5) consisting of six complete
copies and one partial copy of a 19 nucleotide element
which includes a stop codon, TGA, in the RL1
orientation; all three reading frames are thus blocked.
We consider that this repeat family must lie within an
intron in HSV-2 RL1: as indicated in Fig. 3 and 7, and
in Table 3, it is closely flanked by excellent candidates for
splice donor and acceptor sites, use of which would bring
the proposed HSV-2 coding sequence back into frame
with the distal portion of the HSV-1 RL1 ORF.
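The statement that all three reading frames are blocked follows from simple arithmetic: a 19 nucleotide element shifts the frame by one position per copy (19 = 6 x 3 + 1), so a stop codon carried by the element falls in a different frame in each of three successive copies. The sketch below checks this with a hypothetical 19 nucleotide element containing a single TGA; the element shown is an assumption for illustration and is not the family 5 repeat sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Hypothetical 19 nt element with one TGA; NOT the actual HSV-2 repeat unit.
ELEMENT = "ACCCCCTGACGCCCCGCCG"


def frames_with_stop(seq):
    """Return the set of reading frames (0, 1, 2) that contain a stop codon."""
    seq = seq.upper()
    return {pos % 3 for pos in range(len(seq) - 2) if seq[pos:pos + 3] in STOP_CODONS}


if __name__ == "__main__":
    for copies in (1, 2, 3):
        # Each added copy moves the TGA into the next reading frame.
        print(copies, sorted(frames_with_stop(ELEMENT * copies)))
```

With six complete copies, as reported for HSV-2, each frame is therefore interrupted more than once, which is why the repeat family cannot lie within a continuous coding sequence.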
This interpretation of the HSV-2 RL1 gene is
supported by the distribution of G + C residues in the
first, second and third positions of the proposed codon
set (G + C content is higher for the third position), by the
pattern of substitutions observed between HSV-1 and
HSV-2 (substitutions are most frequent in the third
position of codons), and by the similarity between the
encoded amino acid sequences (62.7% identity of aligned
residues). The high incidence of addition/deletion
changes for the 5' portions of the ORFs would be unusual
for HSV-1 and HSV-2 genes in the UL or Us regions, but
is similar to that observed for parts of the immediate
early genes in RL (see above) and in Rs (unpublished
data).
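Two of the measures cited above, identity of aligned residues and the distribution of substitutions over codon positions, can be computed as in the following minimal sketch. The function names, the gap characters and the toy sequences are assumptions; the substitution count is restricted to ungapped, in-frame coding alignments of equal length.

```python
def percent_identity(a, b, gaps="-."):
    """Percentage of identical residues among aligned pairs without padding."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x not in gaps and y not in gaps]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def substitutions_by_codon_position(cds_a, cds_b):
    """Count differences at codon positions 1, 2 and 3 of two in-frame sequences."""
    if len(cds_a) != len(cds_b):
        raise ValueError("coding sequences must have the same length")
    counts = [0, 0, 0]
    for index, (x, y) in enumerate(zip(cds_a.upper(), cds_b.upper())):
        if x != y:
            counts[index % 3] += 1
    return counts


if __name__ == "__main__":
    # Toy data (hypothetical), not taken from Fig. 7.
    print(round(percent_identity("MSRR-GP", "MARRRGP"), 1))
    print(substitutions_by_codon_position("GCAGCG", "GCGGCC"))
```

An excess of differences at the third codon position, as in the toy example, is the pattern expected where selection acts on the encoded amino acid sequence.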
The HSV-2 DNA sequence has a TATA box
candidate sequence aligned with that proposed by Chou
& Roizman (1986) to act in transcription initiation of the
HSV-1 RL1 gene (see Fig. 7), so the 5' end of the HSV-2
RL1 transcript may be adjacent to this. Similarly, the
HSV-2 transcript may terminate downstream of the
possible polyadenylation associated sequence ATTAAA
at residues 11881 to 11876 (Fig. 3). Authentication of the
HSV-2 RL1 transcript structure, including the proposed
intron, will require direct mapping analyses.
The HSV-2 RL1 gene is predicted to encode a protein
of 261 amino acids. Like its HSV-1 counterpart this
protein is basic with a high content of arginine residues.
The most similar region in the two proteins is near the C
terminus (corresponding to the second exon of HSV-2
RL1), in which 63 amino acids in each are aligned
without introduction of gaps, and show 83% identity
(Fig. 7).
Discussion
The sequence data described in this paper for the genes
at the extremities of HSV-2 UL show that these regions of
the HSV-2 genome are very similar in sequence
organization and coding capacity to the corresponding
parts of HSV-1 DNA. This finding is similar to
published reports for a number of other genes in the
unique regions of the HSV-2 genome, both UL (for
example, genes UL11 and UL12, Draper et al., 1986;
UL23, Swain & Galloway, 1983; UL27, Stuve et al.,
1987; UL30, Tsurumi et al., 1987; UL39, Swain &
Galloway, 1986; UL40, McLauchlan & Clements, 1983;
UL44 and UL45, Swain et al., 1985) and Us (for
example, genes US2 to US8, McGeoch et al., 1987). This
relationship is in notable contrast to that found for the
HSV-1 and HSV-2 RL regions. Here the sequences are
much more diverged, and also exhibit features distinct
from those in the unique regions, including a high
incidence of short reiterated families and other simple
sequences, and an elevated content of G and C residues.
These features can be largely accounted for by proposing
that a high level of recombination acts on the major
repeats of the genome and that there is an associated bias
towards raising the G + C content in mechanisms of
generation or fixation of mutations; this was discussed
previously with regard to the Rs element of HSV-1
(McGeoch et al., 1986). From comparisons of genome
structures of HSV and other alphaherpesviruses, it seems
probable that RL is the most recently evolved major
element of the HSV genome; and it is evidently, on the
time scale of the HSV-1/HSV-2 divergence, still in a state
of rapid change.
The amino acid sequences of HSV-2 UL and RL genes
obtained from our DNA sequence analyses have not
given direct new information on gene function, but they
have contributed a number of refinements to interpretations based on the HSV-1 sequence alone. For the UL
genes these include: the observation of possible signal
sequences for membrane-associated translation in UL1
and UL3; the re-interpretation of a possible translation
initiation site for UL2; and the correction of the UL56
sequence with consequent identification of a possible
transmembrane segment.
The major aim of this paper was to evaluate the
functions of the HSV-1 and HSV-2 RL elements by
comparative analyses. We consider that this has succeeded to the point that the potential of each part of RL
can be described at least partially. From the overall
divergence between the RL sequences, it is clear that a
given region will have remained closely similar in RL of
HSV-1 and HSV-2 only if it has a sequence-specific
function. In the following paragraphs features of the
various elements within RL are discussed in turn.
First, adjacent to UL there is a region, of 630 bp in
HSV-2 (Fig. 3), the sequence of which is not conserved
between HSV-1 and HSV-2, and which contains a high
level of repetitive and simple elements. In our view it is
possible that this DNA has been generated by aberrant
recombinational events and does not have a sequence-specific function.
Second, next to this divergent section lie the sequences
shown in Fig. 6, which exhibit extensive similarities in
HSV-1 and HSV-2. In part, these related sequences
probably represent functional elements in LAT transcriptional control. However, the similarities may be
judged to be more extended than would be reasonable for
only this function; there are no clues as to the nature of
additional roles.
Third, the region encoding the 5' part of the LAT,
outside the RL2 gene, is not at all conserved in sequence.
As outlined above, the LAT transcript probably is
generated as an intron. Its unusual stability presumably
represents some special features of the sequence, but
details of the structures involved remain unexplored. The
function of the LAT transcriptional unit is still not clear.
The stable LAT intron could be the important component; most straightforwardly, this could have the role
of helping to maintain the latent state by acting as an
antisense repressor of RL2 translation, as suggested by
several authors (and explored by Farrell et al., 1991).
Alternatively, the LAT may be a by-product and the
LAT 'exon transcript', of as yet uncharacterized coding
capacity, may be the functional entity. This obscurity is
compounded by the work of Doerig et al. (1991), showing
that a protein may be expressed from part of the LAT
region in latently infected neurons.
Fourth, the immediate early RL2 gene is a clearly
defined entity; including upstream control sequences,
this accounts for some 4300 bp of RL. Last, the RL1 gene
accounts for the remainder of RL to the a sequence.
Regarding the structure of the HSV-2 RL1 gene, we
consider the interpretation of its coding region to be
reasonably secure, with some qualification regarding the
5' terminus, which encodes basic, repetitive amino acid
sequences. Knowledge of the structure of HSV-2 RL1
transcripts is, however, still incomplete and needs
further work.
We thank V. G. Preston and A. J. Davison for provision of
HSV-2 HG52 clones, A. C. Minson and S. Efstathiou for HSV-2
DNAs, L. J. E. Kattenhorn for extensive help in preparing the text,
and S. M. Brown, A. J. Davison and A. R. MacLean for reviewing the
text.
References
ACKERMANN, M., CHOU, J., SARMIENTO, M., LERNER, R. A. &
ROIZMAN, B. (1986). Identification by antibody to a synthetic peptide
of a protein specified by a diploid gene located in the terminal
repeats of the L component of herpes simplex virus genome. Journal
of Virology 58, 843-850.
BANKIER, A. T. & BARRELL, B. G. (1989). Sequencing single-stranded
DNA using the chain-termination method. In Nucleic Acid Sequencing: A Practical Approach, pp 37-78. Edited by C. J. Howe & E. S.
Ward. Oxford & New York: IRL Press.
BATCHELOR, A. H. & O'HARE, P. (1990). Regulation and cell-type-specific activity of a promoter located upstream of the latency-associated transcript of herpes simplex virus type 1. Journal of
Virology 64, 3269-3279.
CHOU, J. & ROIZMAN, B. (1986). The terminal a sequence of the herpes
simplex virus genome contains the promoter of a gene located in the
repeat sequences of the L component. Journal of Virology 57,
629-637.
CHOU, J. & ROIZMAN, B. (1990). The herpes simplex virus 1 gene for
ICP34.5, which maps in inverted repeats, is conserved in several
limited-passage isolates but not in strain 17syn+. Journal of Virology
64, 1014-1020.
CHOU, J., KERN, E. R., WHITLEY, R. J. & ROIZMAN, B. (1990).
Mapping of herpes simplex virus-1 neurovirulence to γ134.5, a gene
nonessential for growth in culture. Science 250, 1262-1265.
CURRAN, J. & KOLAKOFSKY, D. (1988). Ribosomal initiation from an
ACG codon in the Sendai virus P/C mRNA. EMBO Journal 7,
245-251.
DAVISON, A. J. & WILKIE, N. M. (1981). Nucleotide sequences of the
joint between the L and S segments of herpes simplex virus types 1
and 2. Journal of General Virology 55, 315-331.
DEVEREUX, J., HAEBERLI, P. & SMITHIES, O. (1984). A comprehensive
set of sequence analysis programs for the VAX. Nucleic Acids
Research 12, 387-395.
DOBSON, A. T., SEDERATI, F., DEVI-RAO, G., FLANAGAN, W. M.,
FARRELL, M. J., STEVENS, J. G., WAGNER, E. K. & FELDMAN, L. T.
(1989). Identification of the latency-associated transcript promoter
by expression of rabbit beta-globin mRNA in mouse sensory nerve
ganglia latently infected with a recombinant herpes simplex virus.
Journal of Virology 63, 3844-3851.
DOERIG, C., PIZER, L. I. & WILCOX, C. L. (1991). An antigen encoded
by the latency-associated transcript in neuronal cell cultures latently
infected with herpes simplex virus type 1. Journal of Virology 65,
2724-2727.
DRAPER, K. G., DEVI-RAO, G., COSTA, R. H., BLAIR, E. D., THOMPSON,
R. L. & WAGNER, E. K. (1986). Characterization of the genes
encoding herpes simplex virus type 1 and type 2 alkaline
exonucleases and overlapping proteins. Journal of Virology 57,
1023-1036.
EVERETT, R. D. (1988). Analysis of the functional domains of herpes
simplex virus type 1 immediate-early polypeptide Vmw110. Journal
of Molecular Biology 202, 87-96.
EVERETT, R. D., PRESTON, C. M. & STOW, N. D. (1991). Functional and
genetic analysis of the role of Vmw110 in herpes simplex virus
replication. In Herpesvirus Transcription and its Regulation, pp 49-76.
Edited by E. K. Wagner. Boca Raton: CRC Press.
FARRELL, M. J., DOBSON, A. T. & FELDMAN, L. T. (1991). Herpes
simplex virus latency-associated transcript is a stable intron.
Proceedings of the National Academy of Sciences, U.S.A. 88, 790-794.
FREEMONT, P. S., HANSON, I. M. & TROWSDALE, J. (1991). A novel
cysteine-rich sequence motif. Cell 64, 483-484.
GUPTA, K. C. & PATWARDHAN,S. (1988). ACG, the initiator codon for
a Sendai virus protein. Journal of Biological Chemistry 263,
8553-8556.
LEIB, D. A., BOGARD,C. L., KOsZ-VNENCHAK, M., HICKS, K. A., COEN,
D. M., KNIPE, D. M. & SCHAFFER, P. A. (1989). A deletion mutant of
the latency-associated transcript of herpes simplex virus type 1
reactivates from the latent state with reduced frequency. Journal of
Virology 63, 2893-2900.
LITTLE, S. P. & SCHAFFER, P. A. (1981). Expression of the syncytial
(syn) phenotype in HSV-1, strain KOS: genetic and phenotype
studies of mutants in two syn loci. Virology 112, 686-702.
MCGEOCH, D. J., DOLAN, A., DONALD, S. & RIXON, F. J. (1985).
Sequence determination and genetic content of the short unique
region in the genome of herpes simplex virus type 1. Journal of
Molecular Biology 181, 1-13.
MCGEOCH, D. J., DOLAN, A., DONALD, S. & BRAUER, D. H. K. (1986).
Complete DNA sequence of the short repeat region in the genome of
herpes simplex virus type 1. Nucleic Acids Research 14, 1727-1745.
MCGEOCH, D. J., MOSS, H. W. M., MCNAB, D. & FRAME, M. C. (1987).
DNA sequence and genetic content of the HindIII l region in the
short unique component of the herpes simplex virus type 2 genome:
identification of the gene encoding glycoprotein G, and evolutionary
comparisons. Journal of General Virology 68, 19-38.
MCGEOCH, D. J., DALRYMPLE, M. A., DAVISON, A. J., DOLAN, A.,
FRAME, M. C., MCNAB, D., PERRY, L. J., SCOTT, J. E. & TAYLOR, P.
(1988). The complete DNA sequence of the long unique region in the
genome of herpes simplex virus type 1. Journal of General Virology 69,
1531-1574.
MACKEM, S. & ROIZMAN, B. (1982). Structural features of the herpes
simplex virus α gene 4, 0 and 27 promoter-regulatory sequences
which confer α regulation on chimeric thymidine kinase genes. Journal
of Virology 44, 939-949.
MCLAUCHLAN, J. & CLEMENTS, J. B. (1983). DNA sequence homology
between two colinear loci on the HSV genome which have different
transforming abilities. EMBO Journal 2, 1953-1961.
MITCHELL, W. J., DESHMANE, S. L., DOLAN, A., MCGEOCH, D. J. &
FRASER, N. W. (1990a). Characterization of herpes simplex virus
type II transcription during latent infection of mouse trigeminal
ganglia. Journal of Virology 64, 5342-5348.
MITCHELL, W. J., LIRETTE, R. P. & FRASER, N. W. (1990b). Mapping of
low abundance latency-associated RNA in the trigeminal ganglia of
mice latently infected with herpes simplex virus type 1. Journal of
General Virology 71, 125-132.
MIZUSAWA, S., NISHIMURA, S. & SEELA, F. (1986). Improvement of the
dideoxy chain termination method of DNA sequencing by use of
deoxy-7-deazaguanosine triphosphate in place of dGTP. Nucleic
Acids Research 14, 1319-1324.
MOUNT, S. M. (1982). A catalogue of splice junction sequences. Nucleic
Acids Research 10, 459-472.
MULLANEY, J., MOSS, H. W. MCL. & MCGEOCH, D. J. (1989). Gene
UL2 of herpes simplex virus type 1 encodes a uracil-DNA
glycosylase. Journal of General Virology 70, 449-454.
PERRY, L. J. & MCGEOCH, D. J. (1988). The DNA sequences of the
long repeat region and adjoining parts of the long unique region in
the genome of herpes simplex virus type 1. Journal of General
Virology 69, 2831-2846.
PERRY, L. J., RIXON, F. J., EVERETT, R. D., FRAME, M. C. &
MCGEOCH, D. J. (1986). Characterization of the IE110 gene of
herpes simplex virus type 1. Journal of General Virology 67,
2365-2380.
ROCK, D. L., NESBURN, A. B., GHIASI, H., ONG, J., LEWIS, T. L.,
LOKENSGARD, J. R. & WECHSLER, S. L. (1987). Detection of
latency-related viral RNAs in trigeminal ganglia of rabbits latently infected
with herpes simplex virus type 1. Journal of Virology 61, 3820-3826.
SAIKI, R. K., GELFAND, D. H., STOFFEL, S., SCHARF, S. J., HIGUCHI, R.,
HORN, G. T., MULLIS, K. B. & EHRLICH, H. A. (1988). Primer-directed
enzymatic amplification of DNA with a thermostable DNA
polymerase. Science 239, 487-491.
SPIVACK, J. G. & FRASER, N. W. (1987). Detection of herpes simplex
virus type 1 transcripts during latent infection in mice. Journal of
Virology 61, 3841-3847.
STADEN, R. (1982). Automation of the computer handling of gel reading
data produced by the shotgun method of DNA sequencing. Nucleic
Acids Research 10, 4731-4751.
STEINER, I., SPIVACK, J. G., LIRETTE, R. P., BROWN, S. M., MACLEAN,
A. R., SUBAK-SHARPE, J. H. & FRASER, N. W. (1989). Herpes simplex
virus type 1 latency-associated transcripts are evidently not essential
for latent infection. EMBO Journal 8, 505-511.
STEVENS, J. G., WAGNER, E. K., DEVI-RAO, G. B., COOK, M. L. &
FELDMAN, L. T. (1987). RNA complementary to a herpesvirus alpha
mRNA is prominent in latently infected neurons. Science 235,
1056-1059.
STUVE, L. L., BROWN-SHIMER, S., PACHL, C., NAJARIAN, R., DINA, D. &
BURKE, R. L. (1987). Structure and expression of the herpes simplex
virus type 2 glycoprotein gB gene. Journal of Virology 61, 326-335.
SWAIN, M. A. & GALLOWAY, D. A. (1983). Nucleotide sequence of the
herpes simplex virus type 2 thymidine kinase gene. Journal of
Virology 46, 1045-1050.
SWAIN, M. A. & GALLOWAY, D. A. (1986). Herpes simplex virus
specifies two subunits of ribonucleotide reductase encoded by
3'-coterminal transcripts. Journal of Virology 57, 802-808.
SWAIN, M. A., PEET, R. W. & GALLOWAY, D. A. (1985).
Characterization of the gene encoding herpes simplex virus type 2 glycoprotein C
and comparison with the type 1 counterpart. Journal of Virology 53,
561-569.
TAHA, M. Y., CLEMENTS, G. B. & BROWN, S. M. (1989a). A variant of
herpes simplex virus type 2 strain HG52 with a 1.5 kb deletion in RL
between 0 to 0.02 and 0.81 to 0.83 map units is non-neurovirulent for
mice. Journal of General Virology 70, 705-716.
TAHA, M. Y., CLEMENTS, G. B. & BROWN, S. M. (1989b). The herpes
simplex virus type 2 (HG52) variant JH2604 has a 1488 bp deletion
which eliminates neurovirulence in mice. Journal of General Virology
70, 3073-3078.
TAHA, M. Y., BROWN, S. M., CLEMENTS, G. B. & GRAHAM, D. I. (1990).
The JH2604 deletion variant of herpes simplex virus type 2 (HG52)
fails to produce necrotizing encephalitis following intracranial
inoculation of mice. Journal of General Virology 71, 1597-1601.
TAYLOR, P. (1986). A computer program for translating DNA
sequences into protein. Nucleic Acids Research 14, 437-441.
TSURUMI, T., MAENO, K. & NISHIYAMA, Y. (1987). Nucleotide sequence
of the DNA polymerase gene of herpes simplex virus type 2 and
comparison with the type 1 counterpart. Gene 52, 129-137.
WAGNER, E. K., DEVI-RAO, G., FELDMAN, L. T., DOBSON, A. T.,
ZHANG, Y.-F., FLANAGAN, W. M. & STEVENS, J. G. (1988a). Physical
characterization of the herpes simplex virus latency-associated
transcript in neurons. Journal of Virology 62, 1194-1202.
WAGNER, E. K., FLANAGAN, W. M., DEVI-RAO, G., ZHANG, Y.-F.,
HILL, J. M., ANDERSON, K. P. & STEVENS, J. G. (1988b). The herpes
simplex virus latency-associated transcript is spliced during the
latent phase of infection. Journal of Virology 62, 4577-4585.
WECHSLER, S. L., NESBURN, A. B., WATSON, R., SLANINA, S. & GHIASI,
H. (1988a). Fine mapping of the major latency-related RNA of
herpes simplex virus type 1 in humans. Journal of General Virology
69, 3101-3106.
WECHSLER, S. L., NESBURN, A. B., WATSON, R., SLANINA, S. & GHIASI,
H. (1988b). Fine mapping of the latency-related gene of herpes
simplex virus type 1: alternative splicing produces distinct
latency-related RNAs containing open reading frames. Journal of Virology
62, 4051-4058.
WHITTON, J. L. & CLEMENTS, J. B. (1984). The junctions between the
repetitive and the short unique sequences of the herpes simplex virus
genome are determined by the polypeptide-coding regions of the two
spliced immediate-early mRNAs. Journal of General Virology 65,
451-466.
WHITTON, J. L., RIXON, F. J., EASTON, A. J. & CLEMENTS, J. B. (1983).
Immediate-early mRNA-2 of herpes simplex viruses types 1 and 2 is
unspliced: conserved sequences around the 5' and 3' termini
correspond to transcription regulatory signals. Nucleic Acids
Research 11, 6271-6287.
WORRAD, D. M. & CARADONNA, S. (1988). Identification of the coding
sequence for herpes simplex virus uracil-DNA glycosylase. Journal of
Virology 62, 4774-4777.
ZWAAGSTRA, J. C., GHIASI, H., SLANINA, S. M., NESBURN, A. B.,
WHEATLEY, S. C., LILLYCROP, K., WOOD, J., LATCHMAN, D. S.,
PATEL, K. & WECHSLER, S. L. (1990). Activity of herpes simplex
virus type 1 latency-associated transcript (LAT) promoter in
neuron-derived cells: evidence for neuron specificity and for a large LAT
transcript. Journal of Virology 64, 5019-5028.
(Received 12 June 1991; Accepted 13 August 1991)