Journal of General Virology (1991), 72, 3057-3075. Printed in Great Britain

Comparative sequence analysis of the long repeat regions and adjoining parts of the long unique regions in the genomes of herpes simplex viruses types 1 and 2

Duncan J. McGeoch*, Charles Cunningham, Graham McIntyre and Aidan Dolan

MRC Virology Unit, Institute of Virology, University of Glasgow, Church Street, Glasgow G11 5JR, U.K.

We report the determination of the DNA sequence of the long repeat (RL) region and adjacent parts of the long unique (UL) region in the genome of herpes simplex virus type 2 (HSV-2) strain HG52. The DNA sequences and genetic content of the extremities of HSV-2 UL were found to be closely similar to those determined previously for HSV-1. The 5658 bp sequenced at the left end of HSV-2 UL contained coding regions for genes UL1 to UL4 plus part of UL5. The 4355 bp sequenced at the right end of UL contained coding regions for part of gene UL53, and the whole of genes UL54 to UL56. Comparison of the HSV-1 and HSV-2 UL56 sequences led to a correction in the published HSV-1 UL56 reading frame. The HSV-2 RL region, including one copy of the a sequence, was determined to be 9263 bp, with a base composition of 75.4% G+C and with many repetitive sequence elements. In HSV-2 RL, sequences were identified corresponding to HSV-1 genes encoding the immediate early IE110 (ICP0) transcriptional regulator and the ICP34.5 neurovirulence factor; the former HSV-2 gene was proposed to contain two introns, and the latter one intron. Downstream of the HSV-2 immediate early gene, the RL sequence encoding the latency-associated transcripts (LATs) was found to be dissimilar to that in HSV-1; the probable LAT promoter regions, however, showed similarities to HSV-1. Properties of the LAT sequences in both HSV-1 and HSV-2 were consistent with LATs being generated as an intron excised from a longer transcript.

The nucleotide sequence data reported in this paper will appear in the EMBL, DDBJ and GenBank nucleotide sequence databases under the accession numbers D01127 and D01128.

0001-0427 © 1991 SGM
Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Wed, 10 May 2017 20:36:01

Introduction

We have previously described the genomic DNA sequence of herpes simplex virus type 1 (HSV-1) strain 17 (McGeoch et al., 1985, 1986, 1988; Perry & McGeoch, 1988). The 152 kbp sequence was interpreted as containing 70 distinct genes of which two, located in the major repeat elements of the genome, were present in two copies each. Many of the proposed genes had no significant previous characterization and further studies will be necessary to authenticate them fully. Nevertheless, we considered the interpretation to be generally convincing, and in most of the genome to leave little room for additional genes or other functional sequences. The major exception to this evaluation was the long repeat region (RL; Perry & McGeoch, 1988). This 9 kbp element is present in two copies (TRL and IRL) which flank the long unique region (UL) in opposing orientations (see Fig. 1).

One well characterized gene was recognized in RL, encoding the immediate early transcriptional regulatory protein IE110 or ICP0; this gene is flanked by substantial sequences the roles of which were less well defined. Downstream of the IE110 gene is a region of some 3500 bp which has not been assigned any protein coding function but which is the major locus of transcription in neurons latently infected with HSV-1, giving rise to RNA species termed latency-associated transcripts (LATs) (Stevens et al., 1987; Rock et al., 1987; Spivack & Fraser, 1987; Wagner et al., 1988a, b). The function of these RNAs remains obscure, although it has been observed that some HSV-1 variants defective in LAT expression show impaired reactivation from latency in animal models (Leib et al., 1989; Steiner et al., 1989). On the other side of the IE110 gene is a region in which Chou & Roizman (1986) and Ackermann et al. (1986) have mapped sequences encoding a protein termed ICP34.5 in HSV-1 strain F, and identified a candidate open reading frame (ORF) for ICP34.5. In the HSV-1 strain 17 sequence, however, this ORF was not conserved and there was no satisfactory alternative (Perry & McGeoch, 1988). This conflict has recently been resolved by revision of both the strain F and strain 17 sequences (Chou & Roizman, 1990; Chou et al., 1990; A. Dolan, E. McKie, A. R. MacLean & D. J. McGeoch, unpublished data), adding a further gene to the complement recognized for strain 17. ICP34.5 appears to be an important determinant of viral neurovirulence in HSV-1 (Chou et al., 1990), and in HSV-2 a neurovirulence determinant maps to an equivalent region of the genome (Taha et al., 1989a, b, 1990).

The present paper describes the determination of the DNA sequence for the RL element of HSV-2 strain HG52, together with adjacent parts of UL. The major objective of this work was to compare the HSV-1 and HSV-2 sequences in order to gain better insight into the functional roles of RL; the HSV-2 sequence analysis also yielded detailed information on the counterparts of the IE110 gene, the ICP34.5 gene and nine genes in UL.

Methods

DNA sequence determination. Four plasmid-cloned fragments of HSV-2 strain HG52 DNA were used for sequence determination: BamHI f (cloned in pAT153; Whitton et al., 1983), BamHI g (cloned in pAT153, from A. J. Davison) and BamHI p and BamHI c (cloned in pUC18 for this study). HSV-2 inserts were recovered by BamHI digestion and agarose gel electrophoresis, fragmented by sonication and cloned into the SmaI site of M13mp8. Sequences of the M13 clones were generated by chain termination methods (Bankier & Barrell, 1989). 7-Deaza-2'-deoxyguanosine 5'-triphosphate was generally substituted for dGTP (Mizusawa et al., 1986). Sequences were compiled using the program set of Staden (1982). Regions presenting problem sequences were resolved using electrophoresis in a 6% polyacrylamide gel containing 9 M-urea, with a water jacket maintained at approximately 80 to 85 °C. Some use was also made of Taq DNA polymerase (Promega) and Bst DNA polymerase (Bio-Rad) for elongation reactions at 70 °C. For more than 95% of each sequence, data were obtained for both strands. Sequences across BamHI sites representing boundaries between adjacent plasmid-cloned fragments were obtained using the polymerase chain reaction (PCR) with Taq DNA polymerase (Saiki et al., 1988) and Vent DNA polymerase (New England Biolabs). Using genomic DNA as template with suitable oligonucleotide primers, DNA fragments across the BamHI sites were amplified, cloned into M13mp9 and sequenced.

DNA sequence interpretation. The GCG program set was used for analysis of sequences (Devereux et al., 1984). The program PTrans (Taylor, 1986) was used to prepare the listings shown in Fig. 2, 3 and 7.

Numbering of DNA sequences. The HSV-1 strain 17 sequence in Fig. 4, 6 and 7 is numbered according to McGeoch et al. (1988), with changes imposed by corrections at two loci. First, the coding region of the ICP34.5 gene in RL was corrected by deletion of residues 823 and 824 in TRL, and the corresponding residues in IRL (125547 and 125548) (A. Dolan, E. McKie, A. R. MacLean & D. J. McGeoch, unpublished data). Second, results in this paper correct the UL56 coding sequence by addition of two residues after residue 116343. The net effect of these changes is that the numbering for the region of HSV-1 RL in Fig. 6 remains unchanged, and in Fig. 7 changes at residue 125547.

Results

Nomenclature

We previously introduced a general nomenclature for HSV-1 genes in the short unique region (US) (McGeoch et al., 1985) and UL (McGeoch et al., 1988) by numbering genes in each region (US1 to US12, UL1 to UL56); this freed gene names from reference to proposed functions, Mr values of encoded proteins and expression characteristics. We believe that it is now useful to have a corresponding nomenclature for genes in RL: we refer to the HSV-1 gene encoding ICP34.5 (Chou & Roizman, 1990) as RL1, and to the immediate early gene encoding IE110 (Vmw110, ICP0) as RL2. In this paper we use these names for HSV-1 genes and their HSV-2 equivalents. If the LAT transcription unit (see below) is ever assigned a protein-coding function, it could be regarded as RL3.

Determination of the sequences of HSV-2 RL and adjoining regions

We have determined the sequences of two sections of HSV-2 strain HG52 DNA: a region containing the left end of UL and the TRL/UL junction (listed in Fig. 2), and a region containing the right end of UL, the whole of IRL and part of IRS (Fig. 3). As shown in Fig. 1, the region of HSV-2 DNA represented by BamHI fragments f, p and g encompasses the whole internal copy of RL (IRL), together with adjacent parts of UL and IRS. Analysis by PCR of the sequences across the BamHI sites between f and p, and between p and g, showed that fragments p and g were contiguous in the genome but that a previously unsuspected 9 nucleotide sequence lay between the BamHI sites at the neighbouring termini of f and p; this latter result was obtained with seven M13 clones made in two separate PCR amplification experiments. The whole sequence determined comprised 16465 nucleotides of composition 74.1% G+C. Comparison with the sequence data of Davison & Wilkie (1981) for the HSV-2 'joint' region (the junction of IRL and IRS) showed that 2847 bp of BamHI g lay in IRS; this part is not dealt with in the present paper. The left 13618 bp of the sequence represent the right extremity of UL and the whole of IRL, and this part is listed in Fig. 3.
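The lengths quoted for the right-hand sequenced region can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis; the constant and function names are illustrative, and the 4355 bp and 9263 bp values are taken from the summary at the head of the paper.

```python
# Lengths quoted in the text, in bp. Names are illustrative only.
TOTAL_SEQUENCED = 16465  # whole right-hand sequence (BamHI f, p and g plus the 9 nt between f and p)
IN_IRS = 2847            # portion of BamHI g lying in IRS, not treated in the paper
UL_RIGHT_END = 4355      # UL portion sequenced at the right end
IRL_WITH_A = 9263        # IRL including one copy of the a sequence


def listed_length(total: int, excluded: int) -> int:
    """Length of the part of a sequence that remains after excluding a terminal segment."""
    if excluded > total:
        raise ValueError("excluded segment is longer than the whole sequence")
    return total - excluded


fig3_length = listed_length(TOTAL_SEQUENCED, IN_IRS)
print(fig3_length)                               # 13618
print(UL_RIGHT_END + IRL_WITH_A == fig3_length)  # True
```

The two independent routes to 13618 bp (total minus the IRS portion, and the UL portion plus IRL) agree, which is the consistency the text implies.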

We also sequenced part of the BamHI c sequence, which runs from TRL into the left end of UL, and this is listed in Fig. 2. Not all of the TRL part of BamHI c was determined. Comparison of the sequences in Fig. 2 and 3 located the TRL/UL and UL/IRL boundaries: the left end of UL is at residue 172 in Fig. 2 and the right end is at residue 4355 in Fig. 3. The version of IRL listed in Fig. 3, including one copy of the a sequence (see Fig. 1), contains 9263 bp of composition 75.4% G+C. The sequences determined at the left and right ends of UL contained 5658 and 4355 bp, of composition 63.1% and 66.2% G+C respectively.

[Fig. 1: schematic of the genome (TRL, UL, IRL, IRS, US and TRS, with the a and a' sequences) above two expanded maps showing genes UL1 to UL5, UL53 to UL56, RL2, RL1 and the LAT, with BamHI fragments f, p and g and a 1 kbp scale marker.]

Fig. 1. Organization of the HSV genome around the long repeat elements. The top part of the figure shows an outline arrangement of the major elements in the genomes of HSV-1 and HSV-2. UL and US are bounded by pairs of inverted repeats (TRL and IRL; IRS and TRS). There is a terminally redundant element (the a sequence); at least one copy of this is present in inverted orientation at the IRL/IRS boundary (a'). The lower sections of the figure indicate locations of genes of HSV-2 adjacent to the ends of UL and in RL, as listed in Fig. 2 and 3. Coding regions of genes are indicated by solid arrows, with introns shown in genes RL1 and RL2 (see text). The location of the LAT is shown by a dashed arrow, with introns or possible introns not marked (see text). The gene arrangements are also valid for HSV-1 (McGeoch et al., 1988) with the exception that the intron shown in RL1 is specific to HSV-2. The locations of HSV-2 BamHI fragments f, p and g are indicated at the bottom of the figure. The 1 kbp scale marker applies to both expanded sections.
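Base composition figures such as those above are simple proportions. The sketch below shows one way to compute percentage G+C and to combine the quoted values for adjacent regions by length weighting. The function names and the toy sequence are hypothetical, and the combined value is derived here from the rounded percentages in the text rather than being a figure reported in the paper.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("no unambiguous bases in sequence")
    return 100.0 * sum(1 for b in bases if b in "GC") / len(bases)


def combined_gc_percent(parts):
    """Length-weighted G+C percentage for a list of (length_bp, gc_percent) pairs."""
    total = sum(length for length, _ in parts)
    return sum(length * pct for length, pct in parts) / total


toy = "GGCCAT"  # hypothetical sequence, not taken from the HSV-2 data
print(round(gc_percent(toy), 1))  # 66.7

# Right end of UL (4355 bp, 66.2%) plus IRL (9263 bp, 75.4%), as quoted in the text.
right_region = combined_gc_percent([(4355, 66.2), (9263, 75.4)])
print(round(right_region, 1))     # 72.5, a derived estimate only
```

Because the inputs are rounded to one decimal place, the combined value is an approximation and should not be read as a measured composition.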

Organization of the HSV-2 genome adjacent to the ends of UL and in RL

Fig. 4 illustrates the overall relationship between these HSV-2 DNA sequences and their counterparts in HSV-1 in four two-dimensional comparative plots, and shows that the two parts of HSV-2 UL sequenced are closely similar to and collinear with the corresponding parts of HSV-1 UL, whereas the RL sequences are generally more divergent. Alignments using the GCG program Bestfit showed 80.2% and 75.5% identity between the HSV-1 and HSV-2 sequences for the left and right portions of UL, respectively. The HSV-2 UL sequences contain ORFs which correspond closely to the gene organization proposed for HSV-1 (Perry & McGeoch, 1988; McGeoch et al., 1988). The left part of HSV-2 UL contains equivalents of HSV-1 genes UL1 to UL4 and part of UL5; the right part of UL contains equivalents of UL54 to UL56 and part of UL53 (see Fig. 1). Tables 1 and 2 summarize information on the locations of genes in the HSV-2 sequences, and on properties of the encoded proteins, respectively; data on the two protein coding genes recognized in RL are also included (see below).

RL contains a number of sets of short, tandemly reiterated elements and other 'simple' sequences. Prominent tandem repeat sets are indicated in Fig. 2 and 3 as families 1 to 7. The junctions between TRL and UL, and between UL and IRL are defined by the occurrence of family 1 at the extremity of RL. In the case of the IRL clone sequenced (Fig. 3), this arrangement is a little obscured by the fact that the repeat family occurs in a minimal version of one complete plus one partial copy; however, in the TRL clone there are three complete copies and one partial copy (Fig. 2). At each UL/RL junction, on the UL side of the junction there is a TATA box-like sequence.
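Counting "complete plus partial" copies of a tandem repeat, as done for family 1 at the UL/RL junctions, can be sketched as follows. This is an illustrative helper under stated assumptions, not the procedure used by the authors: the function name is hypothetical, the repeat unit is supplied by the caller, matching is exact (real repeat families may tolerate mismatches), and the example sequences are toy data rather than HSV-2 sequence.

```python
from typing import Tuple


def count_tandem_copies(seq: str, unit: str, start: int = 0) -> Tuple[int, int]:
    """Count consecutive exact copies of `unit` in `seq` beginning at index `start`.

    Returns (complete_copies, partial_length), where partial_length is the number
    of leading bases of one further, incomplete copy that follows the last
    complete copy.
    """
    if not unit:
        raise ValueError("unit must be non-empty")
    pos = start
    complete = 0
    # Step through the sequence one whole unit at a time.
    while seq.startswith(unit, pos):
        complete += 1
        pos += len(unit)
    # Measure how far a final, truncated copy extends.
    partial = 0
    while (partial < len(unit) and pos + partial < len(seq)
           and seq[pos + partial] == unit[partial]):
        partial += 1
    return complete, partial


# Toy data: three complete copies of GGAC followed by a two-base partial copy.
print(count_tandem_copies("GGACGGACGGACGGTTTT", "GGAC"))  # (3, 2)
# Toy data: one complete copy plus a partial copy.
print(count_tandem_copies("GGACGGTT", "GGAC"))            # (1, 2)
```

The two toy calls mirror the arrangements described in the text: three complete copies plus one partial copy, and the minimal version of one complete plus one partial copy.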

The organization of the HSV-2 UL/RL junctions is thus very similar to that found previously for HSV-1 (Perry & McGeoch, 1988), for which it was suggested that the TATA box sequences were perhaps functional in expressing genes UL1 and UL56.

Aspects of sequenced genes adjacent to the left end of HSV-2 UL

The amino acid sequences encoded by the nine HSV-2 UL genes wholly or partly sequenced are closely similar to their HSV-1 counterparts (see Table 2), and we shall not discuss them in detail. This and the following section treat some points on the structure and function of the UL genes which arose from evaluation of the HSV-2 data.

[Fig. 2: annotated DNA sequence listing, residues 1 to 5829, marking the end of TRL and start of UL, tandem repeat families 1 and 2, the translated ORFs of genes UL1 to UL5, and the terminal BamHI site.]

Fig. 2. HSV-2 DNA sequence of the left end of UL. The rightward 5' to 3' strand is shown for the partial sequence determined for the BamHI c fragment.
Proposed encoded amino acid sequences are indicated in the single-letter code; rightward and leftward translated amino acid sequences are shown above and below the DNA sequence, respectively. Gene names are at the left of the first line showing the amino acid sequence, regardless of orientation. Prominent sets of short, tandemly reiterated sequences are marked as \...../. Putative TATA boxes and polyadenylation-associated sequences are underlined or overlined.

In HSV-2 there are two ATG sequences (residues 198 to 200 and 258 to 260 in Fig. 2) upstream of the UL1 ORF which are not conserved in HSV-1. These are out of frame with UL1 and were not assigned a coding function. HSV-1 gene UL1 may be a locus at which mutations can give rise to a syncytial plaque phenotype (Little & Schaffer, 1981; Perry & McGeoch, 1988). A possible N-terminal signal sequence for translation on membrane-bound ribosomes seen in HSV-1 UL1 is conserved in HSV-2. In gene UL3 a possible near N-terminal signal sequence is present in both HSV-1 and HSV-2.

Gene UL2 encodes the DNA repair enzyme uracil-DNA glycosylase in HSV-1 (Mullaney et al., 1989) and HSV-2 (Worrad & Caradonna, 1988). A cDNA clone corresponding to UL2 of HSV-2 strain 333 was sequenced by Worrad & Caradonna (1988), and it was found that most of the sequence was clearly similar to that for the HSV-1 UL2 region, but that the HSV-2 sequence was dissimilar in its first 390 nucleotides, across the proposed start of the ORF encoding uracil-DNA glycosylase. However, we have found that the similarity between the HSV-1 sequence and our HSV-2 sequence does continue across the 5' regions of the genes, and that residues 1 to 390 in the sequence of Worrad & Caradonna originate from another HSV-2 gene, US2 (McGeoch et al., 1987); their cDNA clone was thus probably formed artefactually in vitro.
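Whether a candidate ATG is in frame with an ORF depends only on the distance between the two positions modulo 3. The sketch below illustrates that check. The function names and the toy sequence are hypothetical, positions are 1-based on a single strand, and the UL1 start coordinate is not used because it is not stated in the text above.

```python
from typing import List


def atg_positions(seq: str) -> List[int]:
    """1-based positions of the A of every ATG triplet on the given strand."""
    seq = seq.upper()
    return [i + 1 for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


def frame_offset(pos: int, orf_start: int) -> int:
    """Offset (0, 1 or 2) of `pos` relative to the reading frame set by `orf_start`.

    An offset of 0 means the codon starting at `pos` is in frame with the ORF.
    """
    return (pos - orf_start) % 3


toy = "CCATGAATGGATGA"  # hypothetical sequence, not HSV-2 data
starts = atg_positions(toy)
print(starts)                                      # [3, 7, 11]
print([frame_offset(p, starts[0]) for p in starts])  # [0, 1, 2]

# The two upstream ATGs quoted in the text begin at residues 198 and 258.
print(frame_offset(258, 198))                      # 0
```

Since 258 minus 198 is 60, a multiple of 3, the two upstream ATGs lie in the same frame as each other; the text states that this frame differs from that of UL1.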

Previously, we put forward two possible candidates as ATG translation initiation codons for HSV-1 UL2, the upstream of which lay within the UL1 coding region (Perry & McGeoch, 1988). In our HSV-2 sequence, the first possible ATG initiation codon of HSV-1 is conserved whereas the second possible ATG of HSV-1 is not conserved. The HSV-2 sequence possesses an additional ATG, lying between the locations of the two HSV-1 ATG sequences, in the correct reading frame and not overlapping the UL1 ORF. We consider that this locus must be regarded as the primary candidate for the translation initiation site of HSV-2 UL2, as shown in Fig. 2. The HSV-1 sequence aligned with this site is ACG (residues 10123 to 10125 as numbered by McGeoch et al., 1988). ACG is known as an alternative initiator of translation (see for instance Curran & Kolakofsky, 1988; Gupta & Patwardhan, 1988) so it is possible that this codon is a translation initiator for HSV-1 UL2.

Aspects of sequenced genes adjacent to the right end of HSV-2 UL

The HSV-2 sequence determined at the right end of UL includes the whole of the 545 bp non-coding region

[Fig. 3 (listing continues): annotated DNA sequence listing of the right end of UL and IRL, marking BamHI sites, translated ORFs with gene labels including UL54, UL56 and RL2, the end of UL and start of IRL, tandem repeat families 1 and 2, and the proposed LAT splice donor site.]
GGCCGGCGCGGGCGCGCCCTGCTCCCGAGACCACGGGTGGCGCGACCGGAGGCCGTGGAAGTCCAGCGCGCCCACCAGGGTGCCCTGGTCAAAGAGCATGT~CCCACCGGGGTCA~CA A P A P A G Q E R S W P H R S R L G H F D L A G V L T G Q D F L M N G V P T M W 8400 777 GAGGCTGTTCCACTCCGACGCGGGGGGCGTCGGGTAGTCGGGGGGCCTCACGCAGTTGCGCGCGTGCTCGGGGAGCAGGGTGCGGCGGCTCCACGCGGGGGCCGCGGCCCGCAGCAGGTC L S N W E S A P P T P Y D P P R V C N R A H E P L L T R R S W A P A A A R L L D 8520 737 CGCCACGT~CCCGTCTGGTCCACGAGGACCACGTAGGCCCCTATGTGGCCCGTCTCCATGTCCAGGACGGGCAGGCAGTCCCCCGTGACCGTCTTGT~ACGTAAGGCGCCAGGGCCAC A V N G T Q D V L V V Y A G I H G T E M D L V P L C D G T V T K N V Y P A L A V 8640 697 GACGCTcGAGACCCCCGCGATGGGCAGGTAGCGCGTGAGGCCGGGCGCCGGGTCGCGGGCCCCGGGCTCGGGGCCGCCCTCCGCGTGGCGCGTCTTCCTGGCACACT~CTCGGCCCCCG V S S V G A I P L Y R T L G P A P D R A G P E P G G E A H R T K R A C K R P G R 8760 657 Family 3 Proposed LAT splice acceptor site \ ......... \/ ............. \/ ..... CGGCGCAGCAGcGCGGGGGCCGAGGGAGGTTTCTCGTCTCTCCCCAGCGCCGGACGCGGACGCGACGCTCCCACCAGCCCCGCCCGCAGAGGAAGAGGCGGAGGAGGAGGAGGCGGAGGA P A A A R P G L S T E R R E G A G S A S A V S G G A G G A S S S A S S S S A S S 8880 617 ........ \/ ............. \/ ............. \/ ............. \/ ............. \/ ............. 
\ GGAGGAGGCGGAGGAGGAGGAGGCGGAGGAGGAGGAGGcGGAGGAGGAGGAGGCGGAGGAGGAGGAGGCGGAGGAGGAGGAGGcGGCGGCGACCGCGGCC~GGACGACGGAGACGCCGA S S A S S S S A S S S S A S S S S A S S S S A S S S S A A A V A A Q S S P S A S 9000 577 CGGGGGCGCGGCGCcCGCGGACGCCGGGGcGAGCG~CCGTGGCc~CGGTCGCCCGAG~CGAG~CGGGGCCCGGCGCGGCGCCGCCCTCTTGGCCCCcACCC~CTGGGGGGCGAGGGG P P A A G A S A P A L P G H G R D G S D S D P A R R P A A R K A G V G Q P A L P 9120 537 CGAGCGCGGGGCGGCGGAGGAAGAGGCGGAGGACGAGGCCGCGGGGCCCGAGTCCGACCCGCGCCTCTTCCGGGGGCGGGCCGCCGCCCCCTCCGCGGCGTGGGGGGCGGCACCGGGGGT S R P A A S S S A S S S A A P G S D S G R R R R P R A A A G E A A H P A A G P T 9240 497 GTTGGTGCCGCGGGGGACCCCGGG~CTCCC~CGC~CCGGCCC~CCGACCCGCGCGCGTCGGTCGCGCCTGCCCGGCCCAGACTCTGTGCTTGGGTGTCGGTCTGAGCCTGGGTCAT N T G R F V G P G G E A G P G G S G R A D T A G A R G L S Q A Q T D T Q A Q T M 9360 457 Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Wed, 10 May 2017 20:36:01 D.J. McGeochand others 3064 GCGCGACCGGGGCGCGCGGTGCGCGTCCACCGGCACGGcGGGCGGCGCGGGCCCGGCCGCGTCCGCGCTCGCAGACACCACGGGGGCGGCGGCGGCGCGGGGcGGACTCCGGACGCGCGG R S R P A R H A D V P V A P P A P G A A D A S A S V V P A A A A R P P S R V R P 9480 417 GGCGACGGCCGCGCGGGGGCGCGCGGCGCGCCCCGACGACTGTGGCAGACCTCCCCCCCCGGGGCCCGAGGACACCTGTGCGGAGGAGGAGGAGACAAAGGAGAGCGGCCCGGGGCCCGC A V A A R P R A A R G S S Q P L G G G G P G S S V Q A S S S S V F S L P G P 9600 377 GGGGCGGCGCGGAGACGGCGGGGGAGAGTCGCTGATGACTATGGGGGGCTCCTGGGCCGCGCGGGGCTGTCTCGCGGGGGGCGTCC F R R F S P P P S D S I V I P P E Q A A R P Q R A P P T R G E G A TGCCCTCCGCCGCCGCGGCGTCTTCGCCCACCCG A A A A D E G V R cCGCGCCTGCGCGCGCCCCCCGCCGGcCGCAGGGGGAAGAGAGGCCACTCTCGGCACGACGGCCGCGACGGCAGGGCCGCCCCCAGACCCAGATCCCACCCCCGCCCGCAACGGGGCGCC R A Q A R G G G A A P P L S A V R P V V A A V A P G G G SG S G V G A R L P GCCGCTGCTGCTGCTCCGCGGGGCGCCAGGGGGCGCCGGTCGGGTCGCGGCGGGCTGGGAGGTTCCGCGGGTCGCCCcCGCACCGCCGCcCCCGCGcCGGGGCGCTCTTCGGGGGGCGGG GS S S S R P A G P P A P R T A A P Q S T G R T A G A G G G G R R P A R R P A P A 9720 337 9840 297 G 
9960 257 S t a r t of e x o n 3 \/ E n d of i n t r o n 2 CGG••CGTAGTCCACTGCAGAGGGAGACAGA•AC•GGAGCCCCCGGTTAGT•CCC•ACCCCCGCCCGACCCCC•CCC•ACCCCCGCCCGACCCCCGCCCGACCCCCGCCCGACCCCCGCC P V Y D V \ . . . . . . . . . ]\ . . . . . . . . . /\ . . . . . . . . . ]\ . . . . . . . . . /\ . . . . . . . . . /\ . . . . . . . . . /\.. Family 4 S t a r t of i n t r o n 2 \/ E n d of e x o n 2 CGACCCCCGCCCGACCC••GCCCGACCCcCGCCC•A•CCCCGCCC•CCCCCC•CCCGACCCCCGCCCGCCCTCACCGTC•GcCAGGTCATCGTCCTCGTC•TCC•TGCC•GGCCACGGGG . . . . . . . /\ . . . . . . . . . /\ . . . . . . . . . /\ . . . . . . . . . /\ . . . . . . . . /\ . . . . . . . . . /\ . . . . . . . D A L D D D E D D T G GGG TGGGCGACAGGGCGCGGAC C G T G TG T C C C C C C A G C G A C A G G G A G C G C G G G G C C G T C C G C G G G T T G C C C G TC C A G A T A A A G T C C A C G GC C G T G C C G G C C C G C A C G G C T F S L A R V T H G G L SL S R P A T R P N G T W I F D V A T G A R V A A E CCACGCGGGTCCGGGGGTCGTTCACTATCGGGATGGTGCTGAACGACCCGCTGGCGGTCACGCCC V R T R P D N V I P I T S FS S S A T V G V TCCAGGTCTTCATGC W T K M C ACGG GATGCAGAAGGGGTGCAGGC P I C F P H L AGGGAAAACTCTGGC C P F S QC P W P G BamHI CTTCCTCCTCACCCACGGGCCCACCCCCACAGGATCCCTGCGCGTCGGCGGGCGTGGGGCTGCCCTGGCGCTCGGCCGGGGGCCGGGCCGGGGGCGTGGCCGCGTCCATCAGGCCCGCCT E E E G V P G G G C S G Q A D A P T P S G Q R E A P P R A P P T A A D M L G CGAACATCTCCG TGTCCGTGCTGCCCGCCTCGGAGGTGGAGTCGCGGTGAAGGTCGTCGTCAGAGATTCCCAC F M E T D T S G A E S T S D R H L D D D S I G V E P I TGC TTTTTGTTCGGAAGGGGGGGAGAAAGGGGTC CG T A A C C A A A G G T G G T C T G C G TCCTCC CTGCC TCCCTCGCCCCCCCAGAGGGTCGGGGGGCGGCGCACGGCCCACGGGGGTCCCCCGACCGCT 10560 i17 A E 10680 77 TGCATGTCGT D N 10800 37 CTC GGTCTCCTCCTCCGAGTCGCTGCTGGCGAGCCAC T E E E S D S S A L W Q ACCGCCCGCGAC CACCCC CAAC CCGCAGC CGGGTG GTC CGGG GAAAAGGGGGG M TAAGCGGGCCGGGGGTCGGCCC 10920 26 CCCCCCTGTCCCCCGCTCTCG GGC CGTCAAGCGTCCCCGCCCCCGAGCCC S t a r t of i n t r o n 1 \/ E n d of e x o n 1 GC C T G A G A C C C G G G G G T C G C C C TC TC AC C G T G C C G G G G G 
T C T G C C G C G G C G G C C G C T G P T Q R P P R TCGGGGCCGGG E P G P 11040 11160 11280 13 GTCCGCCCGGGAGCTCGTGCCGGGCCGGGGTTCCATGAGCCGGGGTAGGGTAGACTCGAGACGGCGGCCCGCGGTCTCTCTCTTGCCGGGTTTTAGTCTCTGTCTCTCCGGGTCTCCTCC D A R S S T G P R P E M 11400 1 TCCCGCCGGGCCGCCGCTCCGTCGCTCGCAGTGCCGGGGTGCGAATGCGGCCCGACCGTCACACGGGGCTGCCTTATACCCGGCGCCTATCCACTCCCCCAAAGGGGCGGCATTTACGAT 11520 TCCCCCAATAGCCGCGCGCCC CGGCGGGGGCGGAGGGAGGGAATCCCCCCCTCTCGGGGCGGCCCCGTCCCCGGGGACCAACCGGGTGTACTCCAAGAACCCCATTAGCATGCGCCGCCC C CCGC CGACGCAGATGGGAGTCC CC C C G G C G C C C C G C C G G C G C G G C C C T G A G T G G T G C AGCCCACCCACCCGGCGGCGCGCGAG TTAC CATAAGCGGGAATGGCGGC RLI TC C T T T G G A T T C C G A C C C C T C G T C T C 10440 157 CTCCCGCTTCCG G A E A S t a r t of e x o n 2 \/ E n d of i n t r o n 1 TGAGCATCCCCCAGGCGTGCGGGGCGGCGGGCTGCTTGACAAAGCAACGGGGGGGATTTAGAGGGCGCGGGGCGTGAGGCGGGACCCCCGCGCCGTGTCCCCCGTGTCCCTCCCTCACCC L M G W A H P A A P Q CGGCCCCCCGCCCGC 10200 237 CGC CTCGGCC T 10320 A E 197 ACTATCAGGTACGCCACCGGGGTGTTGCACAGGGGACACGTGTTGCGCAACGGAA I L Y A V F T N C L P C T N R L A G C G C A G G G G C G G G GC G A T C T C G T C C G T G C A C A C G G C A C A C A C G T C G C C C C C C C R L P P A I E D T C V A C V D G G P 10080 252 CCGCCCCCGGGGAAAAATTCATTAGCATAC G C T C T G C G T G T TC T G C C A A G A A A G T A A T C A G C A T A A C C C CGTTAAAAGC TGCTAAT TAC C GCGAGCGGGAACGC 11640 TAG GAAGC CCAG GGGAC CAATAGGGGC CGATC CGGAACCC CGAGGGAGTAATTACGCGGGGAGCGAGGGGCCGTC CGGC CCATTAAAAGTTGCTAATTACCATGCGCGGGGATGGCGGC CGAACGTTTTTAA CGGGACCGCCTATTAAA 11760 11880 12000 AGTTTCTAATTACCATAcCGGGAAGCCGGCGcGGGGCGGTcGCcGGGGCGGAGTCCGGGCccGcGCGGCGGCGCGCGGTTGGCCGGCGcCGCCCCCTGGGGcGGGCGGAGcGG•GGGGcG 12120 GCGCCGGGCCcTCGcGGATATATACGCGGGGCTCCCATcGTcTCTTCGGAGAGCGGCCTCGCGCAGAcCTTCGGAGcTCCGGGGCTCCGCCGGCCGAGGcCGCCCTCGCCGGTTCAAcCC 12240 TAGACCGCCcGAcGGCCcGGGCCCGCGGCGGCGGAGGAcCCGcGCGCCGCCGCCGCCGCCTCCTCCTCCTCCGCGGGTcCGCCGTCTTCGTGGGCCCGGGCTCGGGcTCGGGccCGAGCT V A R R G P G A A A S S G R A A A A A E E E E A P G G D E H A R A R A R 
CGGGCCTCGGGCTCCAGGCACGGTCCGATGACCGCCTcGGCCGCcGcCACGCGGCGCCGGAAccGGTCGCGGTCGGCcCGCTCGCGCGCCCAGGACCCCCGTCGGGCCAGGCGCGCGGCC R A E P E L C P G I V A E A A A V R R R F R D R D A R E R A W S G R R A L R GTC ~CC CAGGCCACCAGATGGCGC T E W A V L H R CGTCAGGGGGTCGGAGGG \ ................. GGGCCGCCGC A R A A 12360 223 12480 183 A S t a r t of e x o n 2 \/ E n d of i n t r o n AC C T G C A C G C G C G G C G A G A A G C A C A C C T G C G G G C G G G G A G A C A C G G G G G TCGGAGGGGCGTCAGGGGG TCGGAGGGGCG TCAGGGGGTCGGAGGGG 12600 V Q V R P S F C V . . . . . . . . . . . . . /\ . . . . . . . . . . . . . . . . . /\ . . . . . . . . . . . . . . . . . /166 Family 5 S t a r t of i n t r o n \/ E n d of e x o n 1 GCGTCAGGGG GTCGGAGGGGCGTCAGGGGGTCG G A G G G G C G T C A G G G G G T C G G A G G G G A G G C G T A C C T T C C C G C G C G G C G C G TC C G C G G G C G G G G A C G C G G G /\ . . . . . . . . . . . . . . . . . /\ . . . . . . . . . . . . . . . . . /\ . . . . . . . . . . . . . . . . . / K G R P A D A P P S A P CGGCGCAGGC TCAGGCGCGC CAGG TAC TCCGTCGTGGTGCGCAGC CGTAGCGCCAG GTGGGGCGGAAGGGG GCGCTGCGGCCCGCGCTC CTTGCGCGGCGGCGGCGGGGG P R R R R L S L R A L Y E T T T R L R L A L H P F L P R Q P G R E K R P P P P P GCAGGCGGCGGCAGGCGCGGCGTGCGGGGC C A A A P A A H P A C T C C G G C G C C T T C C C C C C G C C C T C G C T C G G G G G G C T G T T C G C C C A C TC T G C G T C G T C G T T G C C G G C G T A G T C CGCGTCGTCGCTG E P A K G G G E S P P S N A W E A D D N G A Y D A D D S CGC CTGGGGCAC CAGCAGC C AGCGC CGCAGGAGCGAG A Q P V L L W R R L L S G A C G C G GC C G G C G C G C T C T C G A C S A A P A S E V A C GCGGT TCC CGAGTCGTACGCAGGGAC T G S D Y A P V M Q S TCGTC D D CATTTGGGAGTCTGCGGTTGGGAGCGCGCCGGG D A T P L A G P 12720 154 12840 114 12960 74 13080 34 ~ii~ ; GCGCGGCACGGCTGGAGCGCCGGGGCGCGGCACGGCTGGAGCGCCGGGGCGCGGCCGGCGCCGGGGACCCCGGCGGCGGGGACCcCGGCGGCGGGACATGGCGGGCGGCTGGGCTCGGCG R P V A P A G P R P V A P A G P R P R R R P G R R R P G R R R S M ..... I\ ...................... 
[Fig. 3 (end of sequence listing): coordinate labels 13200 to 13680. Features labelled: further repeat families; start and end of the a sequence.]

Fig. 3. HSV-2 DNA sequence of the right end of UL and the whole of IRL. The sequence is shown for the rightward 5' to 3' strand, from the BamHI site at the left end of BamHI f to the right end of the internal copy of the a sequence. Conventions are as for Fig. 2. Putative splice donor and acceptor sequences for LAT, RL2 and RL1 are labelled.

Table 1. Location of coding regions and transcripts of HSV-2 genes

                              Translation†        Transcript‡   Transcript§
Gene*    Orientation  Exon    Start      Stop     start         AATAAA
UL1                           301        972      ~200?         1893 or 2674
UL2                           1085       1849     -             1893 or 2674
UL3                           1907       2605                   2674
UL4      C                    3411       2809                   2720
UL5‖     C                    (5829)     3481                   2720
UL53‖                         (2)        406                    458?
UL54                          958        2493     830           2502
UL55                          2711       3268     ~2640?        3310
UL56     C                    4151       3447     ~4300?        3418
RL2      C            Exon 3  9974       8254     ~11450        8001
                      Exon 2  10834      10156
                      Exon 1  11316      11242
RL1      C            Exon 2  12530      12243    ~13320        11881
                      Exon 1  13179      12685

* Sequence numbers for UL1 to UL5 refer to Fig. 2, and for UL53 to UL56, RL2 and RL1 refer to Fig. 3.
† The locations of proposed protein coding regions are given, from the first residue in the translation initiation codon to the last residue preceding the stop codon at the end of the exon. Leftward oriented genes are marked C.
‡ The 5' terminus of UL54 mRNA is from Whitton et al. (1983). Other figures are tentative, based on features of the DNA sequence or on HSV-1 data.
§ The location of the polyadenylation-associated sequence (AATAAA or ATTAAA) proposed for each transcript is indicated by the position of the 5' residue in the sequence; the actual 3' terminus of the transcript would then be 20 to 30 nucleotides downstream.
‖ The 5'-terminal regions of the UL5 and UL53 ORFs lie outside the determined sequences.

Table 2. Properties of HSV-2 encoded proteins

         No. of    Protein   Identity to HSV-1
Gene     codons    Mr*       protein (%)†        Protein: function or properties
UL1      224       25191     65.2                Probable syn-associated; hydrophobic N terminus
UL2      255       28478     85.1                Uracil-DNA glycosylase
UL3      233       25647     74.7                Unknown; hydrophobic N terminus
UL4      201       21805     75.9                Unknown
UL5‡     (783)               (90.4)              Component of DNA helicase-primase complex
UL53‡    (136)               (88.9)              Syn-associated membrane glycoprotein (gK)
UL54     512       54955     79.3                Transcriptional regulator (Vmw63, ICP27)
UL55     186       20440     86.4                Unknown
UL56     235       24713     62.8                Unknown; hydrophobic C terminus
RL2      825       81981     61.5                Transcriptional regulator (Vmw118, ICP0)
RL1      261       27906     62.7                Neurovirulence factor (ICP34.5)

* Mr for unprocessed polypeptide chain.
† Represents the percentage of identical aligned residues after alignment with the corresponding HSV-1 sequence using the Gap program.
‡ Incomplete sequences for UL5 and UL53.

[Fig. 4: four dot-plot panels, (a) to (d). Horizontal axes: HSV-1 (kb). ORFs marked under the panels: UL1 to UL5, UL53 to UL56, RL2 and RL1.]

Fig. 4. Comparison of HSV-1 and HSV-2 sequences at the extremities of UL and in IRL. The four panels were produced using GCG programs Compare and Dotplot. Parameters for Compare were: window, 35; stringency, 22. Locations of HSV-1 ORFs are shown under each panel. (a) HSV-2 TRL/UL sequence (Fig. 2) compared with HSV-1 residues 9001 to 14386. (b) HSV-2 UL/IRL sequence (Fig. 3) residues 1 to 5000 compared with HSV-1 residues 112782 to 117781. (c) HSV-2 UL/IRL residues 4501 to 9500 compared with HSV-1 residues 117282 to 122281 (i.e. overlaps b by 500 residues in both dimensions). (d) HSV-2 UL/IRL residues 9001 to 14000 compared with HSV-1 residues 121782 to 126781 (i.e. overlaps c by 500 residues).

Table 3. Proposed splice donor and acceptor sequences in HSV-2 RL*

Donor sequences
Consensus                        b a g | GT r a g t
LAT (6591-6599)                  C A G | GT A G G T
RL2 intron 2 (10158-10150 C)     A C G | GT G A G G
RL2 intron 1 (11244-11236 C)     A C G | GT G A G A
RL1 (12687-12679 C)              A A G | GT A C G C

Acceptor sequences
Consensus                        y y y y y y y y y y y n y A G | g
LAT (8793-8808)                  C T C G T C T C T C C C C A G | C
RL2 intron 2 (9974-9959 C)       T G T C T C C C T C T G C A G | T
RL2 intron 1 (10834-10819 C)     C G T T G C T T T G T C A A G | C
RL1 (12530-12515 C)              T C T C C C C G C C C G C A G | G

* Splice donor and acceptor consensus sequences are from Mount (1982). Partially conserved sites are in lower case; 'y' and 'r' represent pyrimidine and purine nucleotides respectively; 'b' represents 'c' or 'a'. Splice positions are marked by |. Numbers for HSV-2 sequences are as in Fig. 3, with leftward 5' to 3' sequences indicated by C.

between genes UL53 and UL54, which contains regulatory sites for transcription of the immediate early gene UL54 (Whitton et al., 1983). Part of this section, together with the non-coding sequence at the 3' end of gene UL54, has been sequenced previously by Whitton et al. (1983) for strain HG52. There are some small differences in the two versions.

Comparison of the HSV-2 UL56 sequence with its HSV-1 counterpart revealed an apparent frameshift adjacent to the 3' end of the HSV-1 ORF. Additional analyses of both sequences confirmed that the HSV-2 sequence was correct, but that in determining the HSV-1 sequence a compression had been incorrectly resolved, so that the published version of the HSV-1 UL56 sequence contains an error (Perry & McGeoch, 1988; McGeoch et al., 1988). To correct this, 'CG' should be inserted after residue 116343 of the complete HSV-1 sequence as numbered by McGeoch et al. (1988): that is, residues 116337 to 116351, AGCCGCGCCGCGCGT, become AGCCGCGCGCCGCGCGT. The effect of this change on the amino acid sequence predicted from the HSV-1 UL56 gene is to remove two amino acids from the C terminus and add 40, to yield a total length of 234 amino acids. Both the revised HSV-1 sequence and the HSV-2 sequence possess an uncharged and highly hydrophobic section of 18 amino acids immediately adjacent to the C-terminal Arg residue (see Fig. 3); this structure could well constitute a transmembrane anchor domain, although there is no indication of a corresponding N-terminal hydrophobic signal sequence in either instance. In other respects the HSV-2 data support the interpretation of the HSV-1 UL56 gene, which was previously considered somewhat tentative.
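The correction to the HSV-1 UL56 sequence quoted above can be checked mechanically. The following Python sketch is illustrative only: the helper name `insert_after` is an assumption introduced here and is not part of the original analysis. It applies the 'CG' insertion to the published 15-residue fragment and confirms that the result is the corrected fragment given in the text.

```python
def insert_after(fragment: str, fragment_start: int, residue: int, insertion: str) -> str:
    """Insert `insertion` immediately after genome coordinate `residue`.

    `fragment` is a stretch of sequence whose first base lies at the
    1-based genome coordinate `fragment_start`.
    """
    # Number of bases of the fragment up to and including `residue`.
    offset = residue - fragment_start + 1
    if not 0 <= offset <= len(fragment):
        raise ValueError("residue lies outside the fragment")
    return fragment[:offset] + insertion + fragment[offset:]


# HSV-1 residues 116337 to 116351 as originally published.
published = "AGCCGCGCCGCGCGT"
corrected = insert_after(published, 116337, 116343, "CG")

print(corrected)
# Two extra residues are not a multiple of three, so the downstream frame shifts.
print((len(corrected) - len(published)) % 3 != 0)
```

Because the insertion is two residues long, every codon downstream of residue 116343 is read in a different frame, which is why the predicted C terminus of the HSV-1 UL56 protein changes.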
Two possible ATG codons upstream of the assigned HSV-1 UL56 ORF and out-of-frame with it are not conserved in HSV-2. HSV-2 possesses an in-frame ATG 10 codons upstream of the start site shown in Fig. 3, which could form an alternative translational start.

Organization of the HSV-2 immediate early gene in RL

In analysing the sequence of HSV-2 RL there are three topics to be addressed: function of the region between UL and the RL2 gene, part of which is transcribed into LAT species; organization of the immediate early gene (RL2) encoding IE118 (counterpart of HSV-1 IE110); and function of the region between the RL2 gene and the a sequence. Since the RL2 gene is the best characterized entity in HSV-1 RL, its HSV-2 counterpart is dealt with first.

The HSV-1 immediate early gene, encoding the transcriptional modulator IE110 or ICP0, is considered to possess three exons, all containing coding sequences, and to have an extensive upstream transcriptional regulatory region (Perry et al., 1986; Mackem & Roizman, 1982). DNA sequences in HSV-2 RL related to the HSV-1 IE110 coding sequences were readily located (see Fig. 4c and d). Appropriately positioned counterparts for transcriptional regulatory and polyadenylation-associated elements also exist. The HSV-2 regions corresponding to the two HSV-1 introns were not conserved in size or sequence, but were bounded by appropriately located potential splice donor and splice acceptor sequences, as shown in Table 3. It therefore is reasonable to propose that these regions (residues 11241 to 10835 and 10155 to 9975 in Fig. 3) are introns in the HSV-2 RL2 gene; the lack of sequence conservation of the proposed introns with their HSV-1 counterparts is in accord with properties of introns in general. The sequence data are thus thoroughly consistent with the HSV-2 RL2 gene having an organization closely similar to that of the HSV-1 gene, as shown in Fig. 3. Nonetheless, authentication by direct transcript mapping should be undertaken.
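The proposed exon structure can be cross-checked arithmetically against the coordinates listed for RL2 in Table 1. The sketch below is a minimal illustration, not the authors' procedure; the function names are assumptions introduced here. It sums the exon lengths, checks that the total is a whole number of codons, and derives the intron coordinates as the gaps between consecutive exons.

```python
# Coding exons of RL2 as (start, stop) pairs, as read from Table 1
# (Fig. 3 numbering). start > stop because the gene is leftward oriented.
RL2_EXONS = [(11316, 11242), (10834, 10156), (9974, 8254)]


def span_length(first: int, last: int) -> int:
    """Number of residues in an inclusive coordinate range, either orientation."""
    return abs(first - last) + 1


def intron_ranges(exons):
    """Inclusive (first, last) coordinates of the gaps between consecutive exons."""
    # step is -1 for a leftward gene (coordinates decrease 5' to 3'), +1 otherwise.
    step = -1 if exons[0][0] > exons[0][1] else 1
    return [(prev_stop + step, next_start - step)
            for (_, prev_stop), (next_start, _) in zip(exons, exons[1:])]


coding = sum(span_length(a, b) for a, b in RL2_EXONS)
codons, remainder = divmod(coding, 3)
introns = intron_ranges(RL2_EXONS)

print(coding, codons, remainder)
print(introns)
print([span_length(a, b) for a, b in introns])
```

With these coordinates the three exons total 2475 residues, or 825 codons with no remainder, in agreement with the codon count given for RL2 in Table 2, and the derived gaps are 11241 to 10835 and 10155 to 9975, the intron positions proposed in the text.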
Two sets of tandemly reiterated sequences occur within the HSV-2 gene. Family 4 almost fills the downstream intron, whereas family 3 occurs in exon 3 and is thought to encode protein (see below).

The predicted amino acid sequences for HSV-1 IE110 and HSV-2 IE118 are aligned in Fig. 5, and present some interesting features. In the sequence encoded by the second exon, the region of conserved Cys and His residues originally noted for the homologous proteins of HSV-1 and varicella-zoster virus (Perry et al., 1986) is present also in HSV-2 (residues 126 to 166). Characteristically similar sequences have also been noted recently in certain non-herpesviral proteins (Freemont et al., 1991). The central part of the HSV-2 IE118 sequence (representing approximately the first 394 amino acids encoded by exon 3; residues 252 to 645) is relatively poorly conserved in comparison to the regions encoded by exon 2 and the distal part of exon 3. The sequences in this central region are notably hydrophilic. At residues 589 to 627 in the HSV-2 protein there is a set of seven copies plus one partial copy of the sequence Ala-(Ser)4; this is encoded by repeat family 3 (see Fig. 3). The aligned HSV-1 sequence is also serine-rich but is not perfectly reiterated. A basic region (residues 511 to 516) is conserved, which is proposed to form a nuclear localization signal (Everett, 1988). To summarize these comparisons, the two proteins have regions of about 200 residues adjacent to both their N and C termini which are well conserved in length and in identity of aligned residues; these are separated by a poorly conserved, hydrophilic region. These features could suggest a structure in which the N- and C-terminal regions form separate functional domains and are linked by an extended hydrophilic structure of mostly less critical functional importance. This view is broadly compatible with analyses of HSV-1 IE110 function (reviewed by Everett et al., 1991).

[Fig. 5: three-line amino acid alignment (HSV-1 IE110, HSV-2 IE118, consensus). Features labelled: exon 1; start of exon 2; end of exon 2; start of exon 3; reiterated sequence in HSV-2.]

Fig. 5. Alignment of the HSV-1 IE110 and HSV-2 IE118 amino acid sequences. The amino acid sequences were aligned and displayed using GCG programs Bestfit and Pretty. The HSV-1 IE110 sequence is shown above the HSV-2 IE118 sequence. The location of the HSV-2 repeated amino acid sequence Ala-(Ser)4 is indicated.

Examination of the sequence of RL lying between the immediate early gene and UL

Between the downstream ends of the HSV-1 and HSV-2 RL2 genes and the UL/RL boundaries there are regions of 3.7 kbp and 3.9 kbp, respectively. In both HSV-1 and HSV-2, the 5' portions of LAT species are transcribed from within this location and overlap the RL2 genes (Wagner et al., 1988a; Mitchell et al., 1990a; see Fig. 1). A major aim of our HSV-2 sequence analysis was to use comparisons of DNA sequence in this region to gain insight into LAT organization and function. Fig.
4(c) shows in an overview manner that the HSV-1 and HSV-2 sequences in this region are significantly divergent. Some similar sequences do exist towards the UL end of the region, and effort was put into constructing an alignment of these, which proved a non-trivial task. The DNAs are G+C-rich and also contain many repetitive and simple elements; these factors give rise to the high background seen for the region in Fig. 4, and similarly they obscure attempts to discern genuinely homologous parts of the two sequences. Another factor is that convincingly homologous regions may be separated by other regions the sizes of which differ widely between the two DNAs. An alignment was produced for part of the region, as shown in Fig. 6. This was generated by first using plots such as those in Fig. 4 and making local alignments with the GCG Bestfit program to identify HSV-1 and HSV-2 sequences judged to be unambiguous, genuine homologues; these loci are indicated in Fig. 6. Alignments of flanking regions were then made with the Bestfit program. No convincing alignment was made between the UL/RL boundary, at residue 4356, and the start of the sequences in Fig. 6, at residue 4984. Between the main alignment in Fig. 6 and the downstream ends of the RL2 genes only one additional feature was judged worth registering: as shown separately at the foot of Fig. 6, this encompasses the mapped 5' terminus of HSV-1 LAT (Wagner et al., 1988a). Thus, although similarities exist upstream of and at

[Fig. 6: three-line DNA alignment (HSV-1, HSV-2, consensus). Coordinate labels run from 118252 to 119394 for HSV-1 and from 5081 to 6396 for HSV-2. Conserved loci 1 to 9 are marked. A separate alignment at the foot, labelled HSV-1 LAT 5' end, carries coordinate labels 119447 to 119470 (HSV-1) and 6575 to 6603 (HSV-2).]

Fig. 6. Alignment of HSV-1 and HSV-2 DNA sequences in the UL-proximal part of IRL. The sequence alignment was produced as described in the text. Regions judged to be genuinely homologous are indicated as loci 1 to 9. The proposed LAT TATA box sequence is overlined. At the bottom of the figure is an alignment of the mapped 5'-terminal position of HSV-1 LAT (now considered to represent a splice donor position) with the apparently corresponding HSV-2 sequence. See note on HSV-1 DNA numbering in Methods. The HSV-2 numbering is as in Fig.
3, the mapped 5' end of HSV-1 LAT, no similarity is seen for HSV-1 and HSV-2 D N A s within the portions o f the L A T coding sequences which do not overlap the RL2 genes. The 5' end of HSV-2 L A T has not been mapped precisely, but the HSV-2 sequence contains an appropriately located sequence which is closely similar to that around the mapped 5' end o f HSV-1 LAT. However, it has been suggested that transcription of HSV-1 L A T is initiated some 700 nucleotides 5' of this site, perhaps with the sequence T A T A A A A (residues 5840 to 5846) acting as a T A T A box (Wechsler et al., 1988 a, b; D o b s o n et al., 1989; Batchelor & O'Hare, 1990; Zwaagstra et al., 1990). If the L A T promoter is in this locality, then the conserved elements seen in HSV-2 in loci 4, 5 and 6 of Fig. 6 presumably include important parts o f the transcriptional regulatory signals. We have examined the L A T region of H S V - 2 D N A for signs of protein coding function. In summary, our Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Wed, 10 May 2017 20:36:01 3070 D. J. McGeoch and others results were negative. For the HSV-2 LAT region outside the RL2 gene sequence, ORFs do exist, but are not similar to ORFs in the HSV-1 LAT region. The HSV-2 LAT region outside the RL2 gene sequence, like that of HSV-1, does not show signs of three nucleotide based bias in nucleotide composition, the presence of which is characteristic of most HSV-I coding sequences (Perry & McGeoch, 1988). These outcomes, together with the dissimilarity of the HSV-1 and HSV-2 sequences, are all consistent with there being an absence of extensive protein coding sequences in either LAT region outside of the RL2 gene. In the region where the LAT transcripts overlap the RL2 coding sequence (about 530 bp), the HSV-1 and HSV-2 DNA sequences are highly similar. We believe that this represents primarily the coding requirements of the RL2 genes. 
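The three-nucleotide compositional bias referred to above can be illustrated with a minimal sketch. It tallies G+C content separately for the first, second and third positions of successive triplets in a chosen frame; a marked difference between positions is the kind of signal that would suggest coding DNA. The function name `gc_by_codon_position` and the toy sequence are assumptions introduced for illustration only, and this is not the program used in the original analysis.

```python
def gc_by_codon_position(seq, frame=0):
    """Return the G+C fraction at triplet positions 1, 2 and 3 for one frame."""
    seq = seq.upper()
    gc_counts = [0, 0, 0]
    totals = [0, 0, 0]
    for i, base in enumerate(seq[frame:]):
        pos = i % 3                      # 0, 1, 2 -> codon positions 1, 2, 3
        if base in "ACGT":               # ignore padding or ambiguous characters
            totals[pos] += 1
            if base in "GC":
                gc_counts[pos] += 1
    return [g / t if t else 0.0 for g, t in zip(gc_counts, totals)]


# Hypothetical toy sequence (not HSV data): every third position is G or C.
toy = "ATGGACAAGTTCGAGATC"
fractions = gc_by_codon_position(toy)
print([round(f, 2) for f in fractions])
```

Applied to a real sequence, one would compare the three fractions across all three frames; roughly equal values in every frame would be consistent with the absence of a coding signal described in the text.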
Putative encoded amino acid sequences in all three reading frames in the LAT orientation also show considerable similarities between HSV-1 and HSV-2, but the distribution of stop codons differs. A region of the HSV-1 DNA which has been discussed as a possible LAT protein coding sequence is the so-called ORF2 (Wagner et al., 1988a), which lies across the 3' end of the RL2 coding sequence. In HSV-2 the counterpart of ORF2 is disrupted by two stop codons within the region overlapping the RL2 coding sequence. We interpret this observation as indicating that ORF2 is not, at least in its entirety, a real functional entity. Recently, Doerig et al. (1991) have expressed the 3'-terminal 112 codons of HSV-1 ORF2 (which overlap the RL2 coding sequence) in Escherichia coli as a trpE fusion protein, raised antiserum to the product and shown that a protein reacting with the antiserum was detectable in neurons latently infected with HSV-1. These observations are not readily reconcilable with our sequence interpretations. One possibility is that, if part of ORF2 is indeed translated in neurons, the mRNA involved may be an as yet uncharacterized species.
Dobson et al. (1989) suggested that HSV-1 LAT might in fact be an intron transcript of unusual stability excised from a larger transcript. This proposal is compatible with a number of properties of the LAT, namely its nuclear location, lack of polyadenylation, apparent lack of protein coding function and the fact that the mapped 5' terminus is located in a sequence which conforms excellently to the splice donor consensus (Fig. 6 and Table 3). In a recent paper, Farrell et al. (1991) tested this proposal by identifying a potential splice acceptor site in the locality of the 3' terminus of the LAT, transferring a copy of LAT DNA including putative splice sites into a plasmid-borne E. coli lacZ gene, and expressing this in tissue culture cells. They observed that the LAT sequence was indeed spliced out of the lacZ transcript.
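The reading-frame argument applied to ORF2 above (a candidate ORF that is interrupted by stop codons in the other virus is unlikely to be functional in its entirety) can be sketched as a simple scan for in-frame stop codons. This is a minimal illustration under stated assumptions: the function names and both toy sequences are hypothetical and are not taken from the HSV-1 or HSV-2 data.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_positions(seq, frame):
    """Return 0-based start indices of in-frame stop codons for frame 0, 1 or 2."""
    seq = seq.upper()
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in STOP_CODONS]


def open_frames(seq):
    """Return the frames (0, 1, 2) that contain no stop codon over the whole sequence."""
    return [f for f in range(3) if not stop_positions(seq, f)]


# Hypothetical toy sequences: the second carries two stops in frame 0.
intact = "ATGGCCGCGCGGGCC"
disrupted = "ATGGCCTGACGGTAG"

print(open_frames(intact))           # frames left open in the first sequence
print(stop_positions(disrupted, 0))  # where frame 0 is interrupted in the second
```

Comparing the same frame in two aligned sequences in this way shows whether an ORF present in one is preserved or disrupted in the other.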
Features of the HSV-2 RL sequence are consistent with LAT being an intron sequence. First, there is an appropriately located candidate for a splice donor sequence at the 5' end of LAT (around residues 6591 to 6599 in Fig. 3 and 6; see Table 3). Second, the HSV-2 sequence within exon 3 of the IE118 gene contains an appropriately located candidate for a splice acceptor sequence (around residues 8790 to 8808 in Fig. 3; see Table 3), which corresponds in its location to the HSV-1 acceptor sequence identified by Farrell et al. (1991). Third, as outlined above, the HSV-2 LAT region that does not overlap the RL2 gene does not exhibit characteristics of protein coding DNA. Last, in the same region the HSV-1 and HSV-2 sequences are markedly divergent, as has been seen for the introns in the US1 and US12 genes of HSV-1 and HSV-2 (Whitton & Clements, 1984), and for the HSV-1 introns and their corresponding HSV-2 sequences in the RL2 genes as described above.
If the LAT sequence is generated as an intron, then a transcript containing the flanking exons must also exist. We have evaluated the protein coding potential of such a transcript in the sequences 5' and 3' to the LAT intron. In the region between the proposed TATA box (locus 6 in Fig. 6) and the splice donor site, the HSV-1 sequence contains three potential ATG translation initiation codons, but HSV-2 possesses none, and reading frames are not conserved between the sequences. It is thus unlikely that this region encodes protein. The proposed LAT splice acceptor sites lie within the exon 3 protein coding sequences of the RL2 genes. In both cases the acceptors are just 5' (in the LAT orientation) of the sequences which in the opposing, RL2 gene orientation encode the serine-rich parts and the remainder of the poorly conserved central regions of the IE110 and IE118 polypeptides. The first potential ATG codons after the proposed LAT splice lie 653 and 552 nucleotides downstream in HSV-1 and HSV-2 respectively.
We have not succeeded in identifying reading frames which in our view are likely to be genuine. Little is known about the structure of the proposed 'exon transcripts' of the LAT transcription unit; they could, of course, be subject to additional splicing. From the available data it is possible that in HSV-1 the 3' terminus of this transcription unit is adjacent to a polyadenylation-associated sequence, AATAAA, downstream of the IE175 gene within Rs (Mitchell et al., 1990b; Zwaagstra et al., 1990; see Fig. 1). Our HSV-2 sequence data for the Rs part of BamHI g indicate that this sequence is also present in HSV-2 (not shown).
Fig. 7. Alignment of the HSV-1 and HSV-2 RL1 genes (sequence alignment not reproduced here). The HSV-1 and HSV-2 DNA sequences are shown, starting with proposed RL1 TATA boxes and including the whole of the RL1 coding regions. The upper line shows the HSV-1 DNA sequences. The sequences are for the leftward 5' to 3' strands in IRL, but preserve the numbering used for the rightward strands. See note on HSV-1 numbering in Methods. HSV-2 numbering follows that in Fig. 3. Proposed encoded amino acid sequences are shown in the single-letter code. The proposed HSV-2 intron is indicated.
Organization of the HSV-2 equivalent of the HSV-1 RL1 gene encoding ICP34.5
Chou & Roizman (1986) and Ackermann et al. (1986) have produced evidence that in HSV-1 strain F a gene, encoding a protein termed ICP34.5, is located between the RL2 gene and the a sequence in the same orientation as the RL2 gene. An ORF of 358 codons is proposed to encode ICP34.5. In the HSV-1 strain 17 sequence, however, there were 20 frameshifts within the bounds of this ORF, and no satisfactory alternative reading frame could be proposed, although the DNA sequence did display some characteristics of protein coding DNA (Perry & McGeoch, 1988). This was part of the background with which we undertook a comparative analysis of the HSV-2 RL sequence. However, the conflict has meanwhile been resolved. First, Chou & Roizman (1990) revised the HSV-1 strain F sequence to correct 19 of the 20 frameshifting differences from HSV-1 strain 17, and proposed as the ICP34.5 coding sequence an ORF of 263 codons (which was partially coincident with the previous candidate ORF).
Second, re-examination of strain 17 showed that the sequence at the remaining frameshifting difference from strain F was correct in the plasmid clone employed, but that this clone was atypical and the sequences of other clones were compatible with the strain F version (A. Dolan, E. McKie, A. R. MacLean & D. J. McGeoch, unpublished data). Our HSV-1/HSV-2 comparisons for this region have therefore focused on evaluating whether HSV-2 possesses a counterpart of the HSV-1 ICP34.5 coding region.
As can be seen from Fig. 4(d), there are similarities between the two sequences in the locality of the ICP34.5 ORF. Aligning the sequences was complicated by the occurrence of tandem reiterations, G+C-rich sequences and a number of addition/deletion differences. The sequences were aligned in the way described above for the UL-proximal part of RL. From these exercises it was possible to propose a coding sequence for an HSV-2 counterpart of ICP34.5. The coding regions of the HSV-1 and HSV-2 RL1 genes are aligned in Fig. 7, together with the proposed encoded amino acid sequences. As is generally the case in comparisons of substantially divergent DNA sequences, there exist several near equivalent variations of the alignment: the version shown in Fig. 7 is one which is compatible also with optimizing the alignment of the proposed encoded amino acid sequences, and which places padding insertions at codon boundaries. The HSV-2 RL1 coding sequence starts at an ATG (residues 13179 to 13177 in Fig. 3 and 7) aligned with the HSV-1 initiator ATG proposed by Chou & Roizman (1990). Both HSV-1 and HSV-2 possess an upstream ATG; in HSV-2 this is at residues 13251 to 13249 (Fig. 3 and 7) and is blocked by a stop codon after four codons.
The HSV-2 coding sequences are interrupted by a set of repeated sequences (family 5) consisting of six complete copies and one partial copy of a 19 nucleotide element which includes a stop codon, TGA, in the RL1 orientation; all three reading frames are thus blocked. We consider that this repeat family must lie within an intron in HSV-2 RL1: as indicated in Fig. 3 and 7, and in Table 3, it is closely flanked by excellent candidates for splice donor and acceptor sites, use of which would bring the proposed HSV-2 coding sequence back into frame with the distal portion of the HSV-1 RL1 ORF. This interpretation of the HSV-2 RL1 gene is supported by the distribution of G+C residues in the first, second and third positions of the proposed codon set (G+C content is higher for the third position), by the pattern of substitutions observed between HSV-1 and HSV-2 (substitutions are most frequent in the third position of codons), and by the similarity between the encoded amino acid sequences (62.7% identity of aligned residues). The high incidence of addition/deletion changes for the 5' portions of the ORFs would be unusual for HSV-1 and HSV-2 genes in the UL or Us regions, but is similar to that observed for parts of the immediate early genes in RL (see above) and in Rs (unpublished data).
The HSV-2 DNA sequence has a TATA box candidate sequence aligned with that proposed by Chou & Roizman (1986) to act in transcription initiation of the HSV-1 RL1 gene (see Fig. 7), so the 5' end of the HSV-2 RL1 transcript may be adjacent to this. Similarly, the HSV-2 transcript may terminate downstream of the possible polyadenylation-associated sequence ATTAAA at residues 11881 to 11876 (Fig. 3). Authentication of the HSV-2 RL1 transcript structure, including the proposed intron, will require direct mapping analyses. The HSV-2 RL1 gene is predicted to encode a protein of 261 amino acids.
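The statement that a tandemly repeated 19 nucleotide element carrying a TGA blocks all three reading frames follows from simple arithmetic: 19 = 6 × 3 + 1, so each successive copy places its TGA one frame further on, and three consecutive copies are enough to cover frames 0, 1 and 2. The sketch below checks this with a made-up element; the sequence `element` is hypothetical (chosen only to be 19 nucleotides long with a single TGA) and is not the actual HSV-2 family 5 repeat unit.

```python
REPEAT_LEN = 19  # element length reported in the text

# Hypothetical 19-nt element containing one TGA; NOT the real HSV-2 repeat.
element = "GCGCCGCCCTGACCGCCGC"


def blocked_frames(seq, stop="TGA"):
    """Return the set of reading frames (0, 1, 2) in which `stop` occurs."""
    frames = set()
    start = seq.find(stop)
    while start != -1:
        frames.add(start % 3)
        start = seq.find(stop, start + 1)
    return frames


for copies in (1, 2, 3):
    print(copies, sorted(blocked_frames(element * copies)))
```

With six complete copies, as reported for HSV-2, each frame would be interrupted twice, which is why the repeat family is argued to lie within an intron.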
Like its HSV-1 counterpart this protein is basic with a high content of arginine residues. The most similar region in the two proteins is near the C terminus (corresponding to the second exon of HSV-2 RL1), in which 63 amino acids in each are aligned without introduction of gaps, and show 83% identity (Fig. 7).
Discussion
The sequence data described in this paper for the genes at the extremities of HSV-2 UL show that these regions of the HSV-2 genome are very similar in sequence organization and coding capacity to the corresponding parts of HSV-1 DNA. This finding is similar to published reports for a number of other genes in the unique regions of the HSV-2 genome, both UL (for example, genes UL11 and UL12, Draper et al., 1986; UL23, Swain & Galloway, 1983; UL27, Stuve et al., 1987; UL30, Tsurumi et al., 1987; UL39, Swain & Galloway, 1986; UL40, McLauchlan & Clements, 1983; UL44 and UL45, Swain et al., 1985) and Us (for example, genes US2 to US8, McGeoch et al., 1987). This relationship is in notable contrast to that found for the HSV-1 and HSV-2 RL regions. Here the sequences are much more diverged, and also exhibit features distinct from those in the unique regions, including a high incidence of short reiterated families and other simple sequences, and an elevated content of G and C residues. These features can be largely accounted for by proposing that a high level of recombination acts on the major repeats of the genome and that there is an associated bias towards raising the G+C content in mechanisms of generation or fixation of mutations; this was discussed previously with regard to the Rs element of HSV-1 (McGeoch et al., 1986). From comparisons of genome structures of HSV and other alphaherpesviruses, it seems probable that RL is the most recently evolved major element of the HSV genome; and it is evidently, on the time scale of the HSV-1/HSV-2 divergence, still in a state of rapid change.
The amino acid sequences of HSV-2 UL and RL genes obtained from our DNA sequence analyses have not given direct new information on gene function, but they have contributed a number of refinements to interpretations based on the HSV-1 sequence alone. For the UL genes these include: the observation of possible signal sequences for membrane-associated translation in UL1 and UL3; the re-interpretation of a possible translation initiation site for UL2; and the correction of the UL56 sequence with consequent identification of a possible transmembrane segment.
The major aim of this paper was to evaluate the functions of the HSV-1 and HSV-2 RL elements by comparative analyses. We consider that this has succeeded to the point that the potential of each part of RL can be described at least partially. From the overall divergence between the RL sequences, it is clear that a given region will have remained closely similar in RL of HSV-1 and HSV-2 only if it has a sequence-specific function. In the following paragraphs features of the various elements within RL are discussed in turn.
First, adjacent to UL there is a region, of 630 bp in HSV-2 (Fig. 3), the sequence of which is not conserved between HSV-1 and HSV-2, and which contains a high level of repetitive and simple elements. In our view it is possible that this DNA has been generated by aberrant recombinational events and does not have a sequence-specific function. Second, next to this divergent section lie the sequences shown in Fig. 6, which exhibit extensive similarities in HSV-1 and HSV-2. In part, these related sequences probably represent functional elements in LAT transcriptional control. However, the similarities may be judged to be more extended than would be reasonable for only this function; there are no clues as to the nature of additional roles.
Third, the region encoding the 5' part of the LAT, outside the RL2 gene, is not at all conserved in sequence. As outlined above, the LAT transcript probably is generated as an intron. Its unusual stability presumably represents some special features of the sequence, but details of the structures involved remain unexplored. The function of the LAT transcriptional unit is still not clear. The stable LAT intron could be the important component; most straightforwardly, this could have the role of helping to maintain the latent state by acting as an antisense repressor of RL2 translation, as suggested by several authors (and explored by Farrell et al., 1991). Alternatively, the LAT may be a by-product and the LAT 'exon transcript', of as yet uncharacterized coding capacity, may be the functional entity. This obscurity is compounded by the work of Doerig et al. (1991), showing that a protein may be expressed from part of the LAT region in latently infected neurons. Fourth, the immediate early RL2 gene is a clearly defined entity; including upstream control sequences, this accounts for some 4300 bp of RL. Last, the RL1 gene accounts for the remainder of RL to the a sequence.
Regarding the structure of the HSV-2 RL1 gene, we consider the interpretation of its coding region to be reasonably secure, with some qualification regarding the 5' terminus, which encodes basic, repetitive amino acid sequences. Knowledge of the structure of HSV-2 RL1 transcripts is, however, still incomplete and needs further work.
We thank V. G. Preston and A. J. Davison for provision of HSV-2 HG52 clones, A. C. Minson and S. Efstathiou for HSV-2 DNAs, L. J. E. Kattenhorn for extensive help in preparing the text, and S. M. Brown, A. J. Davison and A. R.
MacLean for reviewing the text. References ACKERMANN, M., CHOU, J., SARMIENTO, M., LERNER, R. A. & ROIZMAN,B. (1986). Identification by antibody to a synthetic peptide of a protein specified by a diploid gene located in the terminal repeats of the L component of herpes simplex virus genome. Journal of Virology 58, 843 850. BANKIER,A. T. & BARRELL,B. G. (1989). Sequencing single-stranded DNA using the chain-termination method. In Nucleic Acid Sequencing: A Practical Approach, pp 37-78. Edited by C. J. Howe & E. S. Ward. Oxford & New York: IRL Press. BATCHELOR, A. H. & O'HARE, P. (1990). Regulation and cell-typespecific activity of a promoter located upstream of the latencyassociated transcript of herpes simplex virus type 1. Journal of Virology 64, 3269-3279. CHOU, J. & ROIZMAN,B. (1986). The terminal a sequence of the herpes simplex virus genome contains the promoter of a gene located in the repeat sequences of the L component. Journal of Virology 57, 629-637. CHOU, J. & ROIZMAN,B. (1990). The herpes simplex virus 1 gene for ICP34.5, which maps in inverted repeats, is conserved in several limited-passage isolates but not in strain 17syn +. Journal of Virology 64, 1014-1020. CHOU, J., KERN, E. R., WHITLEY, R. J. & ROIZMAN, B. (1990). Mapping of herpes simplex virus-1 neurovirulence to Y134.5, a gene nonessential for growth in culture. Science 250, 1262-1265. CURRAN, J. & KOLAKOFSKY,D. (1988). Ribosomal initiation from an ACG codon in the Sendai virus P/C mRNA. EMBO Journal 7, 245 251. DAVISON,A. J. & WILKIE, N. M. (1981). Nucleotide sequences of the joint between the L and S segments of herpes simplex virus types 1 and 2. Journal of General Virology 55, 315-331. DEVEREUX,J., HAEBERLI,P. & SMITHIES,O. (1984). A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Research 12, 387-395. DOBSON, A. T., SEDERATI, F., DEvI-RAo, G., FLANAGAN, W. M., FARRELL, M. J., STEVENS,J. G., WAGNER,E. K. & FELDMAN,L. T. (1989). 
Identification of the latency-associated transcript promoter by expression of rabbit beta-globin mRNA in mouse sensory nerve ganglia latently infected with a recombinant herpes simplex virus. Journal of Virology 63, 3844-3851. DOERIG, C., PIZER, L. I. & WILCOX,C. L. (1991). An antigen encoded by the latency-associated transcript in neuronal cell cultures latently infected with herpes simplex virus type 1. Journal of Virology 65, 2724-2727. DRAPER,K. G., DEVI-RAo, G., COSTA,R. H., BLAIR,E. D., THOMPSON, R. L. & WAGNER, E. K. (1986). Characterization of the genes encoding herpes simplex virus type 1 and type 2 alkaline exonucleases and overlapping proteins. Journal of Virology 57, 1023 1036. EVERETT, R. D. (1988). Analysis of the functional domains of herpes simplex virus type 1 immediate-early polypeptide Vmw110. Journal of Molecular Biology 202, 87-96. EVERETT,R. D., PRESTON,C. M. & STOW,N. D. (1991). Functional and genetic analysis of the role of Vmwll0 in herpes simplex virus replication. In Herpesvirus Transcription and its Regulation, pp 49-76. Edited by E. K. Wagner. Boca Raton: CRC Press. Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Wed, 10 May 2017 20:36:01 3074 D. J. McGeoch and others FARRELL, M. J., DOBSON, A. T. & FELDMAN, L. T. (1991), Herpes simplex virus latency-associated transcript is a stable introD. Proceedings of the National Academy of Sciences, U.S.A. 88, 790794. FREEMONT, P. S., HANSON, 1. M. & TROWSDALE, J. (1991). A novel cysteine-rich sequence motif. Cell 64, 483-484. GUPTA, K. C. & PATWARDHAN,S. (1988). ACG, the initiator codon for a Sendai virus protein. Journal of Biological Chemistry 263, 8553-8556. LEIB, D. A., BOGARD,C. L., KOsZ-VNENCHAK, M., HICKS, K. A., COEN, D. M., KNIPE, D. M. & SCHAFFER, P. A. (1989). A deletion mutant of the latency-associated transcript of herpes simplex virus type 1 reactivates from the latent state with reduced frequency. Journal of Virology 63, 2893-2900. LITTLE, S. P. & SCHAFFER,P. 
A. (1981). Expression of the syncytial (syn) phenotype in HSV-1, strain KOS: genetic and phenotype studies of mutants in two syn loci. Virology 112, 686-702. MCGEOCH, D. J., DOLAN, A., DONALD, S. & RIXON, F. J. (1985). Sequence determination and genetic content of the short unique region in the genome of herpes simplex virus type 1. Journal of Molecular Biology 181, 1-13. McGEOCH, D. J., DOLAN, A., DONALD, S. & BRAUER, D. H. K. (1986). Complete DN A sequence of the short repeat region in the genome of herpes simplex virus type 1. Nucleic Acids Research 14, 1727-1745. McGEOCH, D. J., MOSS, H. W. M., MCNAB, D. &FRAME, M. C. (1987). DNA sequence and genetic content of the HindlIl I region in the short unique component of the herpes simplex virus type 2 genome: identification of the gene encoding glycop/-otein G, and ev01utionary comparisons. Journal of General Virology 68, 19-38. MCGEOCH, D. J., DALRYMPLE, M. A., DAVISON, A. J., DOLAN, A., FRAME, M. C., MCNAB, D., PERRY, L: J., SCOTT,J. E. & TAYLOR,P. (1988). The complete DNA sequence of the long unique region in the genome of herpes simplex virus type 1. Journalof General Virology69, 1531-1574. MACKEM, S. & ROIZMAN, B. (1982). Structural features of the herpes simplex virus c~ gene 4, 0 and 27 proinoter-regulatory sequences which confer regulation on chimeric thymidine kinase genes. Journal of Virology 44, 939-949. MCLAUCHLAN, J. & CLEMENTS, J. B. (1983). DNA sequence homology between two colinear loci on the HSV genome which have different transforming abilities. EMBO Journal 2, 1953 1961. MITCHELL, W. J., DESHMANE, S. L., DOLAN, A., MCGEOCH, D. J. & FRASER, N. W. (1990a). Characterization of herpes simplex virus type II transcription during latent infection of mouse trigeminal ganglia. Journal of Virology 64, 5342-5348. MITCHELL, W. J., LIRETTE, R. P. & FRASER, N. W. (1990b). Mapping of low abundance latency-associated RNA in the trigeminal ganglia of mice latently infected with herpes simplex virus type 1. 
Journal of General Virology 71, 125-132.
MIZUSAWA, S., NISHIMURA, S. & SEELA, F. (1986). Improvement of the dideoxy chain termination method of DNA sequencing by use of deoxy-7-deazaguanosine triphosphate in place of dGTP. Nucleic Acids Research 14, 1319-1324.
MOUNT, S. M. (1982). A catalogue of splice junction sequences. Nucleic Acids Research 10, 459-472.
MULLANEY, J., MOSS, H. W. McL. & McGEOCH, D. J. (1989). Gene UL2 of herpes simplex virus type 1 encodes a uracil-DNA glycosylase. Journal of General Virology 70, 449-454.
PERRY, L. J. & McGEOCH, D. J. (1988). The DNA sequences of the long repeat region and adjoining parts of the long unique region in the genome of herpes simplex virus type 1. Journal of General Virology 69, 2831-2846.
PERRY, L. J., RIXON, F. J., EVERETT, R. D., FRAME, M. C. & McGEOCH, D. J. (1986). Characterization of the IE110 gene of herpes simplex virus type 1. Journal of General Virology 67, 2365-2380.
ROCK, D. L., NESBURN, A. B., GHIASI, H., ONG, J., LEWIS, T. L., LOKENSGARD, J. R. & WECHSLER, S. L. (1987). Detection of latency-related viral RNAs in trigeminal ganglia of rabbits latently infected with herpes simplex virus type 1. Journal of Virology 61, 3820-3826.
SAIKI, R. K., GELFAND, D. H., STOFFEL, S., SCHARF, S. J., HIGUCHI, R., HORN, G. T., MULLIS, K. B. & EHRLICH, H. A. (1988). Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239, 487-491.
SPIVACK, J. G. & FRASER, N. W. (1987). Detection of herpes simplex virus type 1 transcripts during latent infection in mice. Journal of Virology 61, 3841-3847.
STADEN, R. (1982). Automation of the computer handling of gel reading data produced by the shotgun method of DNA sequencing. Nucleic Acids Research 10, 4731-4751.
STEINER, I., SPIVACK, J. G., LIRETTE, R. P., BROWN, S. M., MACLEAN, A. R., SUBAK-SHARPE, J. H. & FRASER, N. W. (1989). Herpes simplex virus type 1 latency-associated transcripts are evidently not essential for latent infection.
EMBO Journal 8, 505-511.
STEVENS, J. G., WAGNER, E. K., DEVI-RAO, G. B., COOK, M. L. & FELDMAN, L. T. (1987). RNA complementary to a herpesvirus alpha mRNA is prominent in latently infected neurons. Science 235, 1056-1059.
STUVE, L. L., BROWN-SHIMER, S., PACHL, C., NAJARIAN, R., DINA, D. & BURKE, R. L. (1987). Structure and expression of the herpes simplex virus type 2 glycoprotein gB gene. Journal of Virology 61, 326-335.
SWAIN, M. A. & GALLOWAY, D. A. (1983). Nucleotide sequence of the herpes simplex virus type 2 thymidine kinase gene. Journal of Virology 46, 1045-1050.
SWAIN, M. A. & GALLOWAY, D. A. (1986). Herpes simplex virus specifies two subunits of ribonucleotide reductase encoded by 3'-coterminal transcripts. Journal of Virology 57, 802-808.
SWAIN, M. A., PEET, R. W. & GALLOWAY, D. A. (1985). Characterization of the gene encoding herpes simplex virus type 2 glycoprotein C and comparison with the type 1 counterpart. Journal of Virology 53, 561-569.
TAHA, M. Y., CLEMENTS, G. B. & BROWN, S. M. (1989a). A variant of herpes simplex virus type 2 strain HG52 with a 1.5 kb deletion in RL between 0 to 0.02 and 0.81 to 0.83 map units is non-neurovirulent for mice. Journal of General Virology 70, 705-716.
TAHA, M. Y., CLEMENTS, G. B. & BROWN, S. M. (1989b). The herpes simplex virus type 2 (HG52) variant JH2604 has a 1488 bp deletion which eliminates neurovirulence in mice. Journal of General Virology 70, 3073-3078.
TAHA, M. Y., BROWN, S. M., CLEMENTS, G. B. & GRAHAM, D. I. (1990). The JH2604 deletion variant of herpes simplex virus type 2 (HG52) fails to produce necrotizing encephalitis following intracranial inoculation of mice. Journal of General Virology 71, 1597-1601.
TAYLOR, P. (1986). A computer program for translating DNA sequences into protein. Nucleic Acids Research 14, 437-441.
TSURUMI, T., MAENO, K. & NISHIYAMA, Y. (1987). Nucleotide sequence of the DNA polymerase gene of herpes simplex virus type 2 and comparison with the type 1 counterpart. Gene 52, 129-137.
WAGNER, E. K., DEVI-RAO, G., FELDMAN, L. T., DOBSON, A. T., ZHANG, Y.-F., FLANAGAN, W. M. & STEVENS, J. G. (1988a). Physical characterization of the herpes simplex virus latency-associated transcript in neurons. Journal of Virology 62, 1194-1202.
WAGNER, E. K., FLANAGAN, W. M., DEVI-RAO, G., ZHANG, Y.-F., HILL, J. M., ANDERSON, K. P. & STEVENS, J. G. (1988b). The herpes simplex virus latency-associated transcript is spliced during the latent phase of infection. Journal of Virology 62, 4577-4585.
WECHSLER, S. L., NESBURN, A. B., WATSON, R., SLANINA, S. & GHIASI, H. (1988a). Fine mapping of the major latency-related RNA of herpes simplex virus type 1 in humans. Journal of General Virology 69, 3101-3106.
WECHSLER, S. L., NESBURN, A. B., WATSON, R., SLANINA, S. & GHIASI, H. (1988b). Fine mapping of the latency-related gene of herpes simplex virus type 1: alternative splicing produces distinct latency-related RNAs containing open reading frames. Journal of Virology 62, 4051-4058.
WHITTON, J. L. & CLEMENTS, J. B. (1984). The junctions between the repetitive and the short unique sequences of the herpes simplex virus genome are determined by the polypeptide-coding regions of the two spliced immediate-early mRNAs. Journal of General Virology 65, 451-466.
WHITTON, J. L., RIXON, F. J., EASTON, A. J. & CLEMENTS, J. B. (1983). Immediate-early mRNA-2 of herpes simplex viruses types 1 and 2 is unspliced: conserved sequences around the 5' and 3' termini correspond to transcription regulatory signals. Nucleic Acids Research 11, 6271-6287.
WORRAD, D. M. & CARADONNA, S. (1988). Identification of the coding sequence for herpes simplex virus uracil-DNA glycosylase. Journal of Virology 62, 4774-4777.
ZWAAGSTRA, J. C., GHIASI, H., SLANINA, S. M., NESBURN, A. B., WHEATLEY, S. C., LILLYCROP, K., WOOD, J., LATCHMAN, D. S., PATEL, K. & WECHSLER, S. L. (1990).
Activity of herpes simplex virus type 1 latency-associated transcript (LAT) promoter in neuron-derived cells: evidence for neuron specificity and for a large LAT transcript. Journal of Virology 64, 5019-5028.

(Received 12 June 1991; Accepted 13 August 1991)