Virus Genes 19:1, 59–65, 1999
© 1999 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.

Complete Genomic Sequence of Viral Hemorrhagic Septicemia Virus, a Fish Rhabdovirus

HEIKE SCHÜTZE,* EGBERT MUNDT, & THOMAS C. METTENLEITER
Institute of Molecular and Cellular Virology, Friedrich-Loeffler-Institutes, Federal Research Centre for Virus Diseases of Animals, D-17498 Insel Riems, Germany

Received December 18, 1998; Accepted January 21, 1999

Abstract. The complete nucleotide sequence of the fish rhabdovirus viral hemorrhagic septicemia virus (VHSV) has been determined. The genome comprises 11,158 bases and contains six long open reading frames encoding the nucleoprotein N, phosphoprotein P, matrix protein M, glycoprotein G, nonstructural viral protein NV, and polymerase L. Genes are arranged in the order 3′-N-P-M-G-NV-L-5′. The exact 3′ and 5′ ends were determined after RNA-oligonucleotide ligation or RACE. They show inverse complementarity as in other rhabdovirus genomes. Nucleotide and deduced amino acid sequences exhibit significant homology to corresponding sequences in the related fish rhabdovirus infectious hematopoietic necrosis virus.

Key words: viral hemorrhagic septicemia virus, genomic sequence, fish rhabdovirus

Introduction

Viral hemorrhagic septicemia virus (VHSV) and infectious hematopoietic necrosis virus (IHNV) cause devastating diseases of salmonid fish (1). Both viruses exhibit similar biological properties such as growth temperature, cytopathology, induction of interferon synthesis and protein composition. They are classified among the "ungrouped" rhabdoviruses within the order of Mononegavirales (2). Rhabdoviruses consist of a nonsegmented negative-stranded RNA genome of approx. 9,000–11,000 nucleotides which is complexed with the nucleoprotein N, the phosphoprotein P (M1) and the viral RNA polymerase L in a ribonucleoprotein (RNP) core. This RNP is surrounded by a lipoprotein bilayer membrane acting as viral envelope.
The glycoprotein G is localized on the surface of the envelope, whereas the matrix protein M (M2) lines the inner side of the membrane. IHNV and VHSV also encode a nonstructural protein, NV, which is present in virus-infected cells (3,4). The function of this small protein is so far unknown. Recently, the complete nucleotide sequence of the IHNV genome has been determined (3,5). It has been shown that the gene order is 3′-N-P-M-G-NV-L-5′ and that all genes are transcribed into monocistronic polyadenylated messenger RNAs (3,6). To proceed with the molecular investigation of another fish rhabdovirus, we determined the complete genomic sequence of VHSV and compared it with that of other rhabdoviruses.

The complete nucleotide sequence of the VHSV genome (N, P, M, G, NV and L genes) has been deposited in GenBank and assigned the accession number Y18263.
*Corresponding author.

Materials and Methods

Viruses and Cells

For virus propagation, rainbow trout gonad (RTG) cells were infected with VHSV Fi 13 (F1 strain) at a multiplicity of infection of 0.1–1.0 at 15 °C in a 2.5% CO2 atmosphere. Virions were purified by centrifugation through a sucrose cushion, and viral RNA was isolated with guanidinium thiocyanate and subsequent phenol-chloroform extraction (7) or by centrifugation through a cesium chloride cushion (8,9). Isolated RNA was analyzed by electrophoresis in formaldehyde agarose gels followed by Northern blot hybridization with a 32P-labelled cRNA of a G-gene fragment.

Cloning of the Complete Genome

Coding regions for the N, G, P, and M proteins were amplified by RT-PCR using specific primers derived from published sequences for the N and G genes of VHSV. Synthetic oligonucleotides VN1 (nt position 92–115 of the published sequence VHSVNP Acc. No.
D00687; 10; corresponding to nt 168–190 of the presented complete sequence) and VN2 (reverse of nt 1283–1306 of the published sequence, corresponding to nt 1359–1382 of the complete sequence) were used for cloning of the VHSV N gene. Synthetic oligonucleotides used for RT-PCR amplification of the P and M gene region were deduced from published sequence data of the N (10) and G genes (11). The G gene of VHSV was amplified using primers VG1 (nt position 463–483 of the published sequence VHSHSVM2 Acc. No. X59148; corresponding to nt 2964–2984 of the complete sequence) and VG2 (reverse of nt 1957–1981 of the published sequence, corresponding to nt 4458–4482 of the complete sequence). RT-PCR amplifications were performed as described (3). Resulting products were blunt-ended with Klenow polymerase, phosphorylated with T4 polynucleotide kinase, and ligated into SmaI-cleaved dephosphorylated vector pSP73 (Promega). Cloning of the NV gene and part of the L gene was performed as described (4). To determine the complete sequence of the L gene, cDNA clones were obtained as described for cloning of the IHNV genome (3).

Sequencing

Synthetic oligonucleotides were deduced from obtained genomic sequences and used for primer walking. Sequences were determined by the dideoxynucleotide chain termination method according to standard protocols (Sequenase 7-deaza sequencing kit, USB/Amersham). At least six independently derived cDNA clones were analyzed on both strands to ascertain the obtained sequence. Sequences were assembled and analyzed with the Wisconsin Package Version 9.1, Genetics Computer Group (GCG), Madison, Wisc. (12).

Determination of Genomic Termini

The 3′-terminus of the VHSV genome was cloned after ligation of viral RNA with a synthetic oligonucleotide followed by RT-PCR as described for IHNV (3). The 5′-trailer region was determined by RACE (GIBCO BRL).
Tailed cDNA was used for PCR amplification with nested virus-specific primers and poly(C) or poly(G) primers, respectively, and cloned into SmaI-digested dephosphorylated vector pUC18 (Pharmacia). At least six independently derived clones were sequenced in both orientations to determine the exact 3′ and 5′ termini of the genome.

Results

Determination of the Complete Nucleotide Sequence of the VHSV Genome

The complete VHSV genome was cloned and sequenced. The cloning strategy is depicted in Fig. 1. The N, G, and P-M genes were sequenced after cloning of RT-PCR products obtained using specific oligonucleotides deduced from published sequences of the VHSV N and G genes. To verify determined sequences and to clone and sequence the complete L polymerase gene, cDNA clones were established as shown in Fig. 1. Termini were identified after RNA-oligonucleotide ligation followed by RT-PCR or by 5′-RACE, respectively. The VHSV genome consists of 11,158 nucleotides and contains six large open reading frames (ORFs) encoding the N, P, M, G, NV and L proteins. Gene order is identical to that found in IHNV, i.e. 3′-N-P-M-G-NV-L-5′. The VHSV genomic sequence is 55% identical to that of IHNV.

ORF 1

The first ORF is localized at position 168 to 1382 and codes for the nucleoprotein. The deduced amino acid sequence is predicted to specify a 44 kDa protein with 42% homology to the nucleoprotein of IHNV. Compared to published VHSV nucleoprotein sequences, identity values of 98% to the pathogenic VHSV isolate 07-71 and 90% to strain MAKAH were found (10).

Fig. 1. Schematic representation of genetic map and cloning strategy of the VHSV genome. Representative cDNA clones from VHSV genomic libraries are indicated. RT-PCR generated cDNA fragments, including clones of the 3′ and 5′ termini generated after RNA-oligonucleotide ligation or 5′ RACE, are shown in bold type. Numbers indicate the nucleotide positions in the VHSV genome.
Only one representative clone of every cDNA cloning or RT-PCR is shown.

ORF 2

The second ORF is located between nucleotide positions 1481 and 2149 and encodes the phosphoprotein. A second ATG codon resides in-frame at nucleotide position 1496. Since the first start codon is flanked by sequences characteristic for initiation of translation, it is assumed that this is the authentic translational start. Taking this into account, ORF 2 codes for a 222 amino acid protein with a predicted molecular mass of 25 kDa. Identity of the P protein of VHSV-F1 to published sequences of other VHSV isolates amounts to 97% to isolate 07-71 and 92% to MAKAH (13). The VHSV P protein shares 37% identical amino acids with the respective protein of IHNV.

Spiropoulou and Nichol reported the existence of a unique protein expressed from a second ORF which is contained in the P gene of vesicular stomatitis virus (VSV; 14). Translation of the nucleotide sequence of the VHSV genome identified an additional second ORF overlapping the P gene at genomic nucleotide positions 1833–1973. Analysis of the IHNV genome gave similar results. The respective overlapping ORF is localized at nucleotide position 1559–1687. The hypothetical 46 and 42 amino acid proteins of VHSV and IHNV, respectively, have deduced molecular masses of approximately 5 kDa. No significant amino acid sequence homology could be detected between the VHSV and IHNV proteins, or between either of them and the VSV protein.

ORF 3

The third gene, which encodes the matrix protein, starts at nucleotide position 2268 and ends at position 2873. Thus, it comprises 606 nucleotides. The calculated molecular mass of the deduced 201 amino acid polypeptide is 20 kDa. As observed for the P protein, homology of the M protein to published sequences of other VHSV strains varies between 97% to isolate 07-71 and 92% to MAKAH. Identity to the IHNV M protein amounts to 37%.

ORF 4

This ORF, which is localized between nucleotides 2926 and 4482, encodes the glycoprotein G.
A second in-frame start codon is present at position 2959, which resides in a perfect translation initiation context (15). Therefore, it is likely that the second ATG functions as the authentic translational start. Thus, a 507 amino acid protein with a calculated molecular mass of 57 kDa will be synthesized. The apparent molecular mass of the glycosylated form is 63 kDa. Comparison of the deduced amino acid sequence with known sequence data of the G proteins of other VHSV strains (11) exhibited 99.8% identity. Identity to the G protein of IHNV is 39%.

ORF 5

This small ORF of 369 nucleotides is located downstream from the glycoprotein gene G at position 4557–4925. The first ATG codon is flanked by sequences characteristic for initiation of translation, resulting in synthesis of a 122 amino acid protein of 13.7 kDa calculated molecular mass. ORF 5 encodes the NV protein, which was recently detected in cells infected by either IHNV or VHSV using specific antisera (4).

ORF 6

The last and largest gene on the VHSV genome encodes the viral RNA-dependent RNA polymerase. It starts at position 5053 and ends at position 11007. The deduced translation product consists of 1984 amino acids with a calculated molecular mass of 224 kDa. This is the first complete sequence of the VHSV L protein gene. Identity to the deduced IHNV L protein is 60%. Alignment of the VHSV L protein sequence to deduced L proteins of other members of the Rhabdoviridae yields identities of 16.7% to vesicular stomatitis virus (VSV) L protein (16) and 25% to the L protein of rabies virus (RV; 17). A dendrogram of L polymerases within the group of Mononegavirales was generated with a multiple sequence alignment program (PileUp, Genetics Computer Group package version 7.3.1, Unix software) (12). The dendrogram indicated a close relationship among the members of the Rhabdoviridae family and a more distant relation to the Paramyxoviridae and Filoviridae families and to the borna disease virus. In the rhabdovirus family, the fish pathogenic viruses of VHS and IHN form a distinct clade separate from mammalian rhabdoviruses (VSV and RV), as shown in Fig. 2a. An alignment of the proposed four catalytic domains (20) of the VHSV, IHNV, RV, and VSV L proteins is shown in Fig. 2b.

Fig. 2a. Comparison of complete L protein amino acid sequences within the rhabdovirus family. Included are VHSV, IHNV (infectious hematopoietic necrosis virus), VSV (vesicular stomatitis virus; 16) and RV (rabies virus; 17). The dendrogram shows the phylogenetic relationship of rhabdoviruses based on L protein sequences.

Fig. 2b. Alignment of rhabdoviral L polymerases. Only conserved regions containing the typical motifs A, B, C and D of L polymerases (20) are depicted. The strictly conserved residues of L proteins of rhabdoviruses are shown in boldface. Stars indicate those amino acid residues maintained in all RNA-dependent RNA polymerases. The presumably invariant residues within the L polymerase family are underlined. Numbers represent the amino acid positions of the respective L protein. Gaps are indicated by dots.

3′ and 5′ Ends

Inverse complementarity of termini is a common feature among genomes of nonsegmented negative-stranded RNA viruses. It is essential to balance the processes of transcription and replication. As shown in Fig. 3, VHSV genome ends also exhibit this inverse complementarity. The VHSV genome starts at its 3′ terminus with the sequence GTAT, which is identical to the start sequence of the IHNV genome. It ends with ATAC at the 5′ terminus. The 3′ terminal leader sequence comprises 167 nucleotides from the start of the genome to the start of the first open reading frame. The 5′ terminal trailer sequence consists of 151 nucleotides.

Fig. 3. Comparison of terminal ends of the VHSV genome: a) The 3′ and 5′ genomic ends of VHSV are compared. Rhabdoviruses contain a single-stranded RNA genome in antimessage (-) sense. The sequences shown here represent the complementary strand. Inverse complementarity between the terminal ends of the VHSV genome is shown in bold type. Nucleotide numbers indicate the position on the genomic sequence. b) The 3′ terminal sequence of VHSV is compared to the respective region of the IHNV genome in message sense. Homologous sequences are highlighted in bold type. Numbers are related to the nucleotide position of the complete respective genome. Gaps are indicated by dots.

Nontranslated Regions

The nontranslated regions between the different ORFs vary in length from 75 nucleotides (between the G and NV genes) to 128 nucleotides (between the NV and L genes). An alignment of sequences derived from nontranslated regions of the VHSV genome is shown in Fig. 4. At the 5′ end of every sizeable ORF a conserved sequence 5′-AGATWG(A)7YGGCAC(N)3TRT-3′ is present. This regulatory sequence is very similar to the consensus sequence found in nontranslated regions of IHNV, which is 5′-AGAYAS(A)7TGGCAC(N)4GTG, and to respective sequences in other rhabdoviruses such as RV (NTG(A)7) or VSV (TATG(A)7).

Fig. 4. Comparison of conserved sequences in nontranslated regions within the VHSV genome. The consensus sequences within regions between the genes N and P (N-P), P and M (P-M), M and G (M-G), G and NV (G-NV), NV and L (NV-L) were compared in message sense. Polyadenylation signals are underlined and putative transcription initiation signals are shown in boldface. The determined consensus sequence for polyadenylation and initiation is shown below.

Discussion

In this report we present the complete genomic sequence of VHSV strain F1 (isolate Fil3) after cloning of the entire genome. The VHSV genome is 11,158 nucleotides in length, which is slightly larger than the genome of the related fish rhabdovirus IHNV (3,5). The deduced genome organization is 3′-N-P-M-G-NV-L-5′.
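The coordinates reported in the Results can be cross-checked with a few lines of arithmetic. The sketch below is illustrative and not part of the original analysis. It assumes that each reported ORF end position includes the stop codon, that the second in-frame ATG at position 2959 is the G start (as the text proposes), and that spacer lengths are the simple difference between one ORF's end coordinate and the next ORF's start coordinate. The names ORFS, orf_summary and spacers are hypothetical.

```python
# ORF coordinates (1-based, inclusive) as reported in the Results.
# Assumption: the end coordinate of each ORF includes the stop codon.
# Assumption: G starts at the second in-frame ATG (position 2959).
GENOME_LENGTH = 11158
ORFS = [
    ("N", 168, 1382),
    ("P", 1481, 2149),
    ("M", 2268, 2873),
    ("G", 2959, 4482),
    ("NV", 4557, 4925),
    ("L", 5053, 11007),
]


def orf_summary(start, end):
    """Return (length in nucleotides, residues excluding the stop codon)."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length, length // 3 - 1


def spacers(orfs):
    """Coordinate difference between each ORF end and the next ORF start.

    Assumption: this convention (next start minus previous end) is the one
    behind the quoted spacer lengths; the number of nucleotides strictly
    between the two ORFs is one less.
    """
    return {f"{a[0]}-{b[0]}": b[1] - a[2] for a, b in zip(orfs, orfs[1:])}


# Leader: nucleotides before the first ORF. Trailer: nucleotides after the last.
leader = ORFS[0][1] - 1
trailer = GENOME_LENGTH - ORFS[-1][2]

for name, start, end in ORFS:
    nt, aa = orf_summary(start, end)
    print(f"{name}: {nt} nt, {aa} aa")
print(spacers(ORFS), leader, trailer)
```

Under these assumptions the sketch reproduces the residue counts stated for P, M, G, NV and L (222, 201, 507, 122 and 1984), the values of 75 and 128 quoted for the G-NV and NV-L regions, and the 167-nucleotide leader and 151-nucleotide trailer. It also yields 404 residues for N, a figure the text does not state.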
Comparison of the VHSV genome and gene products shows a high homology to those of IHNV, with identity values of deduced amino acid sequences between 37 and 60%. Only the NV protein exhibits a lower identity of 23%. Deduced amino acid sequences revealed that VHSV and IHNV proteins are mainly neutral, with the exception of the acidic nucleoprotein and the basic matrix protein. The phosphoprotein P is the most hydrophilic protein (18,19). Interestingly, the isoelectric points of the VHSV (pI 9.9) or IHNV (pI 8.4) M proteins differ significantly from those of RV (pI 4.84) or VSV (pI 4.36). The functional basis for these differences is unclear at present.

Within the P gene of VSV, an additional overlapping reading frame was detected (14) encoding a deduced protein C, which is localized in cytoplasmic compartments of virus-infected cells. In the VHSV and IHNV genomes, an additional second ORF contained in the P gene is also present. The deduced hypothetical 46 and 42 amino acid proteins also exhibit basic properties and are arginine rich, as described for VSV. The isoelectric points of this additional protein of VHSV (pI 11.6) or IHNV (pI 12.8) are similar to that of VSV (pI 11.6). However, it is unclear whether these potential proteins are expressed at all and, thus, it remains to be determined whether they are conserved within more members of the rhabdoviruses.

As expected, the VHSV L protein exhibits a high degree of identity with other rhabdoviral L proteins. The catalytic subunits, a structural characteristic of the RNA-dependent RNA polymerases, could be identified by alignment of VHSV L protein sequences with those of other rhabdoviral L proteins (20; Fig. 2b). The conserved domain consisting of four major motifs A, B, C, and D is localized between amino acids 560 and 770 of the VHSV L polymerase. Interestingly, in both fish rhabdoviral L polymerases a conserved glycine residue within domain D, as found in RV and VSV, is replaced by a proline.
Alterations at the same position have also been described in other negative-stranded RNA viruses such as Bunya- and Arenaviridae (20). The functional significance of this change is unclear.

Comparing length and nucleotide sequence of the termini, the VHSV and IHNV genomes also appear very similar. The 167 nt leader region of VHSV is 52% identical to the 174 nt leader of IHNV, with a particularly high identity in the extreme 3′-ends (Fig. 3). The 5′-trailer region of VHSV is 40% identical to the respective region of IHNV. The determined nontranslated regions contain signals for termination of transcription and polyadenylation of mRNA, and signals for initiation of transcription of the downstream gene. The determined polyadenylation (AGATWG(A)7) and transcriptional initiation (AACA) sequences are similar to respective signals in RV and VSV genomes (21,22). However, in addition to the conserved core sequence, additional sequences conserved between the two fish rhabdoviruses IHNV and VHSV were detected downstream from ORFs 1 to 5 (Fig. 4). The VHSV consensus sequence YGGCAC(N)3TRT is similar to that found in IHNV, which is TGGCAC(N)4GTG. Both are probably involved in the initiation of transcription. Accordingly, this sequence is not present downstream from the polyadenylation signal behind ORF 6 (L polymerase gene) but is found in the 3′ terminal region upstream from ORF 1 (with a change of one nucleotide).

With the presentation of the complete genomic sequence of VHSV, the second fish rhabdovirus genome has been completely elucidated. Knowledge of the sequence of the viral genome and its gene content and composition is important for further studies on the function of viral proteins in the viral replicative cycle in cell culture and in the animal host.
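The signals discussed above are written in IUPAC nucleotide codes (W = A or T, S = C or G, R = A or G, Y = C or T, N = any base), so they can be turned into regular expressions for scanning a sequence, and the inverse complementarity of the reported terminal tetranucleotides reduces to a one-line check. The following is a minimal sketch, not the procedure used in this study: the function names are hypothetical, and EXAMPLE is a made-up string constructed to fit the VHSV consensus, not an excerpt from the genome.

```python
import re

# IUPAC codes used in the consensus sequences quoted in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]",
    "N": "[ACGT]",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def consensus_to_regex(consensus):
    """Compile a consensus such as 'AGATWG(A)7YGGCAC(N)3TRT' into a regex.

    '(X)n' is read as n repeats of code X; every other capital letter is a
    single IUPAC code. Characters that are neither are ignored.
    """
    parts = []
    token = re.compile(r"\(([A-Z])\)(\d+)|([A-Z])")
    for repeated, count, single in token.findall(consensus):
        if single:
            parts.append(IUPAC[single])
        else:
            parts.append(f"{IUPAC[repeated]}{{{count}}}")
    return re.compile("".join(parts))


def reverse_complement(seq):
    """Return the reverse complement of a DNA-alphabet sequence."""
    return seq.translate(COMPLEMENT)[::-1]


VHSV_SIGNAL = consensus_to_regex("AGATWG(A)7YGGCAC(N)3TRT")

# Hypothetical sequence built to satisfy the consensus (not genomic data).
EXAMPLE = "AGATAG" + "A" * 7 + "CGGCAC" + "GTT" + "TGT"

print(VHSV_SIGNAL.pattern)
print(bool(VHSV_SIGNAL.fullmatch(EXAMPLE)))
# The genome is reported to start with GTAT and end with ATAC.
print(reverse_complement("GTAT") == "ATAC")
```

Compiling the consensus once and reusing the pattern keeps the scan simple; applied to a full genome in message sense, `finditer` on the compiled pattern would list candidate signal positions for comparison with the gene junctions.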
For a better molecular analysis of fish rhabdovirus infections, a reverse genetics system for both viruses is required, as has successfully been established for mammalian negative-strand RNA viruses such as RV and VSV (23,24). Availability of complete genomic sequences is a major prerequisite towards this goal, which can then lead to generation of new vaccines with improved properties for use in aquaculture.

References

1. Wolf K., Fish viruses and fish viral diseases. Cornell University Press, Ithaca, 1988.
2. Murphy F.A., Fauquet C.M., Bishop D.H.L., Ghabrial S.A., Jarvis A.W., Martelli G.P., Mayo M.A., and Summers M.D., Arch Virol 10, 265–288, 1995.
3. Schütze H., Enzmann P.-J., Kuchling R., Mundt E., Niemann H., and Mettenleiter T.C., J Gen Virol 76, 2519–2527, 1995.
4. Schütze H., Enzmann P.-J., Mundt E., and Mettenleiter T.C., J Gen Virol 77, 1259–1263, 1996.
5. Morzunov S.P., Winton J.R., and Nichol S.T., Virus Res 38, 175–192, 1995.
6. Kurath G., Ahern K.G., Pearson G.D., and Leong J.C., J Virol 53, 469–476, 1985.
7. Chomczynski P. and Sacchi N., Analyt Biochem 162, 156–159, 1987.
8. Glisin V., Crkvenjakov R., and Byrus C., Biochemistry 13, 2633–2637, 1974.
9. Sambrook J., Fritsch E.F., and Maniatis T., Molecular cloning: A laboratory manual. 2nd edn. Cold Spring Harbor Laboratory, New York, 1989.
10. Bernard J., Lecocq-Xhonneux F., Rossius M., Thiry M.E., and DeKinkelin P., J Gen Virol 71, 1669–1674, 1990.
11. Thiry M., Lecocq-Xhonneux F., Dheur I., Renard A., and DeKinkelin P., Biochim Biophys Acta 1090, 345–347, 1991.
12. Devereux J., Haeberli P., and Smithies O., Nucl Acids Res 12, 387–395, 1984.
13. Benmansour A., Paubert G., Bernard J., and DeKinkelin P., Virology 198, 602–612, 1994.
14. Spiropoulou C.F. and Nichol S.T., J Virol 67, 3103–3110, 1993.
15. Kozak M., Nucleic Acids Res 15, 8125–8148, 1987.
16. Schubert M., Harmison G.G., and Meier E., J Virol 51, 505–514, 1984.
17. Tordo N., Poch O., Ermine A., Keith G.
and Rougeon F., Virology 165, 565–576, 1988.
18. Baer G.M., Bellini W.J., and Fishbein D.B., Virology. Ed. by B.N. Fields and D.M. Knipe, 2nd ed. Raven Pr., New York, 1990, pp. 883–942.
19. Wagner R.R., Virology. Ed. by B.N. Fields and D.M. Knipe, 2nd ed. Raven Pr., New York, 1990, pp. 867–881.
20. Tordo N., de Haan P., Goldbach R., and Poch O., Sem Virol 3, 341–357, 1992.
21. Conzelmann K.-K., Cox J.H., Schneider L.G., and Thiel H.J., Virology 175, 485–499, 1990.
22. Rose J.K., Cell 19, 415–421, 1980.
23. Schnell M.J., Mebatsion T., and Conzelmann K.-K., The EMBO Journal 18, 4195–4203, 1994.
24. Lawson N.D., Stillman E.A., Whitt M.A., and Rose J.K., Proc Natl Acad Sci USA 92, 4477–4481, 1995.