Virus Genes 19:1, 59–65, 1999
© 1999 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.
Complete Genomic Sequence of Viral Hemorrhagic Septicemia Virus,
a Fish Rhabdovirus
HEIKE SCHÜTZE,* EGBERT MUNDT, & THOMAS C. METTENLEITER
Institute of Molecular and Cellular Virology, Friedrich-Loeffler-Institutes, Federal Research Centre for Virus Diseases of Animals, D-17498
Insel Riems, Germany
Received December 18, 1998; Accepted January 21, 1999
Abstract. The complete nucleotide sequence of the fish rhabdovirus viral hemorrhagic septicemia virus (VHSV) has been determined. The genome comprises 11,158 bases and contains six long open reading frames encoding the nucleoprotein N, phosphoprotein P, matrix protein M, glycoprotein G, nonstructural viral protein NV, and polymerase L. Genes are arranged in the order 3'-N-P-M-G-NV-L-5'. The exact 3' and 5' ends were determined after RNA-oligonucleotide ligation or RACE. They show inverse complementarity as in other rhabdovirus genomes. Nucleotide and deduced amino acid sequences exhibit significant homology to corresponding sequences in the related fish rhabdovirus infectious hematopoietic necrosis virus.
Key words: viral hemorrhagic septicemia virus, genomic sequence, fish rhabdovirus
Introduction
Viral hemorrhagic septicemia virus (VHSV) and infectious hematopoietic necrosis virus (IHNV) cause devastating diseases of salmonid fish (1). Both viruses exhibit similar biological properties such as growth temperature, cytopathology, induction of interferon synthesis and protein composition. They are classified among the "ungrouped" rhabdoviruses within the order of Mononegavirales (2).
Rhabdoviruses consist of a nonsegmented negative-stranded RNA genome of approx. 9,000–11,000 nucleotides which is complexed with the nucleoprotein N, the phosphoprotein P (= M1) and the viral RNA polymerase L in a ribonucleoprotein (RNP) core. This RNP is surrounded by a lipoprotein bilayer membrane acting as viral envelope. The glycoprotein G is localized on the surface of the envelope whereas the matrix protein M (= M2) lines the inner side of the membrane. IHNV and VHSV also encode a nonstructural protein NV which is present in virus-infected cells (3,4). The function of this small protein is so far unknown. Recently, the complete nucleotide sequence of the genome of IHNV has been determined (3,5). It has been shown that the gene order is 3'-N-P-M-G-NV-L-5' and that all genes are transcribed into monocistronic polyadenylated messenger RNAs (6,3). To proceed with the molecular investigation of another fish rhabdovirus, VHSV, we determined the complete genomic sequence of VHSV and compared it with that of other rhabdoviruses.
The complete nucleotide sequence of the VHSV genome (N, P, M, G, NV and L genes) has been deposited in GenBank and assigned the accession number Y18263.
*Corresponding author.
Materials and Methods
Viruses and Cells
For virus propagation, rainbow trout gonad (RTG) cells were infected with VHSV Fi13 (F1 strain) at a multiplicity of infection of 0.1–1.0 at 15 °C in a 2.5% CO2 atmosphere. Virions were purified by centrifugation through a sucrose cushion, and viral RNA was isolated with guanidinium thiocyanate and subsequent phenol-chloroform extraction (7) or centrifugation through a cesium chloride cushion (8,9). Isolated RNA was analyzed by electrophoresis in formaldehyde agarose gels followed by Northern blot hybridization with a 32P-labelled cRNA of a G-gene fragment.
Cloning of the Complete Genome
Coding regions for the N, G, P, and M proteins were amplified by RT-PCR using specific primers derived from published sequences for the N and G genes of VHSV. Synthetic oligonucleotides VN1 (nt position 92–115 of the published sequence VHSVNP Acc. No. D00687; 10; corresponding to nt 168–190 of the presented complete sequence) and VN2 (reverse of nt 1283–1306 of the published sequence corresponding to nt 1359–1382 of the complete sequence) were used for cloning of the VHSV N gene. Synthetic oligonucleotides used for RT-PCR amplification of the P and M gene region were deduced from published sequence data of the N (10) and G genes (11). The G gene of VHSV was amplified using primers VG1 (nt position 463–483 of the published sequence VHSHSVM2 Acc. No. X59148; corresponding to nt 2964–2984 of the complete sequence) and VG2 (reverse of nt 1957–1981 of the published sequence corresponding to nt 4458–4482 of the complete sequence). RT-PCR amplifications were performed as described (3). Resulting products were blunt-ended with Klenow polymerase, phosphorylated with T4 polynucleotide kinase, and ligated into SmaI-cleaved dephosphorylated vector pSP73 (Promega).
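The primer positions above are given in two coordinate systems, the published GenBank entries and the complete genome. The following minimal sketch, using only the intervals quoted in the text, derives the shift between the two systems for each primer and flags any interval whose length differs between them. The dictionary layout and function names are illustrative and not part of the original work.

```python
# Primer intervals quoted in the text, as (start, end) nucleotide positions
# in the published GenBank entry and in the complete genome sequence.
PRIMERS = {
    "VN1": {"published": (92, 115), "genome": (168, 190)},
    "VN2": {"published": (1283, 1306), "genome": (1359, 1382)},
    "VG1": {"published": (463, 483), "genome": (2964, 2984)},
    "VG2": {"published": (1957, 1981), "genome": (4458, 4482)},
}


def start_offset(primer):
    """Shift that maps the published start position onto the genome."""
    return primer["genome"][0] - primer["published"][0]


def interval_lengths(primer):
    """Lengths of the interval in published and in genome coordinates."""
    (p_start, p_end), (g_start, g_end) = primer["published"], primer["genome"]
    return p_end - p_start + 1, g_end - g_start + 1


offsets = {name: start_offset(p) for name, p in PRIMERS.items()}
mismatched = [name for name, p in PRIMERS.items()
              if len(set(interval_lengths(p))) != 1]

print(offsets)     # one shared offset per gene
print(mismatched)  # intervals whose two lengths disagree as printed
```

Under these numbers both N-gene primers share one offset and both G-gene primers share another. The VN1 interval is one nucleotide shorter in genome coordinates than in published coordinates as printed; the sketch only reports this and does not attempt to decide which end position is intended.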
Cloning of the NV gene and part of the L gene was
performed as described (4). To determine the
complete sequence of the L gene, cDNA clones
were obtained as described for cloning of the IHNV
genome (3).
Sequencing
Synthetic oligonucleotides were deduced from
obtained genomic sequences and used for primer
walking. Sequences were determined by the dideoxynucleotide chain termination method according to
standard protocols (Sequenase 7-deaza sequencing
kit, USB/Amersham). At least six independently
derived cDNA clones were analyzed on both strands
to ascertain the obtained sequence. Sequences were
assembled and analyzed with the Wisconsin Package
Version 9.1, Genetics Computer Group (GCG),
Madison, Wisc. (12).
Determination of Genomic Termini
The 3'-terminus of the VHSV genome was cloned after ligation of viral RNA with a synthetic oligonucleotide followed by RT-PCR as described for IHNV (3). The 5'-trailer region was determined by RACE (GIBCO BRL). Tailed cDNA was used for PCR amplification with nested virus-specific primer and poly(C) or poly(G) primers, respectively, and cloned into SmaI-digested dephosphorylated vector pUC18 (Pharmacia). At least six independently derived clones were sequenced in both orientations to determine the exact 3' and 5' termini of the genome.
Results
Determination of the Complete Nucleotide Sequence
of the VHSV Genome
The complete VHSV genome was cloned and sequenced. The cloning strategy is depicted in Fig. 1. The N, G, and P-M genes were sequenced after cloning of RT-PCR products obtained using specific oligonucleotides deduced from published sequences of the VHSV N and G genes. To verify determined sequences and to clone and sequence the complete L polymerase gene, cDNA clones were established as shown in Fig. 1. Termini were identified after RNA-oligonucleotide ligation followed by RT-PCR or by 5'-RACE, respectively. The VHSV genome consists of 11,158 nucleotides and contains six large open reading frames (ORFs) encoding the N, P, M, G, NV and L proteins. Gene order is identical to that found in IHNV, i.e. 3'-N-P-M-G-NV-L-5'. The VHSV genomic sequence is 55% identical to that of IHNV.
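Identity values such as the 55% quoted above are derived from pairwise alignments. The sketch below shows one common way to compute percent identity from two already-aligned sequences. It is not the GCG procedure used in the paper: the choice to skip gapped columns in the denominator is an assumption, and the two short sequences are hypothetical toy data rather than VHSV or IHNV sequence.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical positions among ungapped alignment columns.

    Assumption: columns with a gap in either sequence are excluded from
    the denominator. Other conventions exist and give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment: 8 ungapped columns, 7 of them identical.
toy_identity = percent_identity("GTATCA-AA", "GTATGATAA")
print(toy_identity)
```

Because the denominator convention changes the result, values computed this way are only comparable with each other, not necessarily with the figures reported in the text.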
ORF 1
The first ORF is localized at position 168 to 1382 and codes for the nucleoprotein. The deduced amino acid sequence is predicted to specify a 44 kDa protein with 42% homology to the nucleoprotein of IHNV. Compared to published VHSV nucleoprotein sequences, identity values of 98% to the pathogenic VHSV isolate 07-71 and 90% to strain MAKAH were found (10).
Fig. 1. Schematic representation of genetic map and cloning strategy of the VHSV genome. Representative cDNA clones from VHSV genomic libraries are indicated. RT-PCR generated cDNA fragments, including clones of the 3' and 5' termini generated after RNA-oligonucleotide ligation or 5' RACE, are shown in bold type. Numbers indicate the nucleotide positions in the VHSV genome. Only one representative clone of every cDNA cloning or RT-PCR is shown.
ORF 2
The second ORF is located between nucleotide positions 1481 and 2149 and encodes the phosphoprotein. A second ATG codon resides in-frame at nucleotide position 1496. Since the first start codon is flanked by sequences characteristic for initiation of translation, it is assumed that this is the authentic translational start. Taking this into account, ORF 2 codes for a 222 amino acid protein with a predicted molecular mass of 25 kDa. Identity of the P protein of VHSV-F1 to published sequences of other VHSV isolates amounts to 97% to isolate 07-71 and 92% to MAKAH (13). The VHSV P protein exhibits 37% identical amino acids to the respective protein of IHNV.
Spiropoulou and Nichol reported the existence of a unique protein expressed from a second ORF which is contained in the P gene of vesicular stomatitis virus (VSV; 14). Translation of the nucleotide sequence of the VHSV genome identified an additional second ORF overlapping the P gene at genomic nucleotide positions 1833–1973. Analysis of the IHNV genome gave similar results. The respective overlapping ORF is localized at nucleotide positions 1559–1687. The hypothetical 46 and 42 amino acid proteins of VHSV and IHNV, respectively, have deduced molecular masses of approximately 5 kDa. No significant amino acid sequence homology was detected between the VHSV and IHNV proteins, or between either of them and the VSV protein.
ORF 3
The third gene, which encodes the matrix protein, starts at nucleotide position 2268 and ends at position 2873. Thus, it comprises 606 nucleotides. The calculated molecular mass of the deduced 201 amino acid polypeptide is 20 kDa. As observed for the P protein, homology of the M protein to published sequences of other VHSV strains varies between 97% to isolate 07-71 and 92% to MAKAH. Identity to the IHNV M protein amounts to 37%.
ORF 4
This ORF, which is localized between nucleotides 2926 and 4482, encodes the glycoprotein G. A second in-frame start codon is present at position 2959 which resides in a perfect translation initiation context (15). Therefore, it is likely that the second ATG functions as the authentic translational start. Thus, a 507 amino acid protein with a calculated molecular mass of 57 kDa will be synthesized. The apparent molecular mass of the glycosylated form is 63 kDa. Comparison of deduced amino acid sequences with known sequence data of the G proteins of other VHSV strains (11) exhibited 99.8% identity. Identity to the G protein of IHNV is 39%.
ORF 5
This small ORF of 369 nucleotides is located downstream from the glycoprotein gene G at position 4557–4925. The first ATG codon is flanked by sequences characteristic for initiation of translation, resulting in synthesis of a 122 amino acid protein of 13.7 kDa calculated molecular mass. ORF 5 encodes the NV protein which was recently detected in cells infected by either IHNV or VHSV using specific antisera (4).
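The protein sizes quoted for the individual ORFs can be cross-checked against their nucleotide coordinates. The sketch below assumes that each coordinate pair is inclusive and that the end position includes the stop codon; under that assumption the residue counts stated in the text (222, 201, 507, 122 and 1984 amino acids) are reproduced. The G entry starts at the second in-frame ATG (position 2959), and the L coordinates are those given in the ORF 6 section. Names are illustrative.

```python
# Inclusive (start, end) positions as quoted in the ORF sections.
# Assumption: the end position includes the stop codon.
ORFS = {
    "N": (168, 1382),
    "P": (1481, 2149),
    "M": (2268, 2873),
    "G": (2959, 4482),    # from the second in-frame ATG
    "NV": (4557, 4925),
    "L": (5053, 11007),
}


def orf_length(start, end):
    """Number of nucleotides in an inclusive interval."""
    return end - start + 1


def residue_count(start, end):
    """Amino acids encoded, excluding the terminal stop codon."""
    length = orf_length(start, end)
    if length % 3 != 0:
        raise ValueError("interval is not a whole number of codons")
    return length // 3 - 1


protein_sizes = {name: residue_count(*span) for name, span in ORFS.items()}
print(protein_sizes)
```

The value obtained for N is a computed figure only, since the text gives a molecular mass but no residue count for the nucleoprotein.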
ORF 6
The last and largest gene on the VHSV genome encodes the viral RNA-dependent RNA polymerase. It starts at position 5053 and ends at position 11007. The deduced translation product consists of 1984 amino acids with a calculated molecular mass of 224 kDa. This is the first complete sequence of the VHSV L protein gene. Identity to the deduced IHNV L protein is 60%. Alignment of the VHSV L protein sequence to deduced L proteins of other members of the Rhabdoviridae yields identities of 16.7% to vesicular stomatitis virus (VSV) L protein (16) and 25% to the L protein of rabies virus (RV; 17). Dendrogram analysis of L polymerases within the group of Mononegavirales was performed with a multiple sequence alignment program (PileUp, Genetics Computer Group package version 7.3.1-Unix software) (12). Dendrogram analysis data indicated a close relationship within the members of the Rhabdoviridae family and a more distant relation to the Paramyxoviridae and Filoviridae families and to the borna disease virus. In the rhabdovirus family the fish pathogenic viruses of VHS and IHN form a distinct clade separate from mammalian rhabdoviruses (VSV and RV) as shown in Fig. 2a. An alignment of the proposed four catalytic domains (20) of the VHSV, IHNV, RV, and VSV L proteins is shown in Fig. 2b.
Fig. 2a. Comparison of complete L protein amino acid sequences within the rhabdovirus family. Included are VHSV, IHNV (infectious hematopoietic necrosis virus), VSV (vesicular stomatitis virus; 16) and RV (rabies virus; 17). The dendrogram shows the phylogenetic relationship of rhabdoviruses based on L protein sequences.
Fig. 2b. Alignment of rhabdoviral L polymerases. Only conserved regions containing the typical motifs A, B, C and D of L polymerases (20) are depicted. The strictly conserved residues of L proteins of rhabdoviruses are shown in boldface. Stars indicate those amino acid residues maintained in all RNA-dependent RNA polymerases. The presumably invariant residues within the L polymerase family are underlined. Numbers represent the amino acid positions of the respective L protein. Gaps are indicated by dots.
3' and 5' Ends
Inverse complementarity of termini is a common feature among genomes of nonsegmented negative-stranded RNA viruses. It is essential to balance the processes of transcription and replication. As shown in Fig. 3, VHSV genome ends also exhibit this inverse complementarity. The VHSV genome starts at its 3' terminus with the sequence GTAT which is identical to the start sequence of the IHNV genome. It ends with ATAC at the 5' terminus. The 3' terminal leader sequence comprises 167 nucleotides from the start of the genome to the start of the first open reading frame. The 5' terminal trailer sequence consists of 151 nucleotides.
Fig. 3. Comparison of terminal ends of the VHSV genome: a) The 3' and 5' genomic ends of VHSV are compared. Rhabdoviruses contain a single-stranded RNA genome in antimessage (−) sense. The sequences shown here represent the complementary (+) strand. Inverse complementarity between the terminal ends of the VHSV genome is shown in bold type. Nucleotide numbers indicate the position on the genomic sequence. b) The 3' terminal sequence of VHSV is compared to the respective region of the IHNV genome in message sense. Homologous sequences are highlighted in bold type. Numbers are related to the nucleotide position of the complete respective genome. Gaps are indicated by dots.
Fig. 4. Comparison of conserved sequences in nontranslated regions within the VHSV genome. The consensus sequences within regions between the genes N and P (N-P), P and M (P-M), M and G (M-G), G and NV (G-NV), NV and L (NV-L) were compared in message sense. Polyadenylation signals are underlined and putative transcription initiation signals are shown in boldface. The determined consensus sequence for polyadenylation and initiation is shown below.
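The inverse complementarity of the genome termini and the reported leader and trailer lengths can be checked with a few lines of code. This minimal sketch uses only values stated in the text (the terminal tetranucleotides GTAT and ATAC, a genome length of 11,158 nucleotides, the first ORF starting at position 168 and the last ORF ending at position 11007). The function and variable names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA-alphabet sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def inversely_complementary(start_seq, end_seq):
    """True if the end sequence is the reverse complement of the start."""
    return reverse_complement(start_seq) == end_seq.upper()


GENOME_LENGTH = 11158
FIRST_ORF_START = 168   # start of ORF 1 (N)
LAST_ORF_END = 11007    # end of ORF 6 (L)

leader_length = FIRST_ORF_START - 1            # positions 1..167
trailer_length = GENOME_LENGTH - LAST_ORF_END  # positions 11008..11158

print(inversely_complementary("GTAT", "ATAC"))
print(leader_length, trailer_length)
```

The check covers only the four terminal nucleotides quoted in the text; the full extent of complementarity is what Fig. 3 depicts.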
Nontranslated Regions
The nontranslated regions between the different ORFs range in length from 75 nucleotides (between the G and NV genes) to 128 nucleotides (between the NV and L genes). An alignment of sequences derived from nontranslated regions of the VHSV genome is shown in Fig. 4. At the 5' end of every sizeable ORF a conserved sequence 5'-AGATWG(A)7YGGCAC(N)3TRT-3' is present. This regulatory sequence is very similar to the consensus sequence found in nontranslated regions of IHNV, which is 5'-AGAYAS(A)7TGGCAC(N)4GTG, and to respective sequences in other rhabdoviruses such as RV (NTG(A)7) or VSV (TATG(A)7).
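The lengths of the nontranslated regions follow from the ORF coordinates given in the Results. The sketch below computes, for each gene junction, the difference between the start of the downstream ORF and the end of the upstream ORF. With the G start taken at the second in-frame ATG (position 2959), this difference reproduces the quoted extremes of 75 (G-NV) and 128 (NV-L); counting only the bases strictly between the two ORFs would give one fewer in each case. Which convention the authors used is an assumption here, and the names are illustrative.

```python
# (gene, start, end) in genome order, coordinates as quoted in the text.
# Assumption: G starts at the second in-frame ATG (position 2959).
ORFS = [
    ("N", 168, 1382),
    ("P", 1481, 2149),
    ("M", 2268, 2873),
    ("G", 2959, 4482),
    ("NV", 4557, 4925),
    ("L", 5053, 11007),
]


def junction_lengths(orfs):
    """Coordinate difference between consecutive ORFs, keyed by junction."""
    lengths = {}
    for (up, _, up_end), (down, down_start, _) in zip(orfs, orfs[1:]):
        lengths[f"{up}-{down}"] = down_start - up_end
    return lengths


junctions = junction_lengths(ORFS)
shortest = min(junctions, key=junctions.get)
longest = max(junctions, key=junctions.get)
print(junctions)
print(shortest, longest)
```

If the first ATG of ORF 4 (position 2926) were used instead, the M-G value would drop to 53, below the quoted minimum, which is why the second start codon is used in this sketch.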
Discussion
In this report we present the complete genomic sequence of VHSV strain F1 (isolate Fi13) after cloning of the entire genome. The VHSV genome is 11,158 nucleotides in length, which is slightly larger than the genome of the related fish rhabdovirus IHNV (3,5). The deduced genome organization is 3'-N-P-M-G-NV-L-5'. Comparison of the VHSV genome and gene products shows a high homology to those of IHNV, with identity values of deduced amino acid sequences between 37 and 60%. Only the NV protein exhibits a lower identity, of 23%.
Deduced amino acid sequences revealed that VHSV and IHNV proteins are mainly neutral with the exception of the acidic nucleoprotein and the basic matrix protein. The phosphoprotein P is the most hydrophilic protein (18,19). Interestingly, the isoelectric points of the VHSV (pI = 9.9) or IHNV (pI = 8.4) M proteins differ significantly from those of RV (pI = 4.84) or VSV (pI = 4.36). The functional basis for these differences is unclear at present.
Within the P gene of VSV, an additional overlapping reading frame was detected (14) encoding a deduced protein C, which is localized in cytoplasmic compartments of virus-infected cells. In the VHSV and IHNV genomes, an additional second ORF contained in the P gene is also present. The deduced hypothetical 46 and 42 amino acid proteins also exhibit basic properties and are arginine rich as described for VSV. The isoelectric points of this additional protein of VHSV (pI = 11.6) or IHNV (pI = 12.8) are similar to that of VSV (pI = 11.6). However, it is unclear whether these potential proteins are expressed at all and, thus, it remains to be determined whether they are conserved within more members of the rhabdoviruses.
As expected, the VHSV L protein exhibits a high degree of identity with other rhabdoviral L proteins. The catalytic subunits, a structural characteristic of the RNA-dependent RNA polymerases, could be identified by alignment of VHSV L protein sequences with those of other rhabdoviral L proteins (20; Fig. 2b). The conserved domain consisting of four major motifs A, B, C, and D is localized between amino acids 560 and 770 of the VHSV L polymerase. Interestingly, in both fish rhabdoviral L polymerases a conserved glycine residue within domain D as found in RV and VSV is replaced by a proline. Alterations at the same position have also been described in other negative-stranded RNA viruses such as Bunya- and Arenaviridae (20). The functional significance of this change is unclear.
Comparing length and nucleotide sequence of the termini, the VHSV and IHNV genomes also appear very similar. The 167 nt leader region of VHSV is 52% identical to the 174 nt leader of IHNV with a particularly high identity in the extreme 3'-ends (Fig. 3). The 5'-trailer region of VHSV is 40% identical to the respective region of IHNV.
The determined nontranslated regions contain signals for termination of transcription and polyadenylation of mRNA, and signals for initiation of transcription of the downstream gene. The determined polyadenylation (AGATWG(A)7) and transcriptional initiation (AACA) sequences are similar to respective signals in RV and VSV genomes (21,22). However, in addition to the conserved core sequence, further sequences conserved between the two fish rhabdoviruses IHNV and VHSV were detected downstream from ORFs 1 to 5 (Fig. 4). The VHSV consensus sequence YGGCAC(N)3TRT is similar to that found in IHNV, which is TGGCAC(N)4GTG. Both are probably involved in the initiation of transcription. Consistent with this role, the sequence is not present downstream from the polyadenylation signal behind ORF 6 (L polymerase gene) but is found in the 3' terminal region upstream from ORF 1 (with a change of one nucleotide).
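The consensus sequences above are written with IUPAC ambiguity codes and repeat counts such as (A)7. As a minimal sketch of how such a consensus could be searched for in a sequence, the code below translates the notation into a regular expression. The two consensus strings are taken from the text; the test sequence is a hypothetical example constructed to satisfy the VHSV consensus and is not taken from the genome.

```python
import re

# IUPAC nucleotide codes that occur in the consensus sequences.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}

# A token is either "(X)n" (symbol X repeated n times) or a single symbol.
TOKEN = re.compile(r"\(([A-Z])\)(\d+)|([A-Z])")


def consensus_to_regex(consensus):
    """Compile a consensus such as 'AGATWG(A)7...' into a regex."""
    parts = []
    for repeated, count, single in TOKEN.findall(consensus):
        if single:
            parts.append(IUPAC[single])
        else:
            parts.append(f"{IUPAC[repeated]}{{{count}}}")
    return re.compile("".join(parts))


VHSV_CONSENSUS = "AGATWG(A)7YGGCAC(N)3TRT"
IHNV_CONSENSUS = "AGAYAS(A)7TGGCAC(N)4GTG"

vhsv_re = consensus_to_regex(VHSV_CONSENSUS)
ihnv_re = consensus_to_regex(IHNV_CONSENSUS)

# Hypothetical sequence built to fit the VHSV consensus.
example = "AGATAG" + "A" * 7 + "CGGCAC" + "GAT" + "TGT"
print(vhsv_re.pattern)
print(bool(vhsv_re.fullmatch(example)), bool(ihnv_re.search(example)))
```

A pattern compiled this way matches message-sense sequence only; searching genomic (negative-sense) sequence would require the reverse complement of the input first.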
With the presentation of the complete genomic sequence of VHSV, the second fish rhabdovirus genome has been completely elucidated. Knowledge of the sequence of the viral genome and its gene content and composition is important for further studies on the function of viral proteins in the viral replicative cycle in cell culture and in the animal host. For a better molecular analysis of fish rhabdovirus infections, a reverse genetics system for both viruses is required, as has successfully been established for mammalian negative-strand RNA viruses such as RV and VSV (23,24). Availability of complete genomic sequences is a major prerequisite towards this goal, which can then lead to generation of new vaccines with improved properties for use in aquaculture.
References
1. Wolf K., Fish viruses and fish viral diseases. Cornell University Press, Ithaca, 1988.
2. Murphy F.A., Fauquet C.M., Bishop D.H.L., Ghabrial S.A., Jarvis A.W., Martelli G.P., Mayo M.A., and Summers M.D., Arch Virol 10, 265–288, 1995.
3. Schütze H., Enzmann P.-J., Kuchling R., Mundt E., Niemann H., and Mettenleiter T.C., J Gen Virol 76, 2519–2527, 1995.
4. Schütze H., Enzmann P.-J., Mundt E., and Mettenleiter T.C., J Gen Virol 77, 1259–1263, 1996.
5. Morzunov S.P., Winton J.R., and Nichol S.T., Virus Res 38, 175–192, 1995.
6. Kurath G., Ahern K.G., Pearson G.D., and Leong J.C., J Virol 53, 469–476, 1985.
7. Chomczynski P. and Sacchi N., Analyt Biochem 162, 156–159, 1987.
8. Glisin V., Crkvenjakov R., and Byus C., Biochemistry 13, 2633–2637, 1974.
9. Sambrook J., Fritsch E.F., and Maniatis T., Molecular Cloning: A Laboratory Manual. 2nd edn. Cold Spring Harbor Laboratory, New York, 1989.
10. Bernard J., Lecocq-Xhonneux F., Rossius M., Thiry M.E., and DeKinkelin P., J Gen Virol 71, 1669–1674, 1990.
11. Thiry M., Lecocq-Xhonneux F., Dheur I., Renard A., and DeKinkelin P., Biochim Biophys Acta 1090, 345–347, 1991.
12. Devereux J., Haeberli P., and Smithies O., Nucl Acids Res 12, 387–395, 1984.
13. Benmansour A., Paubert G., Bernard J., and DeKinkelin P., Virology 198, 602–612, 1994.
14. Spiropoulou C.F. and Nichol S.T., J Virol 67, 3103–3110, 1993.
15. Kozak M., Nucleic Acids Res 15, 8125–8148, 1987.
16. Schubert M., Harmison G.G., and Meier E., J Virol 51, 505–514, 1984.
17. Tordo N., Poch O., Ermine A., Keith G., and Rougeon F., Virology 165, 565–576, 1988.
18. Baer G.M., Bellini W.J., and Fishbein D.B., Virology. ed. by B. N. Fields; D. M. Knipe. 2nd ed. Raven Pr., New York, 1990, pp. 883–942.
19. Wagner R.R., Virology. ed. by B. N. Fields; D. M. Knipe. 2nd ed. Raven Pr., New York, 1990, pp. 867–881.
20. Tordo N., de Haan P., Goldbach R., and Poch O., Sem Virol 3, 341–357, 1992.
21. Conzelmann K.-K., Cox J.H., Schneider L.G., and Thiel H.J., Virology 175, 485–499, 1990.
22. Rose J.K., Cell 19, 415–421, 1980.
23. Schnell M.J., Mebatsion T., and Conzelmann K.-K., The EMBO Journal 18, 4195–4203, 1994.
24. Lawson N.D., Stillman E.A., Whitt M.A., and Rose J.K., Proc Natl Acad Sci USA 92, 4477–4481, 1995.