Journal of General Virology (1992), 73, 347-357. Printed in Great Britain
Sequence analysis of the Marburg virus nucleoprotein gene: comparison
to Ebola virus and other non-segmented negative-strand RNA viruses
Anthony Sanchez,1*† Michael P. Kiley,2 Hans-Dieter Klenk3 and Heinz Feldmann3
1Department of Biology and Laboratory for Microbial and Biochemical Sciences, Georgia State University, Atlanta,
Georgia 30302-4010, 2Research and Development Program, The Salk Institute, Government Services Division,
Swiftwater, Pennsylvania 18370, U.S.A. and 3Institut für Virologie, Philipps-Universität Marburg, 3550 Marburg,
Germany
The first 3000 nucleotides from the 3' end of the
Marburg virus (MBG) genome were determined from
cDNA clones produced from genomic RNA and
mRNA. Identified in the sequence was a short putative
leader sequence at the extreme 3' end, followed by the
complete nucleoprotein (NP) gene. The 5' end of the
NP mRNA was determined as was the polyadenylation
site for the NP gene. The transcriptional start (3'
UUCUUCUUAUAAUU..) and termination (3'
..UAAUUCUUUUU) signals of the MBG NP gene are
very similar to those seen with Ebola virus (EBO). In
comparison to other non-segmented negative-strand
RNA viruses, filovirus transcriptional signals are most
similar to members of the Paramyxovirus and Morbillivirus genera. In vitro translation of a run-off transcript
containing the entire MBG NP coding region produced
an authentic NP. Sequence comparisons of the 3' end
of the MBG and EBO genomes revealed weak
nucleotide sequence similarity, but the predicted
sequence of the first 400 amino acids of these viruses
showed a high degree of homology. This homology is encoded in
divergent nucleotide sequences through different codon usages and substitutions of similar amino acids. A
small region in the middle of the MBG and EBO NP
sequences was found to contain a significant amino acid
homology with NPs of paramyxoviruses and to a lesser
extent with rhabdoviruses. Specific sites of conserved
sequence are contained in hydrophobic domains and
may have a common function. Alignments of the entire
NP amino acid sequences of these viruses also suggest
that filoviruses are more closely related to paramyxoviruses than to rhabdoviruses.
Introduction
Marburg virus (MBG) is a 'Biosafety Level 4' agent
(Richardson & Barkley, 1988) that was first identified in
1967 following human outbreaks of acute haemorrhagic
fever in the cities of Marburg and Frankfurt, Germany,
and Belgrade, Yugoslavia (Martini & Siegert, 1971).
Initial infections occurred in persons working with
blood, organs or cell culturing of tissues from infected
African green monkeys (Cercopithecus aethiops) imported
from Uganda. This pathogen received its name from the
city of Marburg, where most of the cases occurred and
where much of the initial work on the virus was
performed. Three subsequent human outbreaks have
been attributed to MBG (Gear et al., 1975; Smith et al.,
1982; Kiley et al., 1988).
MBG, Ebola virus (EBO) and Reston virus (RES)
(Ebola-like monkey filovirus; Centers for Disease Control, 1989; Jahrling et al., 1990) are non-segmented
negative-strand (NNS) RNA viruses and are members of
the family Filoviridae. Together with the Paramyxoviridae and Rhabdoviridae, these families make up the order
Mononegavirales, a status accorded to them in 1990 by the
International Committee on Taxonomy of Viruses
(Pringle, 1991). The filovirus virion is bacilliform in
morphology and is composed of a helical nucleocapsid
surrounded by a lipid envelope. Virions contain at least
seven structural proteins and for MBG these proteins are
an RNA-dependent RNA polymerase (L protein; Mr
267K; unpublished data), a single surface glycoprotein
(GP; Mr 170K; Feldmann et al., 1991), a nucleoprotein
(NP; Mr 94K) and four proteins ranging in Mr from 24K
to 38K (Kiley et al., 1988).
The genetic features of filoviruses are similar to other
NNS viruses in that (i) transcription of the infecting
ribonucleoprotein complex, which contains a single
negative-sense genomic RNA template, yields monocistronic polyadenylated mRNA species and (ii) for EBO
† Present address: Special Pathogens Branch, Division of Viral and
Rickettsial Diseases, National Center for Infectious Diseases, Centers
for Disease Control, 1600 Clifton Road, Mail Stop G14, Atlanta,
Georgia 30333, U.S.A.
The nucleotide sequence data reported in this paper have been
assigned GenBank accession number M72714.
0001-0487 © 1992 SGM
(Zaire subtype), the genome is organized such that the
NP gene is encoded at the extreme 3' end of the genome
and (iii) the EBO NP gene contains similar transcriptional signals that delineate the genes (Kiley et al., 1986,
1988; Sanchez & Kiley, 1987; Sanchez et al., 1989). To
define the genetic relationship of MBG to EBO, RES
and other NNS RNA viruses more fully, we have
undertaken a project of cloning and sequencing the
entire genome of a 1980 isolate of MBG (Musoke strain).
In this report we present sequence data for the first 3000
nucleotides from the 3' end of the genomic RNA and
comparisons of the nucleic acid and predicted amino
acid sequences of the MBG NP gene to EBO, paramyxoviruses and rhabdoviruses.
Methods
Cells and viruses. Vero E6 cells were used to culture viruses as
previously described (Sanchez & Kiley, 1987). The Musoke strain of
MBG was used throughout this study and was derived from the serum
of a fatal human infection in Nairobi, Kenya in 1980 (Smith et al.,
1982). The virus was isolated and plaque-purified three times on Vero
E6 cells, and then large seed stocks were prepared in the same cell line.
For comparisons, a Zaire subtype of EBO (Mayinga strain) was used;
its passage history is described elsewhere (Sanchez et al., 1989).
Preparation of viral RNAs, molecular cloning and sequencing. Preparation of MBG genomic RNA (vRNA) and mRNA, synthesis of cDNA,
molecular cloning in pUC18, identification of virus-specific clones,
and chemical and dideoxynucleotide sequencing were performed as
previously described (Gubler & Hoffman, 1983; Sanchez et al., 1989;
Maxam & Gilbert, 1980; Sanger et al., 1977; Zimmern & Kaesberg,
1978). A synthetic oligodeoxynucleotide primer complementary to the
first 19 bases from the 3' end of the genome (5' AGACACACAAAAACAAGAGATG) was used to generate first-strand cDNA (vRNA
template) in the production of one cDNA library, and also to probe
cDNA libraries generated from vRNA and poly(A)-tailed vRNA.
The 5' end of the MBG NP mRNA was sequenced by primer
extension, using the primer 5' CCAACAAACTGTGTAAATCCAT
(vRNA sense; bases 125 to 104), which was radiolabelled at the 5' end
with [γ-32P]ATP and used in a first-strand reaction. The extension
product was isolated from a sequencing gel and chemically sequenced
as previously described (Sanchez et al., 1989).
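The orientation of the two primers described above can be checked with a reverse-complement calculation. The following is a minimal sketch and not part of the original methods: the 125-nucleotide mRNA-sense fragment was transcribed by hand from the opening rows of the published sequence (Fig. 3) with U written as T, and the names `MRNA_SENSE_1_125` and `reverse_complement` are introduced here for illustration only. The cDNA primer as printed is 22 nucleotides long; the sketch simply compares the printed strings.

```python
# mRNA-sense (viral complementary) nucleotides 1 to 125, written as DNA.
# Hand-transcribed from the published sequence figure: an assumption of this sketch.
MRNA_SENSE_1_125 = (
    "AGACACACAAAAACAAGAGATGATGATTTTGTGTATCATATAAATA"            # nt 1-46
    "AAGAAGAATATTAACATTGACATTGAGACTTGTCAGTCGTGTAATATTCTTGAAGAT"  # nt 47-103
    "ATGGATTTACACAGTTTGTTGG"                                     # nt 104-125
)

PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (5' to 3')."""
    return "".join(PAIRS[base] for base in reversed(dna))


# Primer used for first-strand cDNA synthesis on vRNA (printed 5' to 3').
cdna_primer = "AGACACACAAAAACAAGAGATG"
# Primer used for primer extension on NP mRNA (vRNA sense; bases 125 to 104).
extension_primer = "CCAACAAACTGTGTAAATCCAT"

# A primer complementary to the 3' end of the genome is identical to the
# 5' end of the mRNA-sense strand.
cdna_primer_matches = MRNA_SENSE_1_125.startswith(cdna_primer)

# A vRNA-sense primer covering bases 125 to 104 is the reverse complement
# of mRNA-sense bases 104 to 125 (0-based slice 103:125).
target = MRNA_SENSE_1_125[103:125]
extension_primer_matches = reverse_complement(target) == extension_primer

print(cdna_primer_matches, extension_primer_matches)
```

Both comparisons depend only on the printed primer strings and the transcribed fragment, so any transcription error in the fragment would show up as a mismatch.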
Agarose gel electrophoresis and Northern blot hybridization. Acid-urea-agarose (1.5% w/v) gel electrophoresis, blotting and hybridizations were performed as described elsewhere (Rosen et al., 1975;
Sanchez et al., 1989), except that GeneScreen Plus (New England
Nuclear) was used as the hybridization transfer membrane and blotted
RNA was not baked onto the membrane.
Computer-aided sequence analysis. The Sequence Analysis Software
Package developed by the Genetics Computer Group (University of
Wisconsin Biotechnology Center; Version 7.0) was used in analysing
sequence compositions, sequence comparisons, manipulations, and
graphic output (Devereux et al., 1984).
[Fig. 1 schematic: leader, start site, nucleoprotein gene and poly(A) site drawn from the 3' end towards the 5' end, with cDNA clones MV-88, MV-17, MV-39 and MV-34 aligned beneath and a scale from 0 to 3000 nucleotides.]
Fig. 1. The 3' end of the MBG genome and cDNA clones in sequence
analysis. A schematic representation of the 3' end of the MBG genome
is shown at the top of this figure. From left to right are the putative
leader sequence, the transcriptional start site, the non-coding 3' end of
the NP gene, the NP ORF, the 5' non-coding region and the poly(A)
site. Below this drawing are shown the principal cDNA clones
(generated from vRNA) used in sequencing studies and their positions
as they align on the MBG genome. The nucleotide sequences for the
cloned inserts are as follows: MV-88, 1 to 1744; MV-17, 954 to 3593;
MV-39, 1 to 941; MV-34, 1 to 257. At the bottom is a scale as a
reference for nucleotide sequence lengths.
Construction and in vitro expression of the MBG NP gene coding region.
The entire MBG NP gene open reading frame (ORF) was synthesized
by the polymerase chain reaction (PCR) technique using a commercial
kit (Perkin-Elmer Cetus). Primer pairs used in amplifying the
MBG NP coding region from vRNA template are as follows: 5' CAGGGTACCGTGTATCATATAAATAAAGAAGAATATTAAC
(mRNA sense; bases 31 to 61) and 5' CAGGGTACCGCTGCATGTATGATGAGTCCCACATTGTGA (vRNA sense; bases 2969 to
2940). These primers each contain a KpnI restriction endonuclease site
at the 5' end to facilitate cloning. First-strand DNA synthesis was
performed using vRNA and 0.3 μg RNA sense primer (Sanchez et al.,
1989), then the products were incorporated into a PCR assay using 0.3
μg of each primer in a 100 μl reaction. The amplified DNA was
digested with KpnI and ligated into the KpnI site of the pGEM3Zf(+)
transcription vector (Promega). A plasmid was isolated that contained
the initiating AUG codon positioned downstream of the T7 RNA
polymerase promoter. The DNA from this construct was purified by
banding in CsCl gradients, then uncapped run-off RNA transcripts
were produced from this DNA after it had been linearized with XbaI.
Transcription was performed using T7 RNA polymerase in a large-scale transcription reaction (Promega protocol). The resulting transcript was translated in vitro and labelled with [35S]methionine (New
England Nuclear) in a rabbit reticulocyte lysate system (Promega).
Translation products were immunoprecipitated with a human anti-MBG serum, subjected to SDS-PAGE and processed for fluorography
as previously described (Sanchez & Kiley, 1987).
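The design of the two PCR primers (a KpnI site near the 5' end followed by a template-specific 3' portion) can be illustrated by splitting each printed primer at the KpnI recognition sequence GGTACC. This is a sketch under stated assumptions, not the authors' procedure: the helper `split_primer` and the dictionary layout are hypothetical, and the only data used are the primer strings and base ranges quoted in the text.

```python
KPNI_SITE = "GGTACC"  # KpnI recognition sequence

# Primer strings and base ranges as quoted in the Methods.
primers = {
    "mRNA sense": ("CAGGGTACCGTGTATCATATAAATAAAGAAGAATATTAAC", 31, 61),
    "vRNA sense": ("CAGGGTACCGCTGCATGTATGATGAGTCCCACATTGTGA", 2969, 2940),
}


def split_primer(primer: str, site: str = KPNI_SITE) -> tuple[str, str]:
    """Split a primer into (5' tail ending with the site, template-specific part)."""
    index = primer.find(site)
    if index < 0:
        raise ValueError("restriction site not found in primer")
    end = index + len(site)
    return primer[:end], primer[end:]


results = {}
for label, (sequence, first, last) in primers.items():
    tail, specific = split_primer(sequence)
    expected_length = abs(last - first) + 1  # inclusive base range
    results[label] = (tail, specific, len(specific) == expected_length)

for label, (tail, specific, ok) in results.items():
    print(label, tail, len(specific), ok)
```

In both primers the template-specific part has the length implied by the quoted base range (31 and 30 nucleotides), which is a quick consistency check on the printed sequences.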
Results
Cloning and sequence analysis of the 3' end of the MBG
genome
A schematic representation of the primary clones used in
generating the 3' end sequence data for the MBG
genome is shown in Fig. 1. The sequences obtained were
double-strand data obtained by either chemical or
dideoxynucleotide sequencing of cloned inserts. The
clones MV-88 and MV-39 contain the exact 3' end of the
genome and were generated by priming first-strand
cDNA synthesis with a 3' complementary synthetic
oligonucleotide plus random priming, using vRNA as
template. Clone MV-34 contains the 3' end plus a
poly(A) tail added to vRNA with poly(A) polymerase
(prior to first-strand cDNA synthesis) and primed with
oligo(dT). In addition to sequencing of cloned inserts, the
sequence from nucleotides 1650 to 3000 was verified
through direct dideoxynucleotide sequencing of purified
vRNA.
Fig. 2. Specificity of cDNA clones, MBG mRNAs and sequencing of the 5' end of the MBG NP mRNA. (a) Northern blot
hybridization of 32P-labelled probes (nick translation) generated from cDNA clones MV-88 and MV-17 (see Fig. 1) to lanes of RNA
resolved by electrophoresis in an acid-urea-agarose (1.5%) gel. Lanes include preparations of purified vRNA (lanes 1, 4 and 7), a crude
preparation of MBG mRNA (lanes 2, 5 and 8) and uninfected Vero E6 total cell RNA (lanes 3, 6 and 9). Lanes 1 to 3 show a set of lanes
that were stained with ethidium bromide prior to blotting. Hybridization was performed under stringent conditions (50% formamide;
42 °C) overnight. The locations of the 28S and 18S ribosomal RNA bands are identified at the left edge of the figure. (b) Fluorography of
an acid-urea-agarose gel containing lanes of [3H]uridine-labelled RNA from Vero E6 cells infected with either MBG (lane 1) or EBO
(lane 2) (treated with actinomycin D prior to labelling). Lanes were aligned with Northern blots and an asterisk identifies the position of
the MBG NP mRNA. (c) Autoradiograph of a 6% sequencing gel that shows the sequence for the 5' end of the MBG NP mRNA. An
mRNA-complementary primer, labelled at the 5' end with [γ-32P]ATP, was annealed close to the 5' end of the genome, extended with
reverse transcriptase, and the extension products were chemically sequenced (Maxam & Gilbert, 1980).
Hybridization of clones MV-88 and MV-17 to Northern blots of MBG vRNA and mRNA transcripts
demonstrated their specificity for MBG sequences and
identified the transcripts recognized by these clones (Fig.
2a). Clone MV-88 hybridizes to a single transcript,
which corresponds to a large MBG mRNA species seen
in Fig. 2(b), indicated by an asterisk, and is comparable
in size to the EBO NP mRNA. As shown in Fig. 1, the
MV-17 clone contains sequences that overlap the first
and second genes, and hybridization analysis showed
that this clone anneals to the NP transcript and to a
second mRNA species that is transcribed from the next
(adjacent) gene. The agarose electrophoresis pattern of
oligo(dT)-selected MBG mRNA transcripts is very
similar to that seen for EBO (Fig. 2b) and also shows
them to be polyadenylated and monocistronic.
The 5' end of the NP mRNA was sequenced by primer
extension and chemically sequencing the extension
product, the results of which are shown in Fig. 2(c). The
last band of this sequencing ladder is weaker than the
next smaller band and may be due to (i) variability in the
exact site in which transcription of the NP mRNA
occurs, (ii) copying of a 5' cap structure by reverse
transcriptase, (iii) premature termination of the extension product caused by a cap structure or (iv) an artefact
of the sequencing chemistry. In any event, from the
results obtained from 5' end sequence analysis, the
transcriptional start site shown in Fig. 3 represents the
longest possible transcript.
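The transcriptional signals in the Summary are written 3' to 5' in genomic (vRNA) sense, whereas the sequence figure is in mRNA sense. Because the two strands are antiparallel, a signal read 3' to 5' on the genome corresponds to its base-by-base complement read 5' to 3' on the mRNA. The sketch below illustrates this conversion and locates the start signal in the first 103 nucleotides; the fragment was transcribed by hand from the published sequence figure, and the variable and function names are assumptions of this example.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complement(rna: str) -> str:
    """Base-by-base complement, without reversing the string."""
    return "".join(RNA_PAIR[base] for base in rna)


# Signals as quoted in the Summary, written 3' to 5' in vRNA sense.
START_VRNA_3TO5 = "UUCUUCUUAUAAUU"
STOP_VRNA_3TO5 = "UAAUUCUUUUU"

# mRNA-sense nucleotides 1 to 103 (hand-transcribed; an assumption of this sketch).
MRNA_SENSE_1_103 = (
    "AGACACACAAAAACAAGAGAUGAUGAUUUUGUGUAUCAUAUAAAUA"
    "AAGAAGAAUAUUAACAUUGACAUUGAGACUUGUCAGUCGUGUAAUAUUCUUGAAGAU"
)

start_mrna = complement(START_VRNA_3TO5)  # read 5' to 3' in mRNA sense
stop_mrna = complement(STOP_VRNA_3TO5)

# 1-based position of the first base of the start signal in the fragment.
start_position = MRNA_SENSE_1_103.find(start_mrna) + 1

print(start_mrna, stop_mrna, start_position)
```

The position returned is the first base of the conserved signal in this fragment; the text above makes clear that the true 5' end of the mRNA is known only approximately.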
Fig. 3 shows the viral complementary sequence
(mRNA sense) of the MBG genome, including the entire
NP gene. The transcriptional start site and stop or
poly(A) site are identified on the sequence and delineate
the NP gene. The start site was determined by
sequencing the 5' end of the NP mRNA, and the poly(A)
site was identified by the isolation and sequencing of
three mRNA clones containing the poly(A) tail region of
the NP transcript (data not shown).
[Fig. 3 sequence listing: nucleotides 1 to 3000 in mRNA sense, with the predicted NP amino acid sequence beneath the coding region and the leader, start site and poly(A) site marked; GenBank accession number M72714.]
Fig. 3. Viral complementary sequence (mRNA sense) of the 3' end of the MBG genome. Identified in the figure are the putative leader
sequence, the transcriptional start site, the NP coding region and the poly(A) site. The exact start of the gene (5' end of mRNA) has not
been determined, but primer extension using MBG mRNA as a template and chemical sequencing of the extension product have
identified it to within one base. Thus the maximum length of the putative leader sequence and the beginning of the NP gene in this
figure are preliminary and have not been confirmed. As shown, the gene is 2798 nucleotides in length and encodes a protein that
contains 695 amino acids.
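Given an mRNA-sense sequence and a start site, the first AUG downstream of the start site and the beginning of the encoded protein can be found mechanically. The sketch below is an illustration only: the 139-nucleotide fragment was transcribed by hand from the opening rows of the published sequence, the start-site position of 47 is taken from the start-site label in that figure (the caption notes it is known only to within one base), and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# mRNA-sense nucleotides 1 to 139 (hand-transcribed; an assumption of this sketch).
FRAGMENT = (
    "AGACACACAAAAACAAGAGAUGAUGAUUUUGUGUAUCAUAUAAAUA"
    "AAGAAGAAUAUUAACAUUGACAUUGAGACUUGUCAGUCGUGUAAUAUUCUUGAAGAU"
    "AUGGAUUUACACAGUUUGUUGGAGUUGGGUACAAAA"
)

START_SITE = 47      # 1-based; assumed from the figure label
PROTEIN_LENGTH = 695  # amino acids, as stated in the figure caption


def translate(rna: str, start_index: int, n_codons: int) -> str:
    """Translate n_codons codons beginning at the 0-based start_index."""
    stop = start_index + 3 * n_codons
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(start_index, stop, 3))


# Search only within the transcript, i.e. from the start site onwards.
first_aug_index = FRAGMENT.find("AUG", START_SITE - 1)
orf_start = first_aug_index + 1                 # 1-based position of the A of AUG
peptide = translate(FRAGMENT, first_aug_index, 12)

# Last base of the stop codon if the ORF has 695 sense codons plus one stop.
orf_end = orf_start + 3 * (PROTEIN_LENGTH + 1) - 1

print(orf_start, peptide, orf_end)
```

The leader region upstream of the start site also contains AUG triplets, which is why the search is restricted to the transcribed region.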
The MBG NP ORF encodes a protein that is 695
amino acids in length and has a predicted Mr of 77.86K.
The ORF is initiated by the first AUG from the 5' end of
the mRNA and is flanked by sequences that place it in a
favourable context (Kozak, 1986). The NP contains a
large number of negatively charged amino acids,
resulting in a highly acidic protein (net charge -28). As
shown with the EBO NP, the majority of the acidic
amino acids and proline residues are found in the C-terminal
half of the MBG NP (68% and 75%, respectively, between residues 348 and 695), and all three cysteine
residues are found in the N-terminal third of the NP. The
MBG NP is also similar to the EBO NP in its hydropathy
plot, seen in Fig. 4, which shows that the NP can be
divided into a hydrophobic N-terminal half and a
hydrophilic C-terminal half. For comparison, the hydropathy plots of the NP proteins of Sendai virus (SEN) and
vesicular stomatitis virus (VSV) are also shown in Fig. 4, and it
appears that the SEN NP profile is closer to those of the
MBG and EBO NPs than that of the VSV NP. This
similarity in hydropathic profiles is particularly prominent in the hydrophobic sequences from residue 130 to
approximately 320 (160 to 350 for SEN).
[Fig. 4 plots, panels (a) to (d); horizontal axis: amino acid sequence number.]
Fig. 4. Hydropathy plots of the predicted amino acid sequences of NP
proteins. Hydropathic profiles of the NP proteins of MBG (a), EBO (b),
SEN (c) and VSV (d) were generated by the method of Kyte & Doolittle
(1982) using a window size of seven residues. Solid bars above the
hydropathy plots identify corresponding regions of which the amino
acid sequences are shown aligned in Fig. 6. Asterisks identify a
prominent hydrophobic peak that is seen in identical positions with
respect to the sequences marked by the bars for MBG, EBO and SEN.
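A hydropathy profile of the kind described for Fig. 4 is a moving average of per-residue hydropathy values. The sketch below uses the Kyte & Doolittle (1982) scale with a window of seven residues; it is a minimal illustration rather than the GCG program used in the paper. The 22-residue example is MBG NP residues 181 to 202 as read from the published translation, and the function name `hydropathy_profile` is an assumption of this example.

```python
# Kyte & Doolittle (1982) hydropathy values, one-letter amino acid codes.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 7, first_residue: int = 1):
    """Return (centre residue number, mean hydropathy) for each full window."""
    half = window // 2
    profile = []
    for i in range(len(seq) - window + 1):
        mean = sum(KD[aa] for aa in seq[i:i + window]) / window
        profile.append((first_residue + i + half, mean))
    return profile


# MBG NP residues 181 to 202, read from the published translation.
MBG_181_202 = "VIFGILRSSFILKFVLIHQGVN"

profile = hydropathy_profile(MBG_181_202, window=7, first_residue=181)
for centre, value in profile:
    print(centre, round(value, 2))
```

Positive means indicate hydrophobic stretches and negative means hydrophilic ones; a full-length profile would be obtained by passing the complete protein sequence with `first_residue=1`.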
Sequence comparisons of the MBG and EBO NP genes
Computer-aided matrix (dot plot) comparisons of the
nucleotide and amino acid sequence homologies between
the MBG and EBO NP genes are shown in Fig. 5(a) and (b).
Some scattered similarity is seen in the nucleotide
sequences, corresponding to bases 450 to 1100 for MBG
and 950 to 1600 for EBO. In contrast, matrix comparison
of amino acid sequences reveals a close resemblance
between these two proteins in the N-terminal 400
(approximate) residues, with only a small break around
residues 120 to 140. An alignment of the NP amino acid
sequences of MBG and EBO is shown in Fig. 5(c). The
alignment shows that the region from positions 130 to
392 of the MBG sequence has very strong similarity, and
is highlighted by a run of 34 identical amino acids (MBG
sequence 296 to 329). The strongest identity seen in the
nucleic acid matrix comparison corresponds to the
strongest regions of identity seen in the amino acid
alignment. It should also be noted that two of three
cysteines in this alignment are conserved and are nearest
to the N terminus.
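A matrix (dot plot) comparison of the kind described above can be sketched as a sliding-window test: a point is recorded wherever a window from one sequence matches a window from the other at a minimum number of positions. The code below is a simplified illustration and not the GCG Compare/DotPlot implementation; the window and stringency values are arbitrary choices for this example. The two 32-residue segments are MBG NP residues 306 to 337 and the EBO residues aligned to them, read from the published alignment and translation, so they cover only the last 24 residues of the 34-residue identical run.

```python
def dot_matrix(a: str, b: str, window: int = 5, stringency: int = 5):
    """Return 1-based (i, j) window starts where at least `stringency`
    of `window` positions are identical."""
    hits = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            matches = sum(x == y for x, y in zip(a[i:i + window], b[j:j + window]))
            if matches >= stringency:
                hits.append((i + 1, j + 1))
    return hits


def longest_identical_run(a: str, b: str) -> int:
    """Length of the longest run of identical residues in two aligned strings."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


# Aligned segments (no gaps in this region), read from the published figures.
MBG_306_337 = "ATAHGSTLAGVNVGEQYQQLREAAHDAEVKLQ"
EBO_ALIGNED = "ATAHGSTLAGVNVGEQYQQLREAATEAEKQLQ"

hits = dot_matrix(MBG_306_337, EBO_ALIGNED)
diagonal_hits = [hit for hit in hits if hit[0] == hit[1]]
run_length = longest_identical_run(MBG_306_337, EBO_ALIGNED)

print(len(diagonal_hits), run_length)
```

Points on the main diagonal correspond to the aligned, identical stretch; lowering `stringency` below `window` would admit windows with mismatches, which is how weaker similarity becomes visible in such plots.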
Comparisons of the NP amino acid sequences of
filoviruses and other NNS RNA viruses
Comparisons of the predicted NP amino acid sequences
of filoviruses and other NNS RNA viruses were
performed to determine whether any conserved regions
were present. These analyses revealed that in both the
MBG and EBO NPs there is a short region which has a
significant degree of identity with paramyxovirus NPs,
and to a lesser extent with those of rhabdoviruses. This
sequence in the MBG and EBO NP corresponds to the
highly conserved region in the central part of the protein
described above. These findings are illustrated in Fig.
6(a) and (b), which show matrix comparisons of the NP
amino acid sequences of MBG and SEN, and MBG and
VSV, respectively. The MBG amino acid sequence
around the short region of similarity seen in the centre of
the MBG/SEN matrix plot (see arrow, Fig. 6a) was
aligned to the NP amino acid sequences of six
paramyxoviruses and two rhabdoviruses (Fig. 6c). This
alignment was initially computer-generated, and then a
manual alignment was made to minimize the introduction of gaps. A consensus sequence was derived and
asterisks beneath the consensus line mark the locations
where both MBG and EBO sequences are either
identical or have amino acids similar to those of the
consensus sequence. At these locations MBG and EBO
sequences occurred in 79% of the consensus positions and
showed a significantly greater likeness to paramyxoviruses than to VSV or rabies virus. This region of similarity contains the sequences previously identified by
Elango (1989) as highly conserved within the family
Paramyxoviridae (underlined positions in line with
asterisks in Fig. 6c).
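The asterisk-marking rule described above (both filovirus sequences identical or similar to the consensus at a position) can be expressed as a short scoring function. The sketch below is purely illustrative: the similarity groups are an assumption, since the text does not state which substitutions were counted as similar, and the three short sequences are made-up toy data, not taken from Fig. 6.

```python
# Hypothetical similarity groups; not taken from the paper.
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG"]


def similar(x: str, y: str) -> bool:
    """True if two residues are identical or fall in the same group."""
    return x == y or any(x in group and y in group for group in SIMILAR_GROUPS)


def consensus_agreement(consensus: str, seq_a: str, seq_b: str):
    """Mark positions where BOTH sequences agree with the consensus.

    Positions without a consensus residue ('-') are skipped. Returns the
    marker line and the fraction of consensus positions that are marked.
    """
    marks = []
    scored = 0
    for c, a, b in zip(consensus, seq_a, seq_b):
        if c == "-":
            marks.append(" ")
            continue
        scored += 1
        marks.append("*" if similar(a, c) and similar(b, c) else " ")
    line = "".join(marks)
    return line, line.count("*") / scored


# Toy data for illustration only.
consensus = "LG-SAVEK"
toy_a = "LGTSAIDK"
toy_b = "IGASGVEQ"

marker_line, fraction = consensus_agreement(consensus, toy_a, toy_b)
print(consensus)
print(marker_line)
print(round(fraction, 2))
```

With real aligned sequences the same function would give the proportion of consensus positions at which both sequences agree, although the value depends on the similarity groups chosen.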
Fig. 7 shows a set of computer-generated dendrograms
which schematically show the relatedness of the amino
acid sequences of the entire and conserved region of the
NPs of the viruses seen in Fig. 6(c). The homology of
these viral NP sequences can be seen in the clustering of
sequences and can be measured by the lengths of the
branches in the horizontal axis, which is proportional to
[Fig. 5 panels: (a) matrix comparison of nucleotide sequences, with MBG NP on one axis; (b) matrix comparison of amino acid sequences, with MBG NP on one axis; (c) alignment of the MBG and EBO NP amino acid sequences.]
Fig. 5. Computer-aided comparisons of the filovirus nucleotide and amino acid sequences. The Compare, DotPlot and GAP programs
of the GCG Sequence Analysis Software Package (Devereux et al., 1984) were used in analyses and graphic output. (a) Matrix
comparisons of vcRNA sequences of the 3' ends of the MBG and EBO genomes. Sequences start at the extreme 3' end of the genome
and proceed to the ends of the respective NP genes. (b) Matrix comparisons of the predicted amino acid sequences encoded in the MBG
and EBO NP genes. For the nucleotide sequence comparison a window setting of 21 and a stringency setting of 15 were used with the
Compare program. For the amino acid sequence comparison a window setting of 30 and a stringency setting of 15 were used. (c)
Alignment of the predicted amino acid sequences for the NP proteins of MBG (top) and EBO (bottom) using the method of Needleman
& Wunsch (1970). For amino acid comparisons a gap weight setting of 3.0 and a length weight setting of 0.5 were used with the GAP
program and the NWSGapPep.Cmp comparison file. Similarities between the sequences are identified with a vertical bar (|) for
identity, two dots (:) for a comparison value greater than or equal to 0.5 and one dot (.) for a comparison value greater than or equal to
0.1.
the difference between sequences. These dendrograms
are comparable in that the same viruses segregate into
the same positions in the alignments. They differ in that
the conserved region alignment shows a greater similarity (shorter horizontal lengths) than the entire NP
sequence. As expected, rhabdoviruses, paramyxoviruses
and filoviruses segregated into family groups, with the
sole exception of respiratory syncytial virus (RSV), which
surprisingly fell into the filovirus branch of each
dendrogram; this may indicate a closer relationship to
these viruses.
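The pairwise scores underlying such dendrograms can be illustrated with a minimal sketch of a Needleman & Wunsch (1970) global alignment score. The gap weight (3.0) and length weight (0.5) are the settings quoted in the figure legends; everything else is an assumption made for illustration: the identity scoring (match 1.0, mismatch 0.0) stands in for the GCG comparison tables, a gap of k positions is assumed to cost gap weight + k x length weight, end gaps are penalized like internal gaps, and the function name is hypothetical. It is not the GAP or PileUp implementation.

```python
def global_alignment_score(a, b, gap_weight=3.0, length_weight=0.5,
                           match=1.0, mismatch=0.0):
    """Best global alignment score of a and b under an affine gap penalty.

    Assumed penalty form: a gap of k positions costs
    gap_weight + length_weight * k. The match/mismatch values are a
    hypothetical identity scoring, not a GCG comparison table.
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # aligned[i][j]: best score with a[i-1] paired to b[j-1]
    # gap_b[i][j]:   best score with a[i-1] opposite a gap in b
    # gap_a[i][j]:   best score with b[j-1] opposite a gap in a
    aligned = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    gap_b = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    gap_a = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    aligned[0][0] = 0.0
    for i in range(1, n + 1):
        gap_b[i][0] = -(gap_weight + length_weight * i)
    for j in range(1, m + 1):
        gap_a[0][j] = -(gap_weight + length_weight * j)

    open_cost = gap_weight + length_weight  # cost of the first gap position
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            aligned[i][j] = pair + max(aligned[i - 1][j - 1],
                                       gap_b[i - 1][j - 1],
                                       gap_a[i - 1][j - 1])
            gap_b[i][j] = max(aligned[i - 1][j] - open_cost,
                              gap_a[i - 1][j] - open_cost,
                              gap_b[i - 1][j] - length_weight)
            gap_a[i][j] = max(aligned[i][j - 1] - open_cost,
                              gap_b[i][j - 1] - open_cost,
                              gap_a[i][j - 1] - length_weight)
    return max(aligned[n][m], gap_b[n][m], gap_a[n][m])


# Toy sequences (hypothetical): three matches and one single-position gap.
print(global_alignment_score("ACGT", "ACT"))
```

With these settings a single-position gap costs 3.5, so a gap is introduced only where it recovers more than 3.5 units of match score.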
In vitro expression of the MBG NP
Fig. 8 shows the SDS-PAGE of immunoprecipitated in
vitro translation products compared with purified MBG
virion proteins. Lanes 1 and 3 contain translation
products; their synthesis was primed by a T7 RNA
polymerase-generated run-off transcript containing the
MBG NP coding region. Comparison of these products
with MBG virion proteins (lane 2) and MBG mRNA-primed and uninfected cell RNA-primed translation
products (lanes 4 and 5, respectively) demonstrates that
the most prominent translation product comigrates with
the MBG NP seen in virions and produced from mRNA
translation. These data confirm that the ORF in Fig. 3
encodes the MBG NP, and in vitro expression of this
region results in the synthesis of an authentic NP. In this
in vitro system the NP transcript also primes the synthesis
of many smaller proteins (which we have also seen with a
similar EBO NP transcript); these may arise from the
internal initiation of translation.
[Fig. 6 graphic: (a) dot-matrix plot, MBG NP versus SEN NP; (b) dot-matrix plot, MBG NP versus VSV NP. The plots are not recoverable from the extracted text.]
(c)
MBG  214  isnsvgqtrF sgllIvktvL efILqktdsG VtlH.pLVrt skVknEVasF KqALsnlarh GeyAPFArVL
EBO  232  isnsvaqArF sqllIvktvL dhXLqkterG VrlH.pLArt akVknEVnsL KaALsslakh GeyAPFArLL
SEN  247  ittleknXqI VgnyIrdagL asFMntikyG VetKmaALtl snLrpDInkL RsLIdtylsk GprAPFIcIL
HP3  246  IttieknIql VgnylrdagL asFFntiryG IetRmaALsl stLrpDInrL KaLMelylsk GprAPFIcIL
MUM  247  snryyamVgd IgkyIensgL taFFltlkya LgtKwspLsl aaFtgELtkL RsLMmlyrdi GeqArMLaLL
NDV  245  tstyynlVgd VdsylrntgL taFFltlkyG IntKtsALal ssLsgDIqkM KqLMrlyrmk GdnAPYMtLL
MEA  247  kpriaemIcd IdtyIveagL asFIltikfG IetmypALgl heFagELstL esLMnlyqqm GkpAPYMvnL
RSV  235  ggsrveglfA glfmnaygag qvMLrwgvla ksvKniMLgh asVqaEMeqV veVyeyaqkl GgeAgFyhIL
RAB  223  airvgtvVtA yedcsglvsF tgFIkqInlt AreailyFfb knFeeEIrrM fepgqetavp hsyFihFrsL
VSV  212  pirygtiVsr FkdcAalatF ghLskvsgls IedlttwVln reVadELcqM mypgqeidka dsyMPYMidF

MBG  283  nlSginnLeh GIYPqLsaiA LGVAtAhgst LagvnvgeqY qqlreAahda Evklqrrheh qelqaiaedd
EBO  301  nlSgvnnLeh GIFPqLsaiA LGVAtAhgst LagvnvgeqY qqlreAatea Ekqlqqyaes reLdhlgLdd
SEN  317  kdpvhgeFap GnYPaLwSyA MGVAVVqnka MqqyvtgRtY LdmemFiLgq avAkdaeski ssALedeLgv
HP3  316  rdpihgeFap GnYPaIwSyA MGVAVVqnra MqqyvtgRsY LdidmFqLgq avApdaeaqm sstLedeLgv
MUM  317  eapqimdFap GgYPIIfSyA MGVqsVLdvq MrnytyaRpF LngyyYqXgv EtArrqqgtv dnrVaddLgl
NDV  315  gdSdqmsFap aeYaqLySfA MGMAsVLdkg tgkyqfaRdF MstsfwrLgv EyAqaqgssi nedMaaeLkl
MEA  317  enSiqnkFsa GsYPILwSyA MGVgVeLens MgglnfgRsY FdpayFrLgq EmVrrsagkv sstLaseLgi
RSV  305  nnpkaslL$1 tqFPhFsSvV LGnAAgLgim geyrgtpRnq dlydaAkAya EqLkengvin ysVLdltAee
RAB  293  glSgkspyss navghVfnlI hfVgCyMgqv rslnatviaa CapheMsVlg gyLgeeffgk gtFerrfFrd
VSV  282  qlSqkspyss vknPaFhfwg qiAALLLrst raknarqpdd IeytsLtCas llLsfavgss adIeqqfyig

[The consensus (Cons.) and asterisk lines of panel (c) are not recoverable from the extracted text; residues are reproduced as extracted.]
Fig. 6. Computer-aided comparisons of amino acid sequences of selected NNS RNA viruses. (a) Matrix comparison of the MBG and
SEN NPs. (b) Matrix comparison of the MBG and VSV NPs. The parameters used in matrix comparisons of amino acids are the same
as those used in Fig. 5. (c) Alignment of amino acid sequences from the NP regions of filoviruses and other NNS RNA viruses that
correspond to the short region of identity seen between MBG and SEN in (a). Alignment was such that gaps were minimized, with only
one space introduced in the MBG and EBO NP sequence [seen as a dot (.)]. Bold capital letters indicate positions where six identical or
similar residues occur. Similar amino acids: charged, (D + E), (R + K + H); phenyl group (F + Y); hydrophobic, (A + C +
F + I + L + M + V). Hydrophobic amino acids were identified by values assigned in the study by Kyte & Doolittle (1982). A consensus
sequence was derived from positions that contained at least six identical amino acids (amino acid letter) or six similar amino acids (+).
Asterisks under the consensus line indicate positions where both MBG and EBO contribute identical and/or similar amino acids to the
consensus (Cons.). Sources of sequence data: SEN (Morgan et al., 1984); HP3, human parainfluenza type 3 virus (Galinski et al., 1986);
MUM, mumps virus (Elango, 1989); NDV, Newcastle disease virus (Ishida et al., 1986); MEA, measles virus (Rozenblatt et al., 1985);
RSV (Collins et al., 1985); RAB, rabies virus (Tordo et al., 1986); VSV (Banerjee et al., 1984).
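The consensus rule stated in the legend (a letter where at least six sequences share an identical residue, a plus sign where at least six share a similar residue) can be sketched as follows. The similarity groups are those listed in the legend; the function name, the handling of gaps and the order of the checks are assumptions of this illustration. The ten rows are the last ten-residue block of the first segment of Fig. 6(c) (MBG residues 274 to 283) as it appears in the extracted figure text, so individual letters may carry extraction errors.

```python
from collections import Counter

# Similarity groups as listed in the legend to Fig. 6.
SIMILAR_GROUPS = [set("DE"), set("RKH"), set("FY"), set("ACFILMV")]


def consensus(rows, threshold=6):
    """Derive a consensus line from equal-length aligned rows.

    A column yields the residue letter if at least `threshold` rows share
    it, '+' if at least `threshold` rows fall in one similarity group,
    and a space otherwise. Non-letter characters (gaps) are ignored.
    """
    line = []
    for column in zip(*rows):
        residues = [r.upper() for r in column if r.isalpha()]
        if not residues:
            line.append(" ")
            continue
        letter, count = Counter(residues).most_common(1)[0]
        if count >= threshold:
            line.append(letter)
        elif any(sum(r in group for r in residues) >= threshold
                 for group in SIMILAR_GROUPS):
            line.append("+")
        else:
            line.append(" ")
    return "".join(line)


# Rows in the order MBG, EBO, SEN, HP3, MUM, NDV, MEA, RSV, RAB, VSV.
block = [
    "GeyAPFArVL", "GeyAPFArLL", "GprAPFIcIL", "GprAPFIcIL", "GeqArMLaLL",
    "GdnAPYMtLL", "GkpAPYMvnL", "GgeAgFyhIL", "hsyFihFrsL", "dsyMPYMidF",
]
print(repr(consensus(block)))  # 'G  AP++ +L'
```

For this block the sketch yields G, A, P and L at the identical positions and a plus sign at three similar positions, which agrees with the consensus fragment that survives in the extracted figure for the same block.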
Discussion
The results obtained from sequence data and expression
studies show that, as for most NNS RNA viruses, the
MBG NP gene is positioned at the extreme 3' end of the
genome and is preceded by a short leader RNA
sequence. The putative leader sequences for MBG and
EBO have yet to be demonstrated, but we assume that
one is synthesized during transcription as described for
related viruses (Giorgi et al., 1983; Kurilla et al., 1985;
Vidal & Kolakofsky, 1989). The structure of the MBG
NP gene conforms to that of other NNS RNA viruses in
that it is delineated by transcriptional signals that act to
initiate transcription at the 3' end and terminate
transcription [with the concomitant addition of a poly(A)
tail] at the 5' end of the gene. The transcriptional signals
of the MBG NP gene show a high degree of sequence
relatedness with those of the EBO NP gene; the signals
differ in three of the 14 bases in the transcriptional start
site and are shorter by one U at the end of the poly(A) site
(Sanchez et al., 1989). The start sites for the NP genes of
MBG and EBO can be related to those of paramyxoviruses (except RSV) in the first and third bases from the 3'
end (U and C, respectively) and the presence of the
sequence CUU or UUC. The poly(A) sites of the NP
genes of filoviruses are very similar to those of viruses in
the Paramyxovirus and Morbillivirus genera. The transcriptional signals of the NP genes of filoviruses are
distinct from those of these other viruses in that they
contain a common sequence, 3' UAAUU, that is
positioned at the 5' end of the start site and the 3' end of
the poly(A) site. This sequence is also seen in the other
genes of MBG and EBO (unpublished data), and its
significance in interactions with the viral polymerase is
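[Fig. 7 graphic: dendrograms for (a) the entire NP and (b) the conserved region, with branches labelled RAB, VSV, SEN, HP3, MUM, NDV, MEA, MBG, EBO and RSV. Fig. 8 graphic: fluorogram, lanes 1 to 5, with the position of the NP marked. The images are not recoverable from the extracted text.]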
Fig. 7. Dendrograms showing the relatedness of the NP proteins of
NNS viruses. Plots were generated using a multiple sequence
alignment program (PileUp) that employs a modification of the
method of Needleman & Wunsch (1970) to calculate pairwise
alignments of sequence clusters (Devereux et al., 1984). The entire
amino acid sequences are represented in (a) and the conserved regions
of the NP proteins shown in Fig. 6 (c), in (b). A gap weight of 3.0 and a
gap length weight of 0.5 were the settings used in the analysis. The
PileUpPep.Cmp comparison table file for peptides was used to assess
relatedness in pairwise alignments.
unknown. The MBG NP ORF is located 56 bases from
the 5' end of its mRNA, similar to those described for
other NNS RNA viruses, but this is far shorter than the
415 bases present in the EBO NP mRNA. Both the
MBG and EBO NP transcripts do, however, have long untranslated regions of 656 and 341 nucleotides [exclusive of
added poly(A) tail sequences] at their 3' ends, respectively.
The function of these long regions of untranslated
sequences is also unknown, but they may influence the
level of NP expression or some other viral process.
The 3' end of the MBG genome (first 3000 nucleotides)
is AU-rich (59.7%), which has an effect on the codon
usage in the ORF (56.4% AU content). An A or a U is
present in 43.8% of the first base positions of codons,
compared to 63.9% of the second and 62.8% of the third.
This bias towards these bases results in an increase in
aspartic acid, glutamic acid, asparagine and glutamine,
which are abundant in the MBG NP as well as the EBO
NP. The ORF for the EBO NP, however, is not as AU-rich and utilizes a somewhat more balanced codon usage.
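The per-position figures quoted above are simple tallies, and a minimal sketch of such a tally is given here for illustration. The function name and the toy ORF are hypothetical; the sequence is assumed to be in frame from its first base, and DNA-style T is treated as U.

```python
def au_fraction_by_codon_position(orf):
    """Fraction of codons with A or U at the first, second and third base.

    The ORF is assumed to start in frame; trailing bases that do not
    complete a codon are ignored. T is read as U.
    """
    orf = orf.upper().replace("T", "U")
    usable = len(orf) - len(orf) % 3
    if usable == 0:
        raise ValueError("sequence contains no complete codon")
    counts = [0, 0, 0]
    for index in range(usable):
        if orf[index] in "AU":
            counts[index % 3] += 1
    n_codons = usable // 3
    return [count / n_codons for count in counts]


# Toy ORF (hypothetical): codons AUG, GCC, UAA.
first, second, third = au_fraction_by_codon_position("AUGGCCUAA")
print(round(first, 3), round(second, 3), round(third, 3))  # 0.667 0.667 0.333
```

The overall AU content of the ORF is the mean of the three values when every codon is complete.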
The composition of the predicted amino acid sequence
of the MBG NP is very similar to that of the EBO NP.
They can be divided into an N-terminal half which is
hydrophobic and a C-terminal half which is decidedly
hydrophilic (Fig. 4) and very acidic, a feature that is also
characteristic of the NPs of certain paramyxoviruses,
such as SEN and human parainfluenza virus type 3
(Sanchez et al., 1986). As seen with the EBO NP, the
MBG NP has three cysteine residues that are localized in
Fig. 8. Comparison of in vitro expressed and authentic MBG NP.
Fluorogram of [35S]methionine-labelled proteins subjected to SDS-PAGE. Lanes 1, 3, 4 and 5 contain proteins immunoprecipitated from
in vitro translation reactions using a human anti-MBG serum. Synthesis
of translation products shown in lanes 1 and 3 was primed with a run-off transcript that contains the MBG NP coding region, in lane 4 with a
crude mRNA preparation of MBG-infected Vero E6 cell RNA
(positive control) and in lane 5 with a preparation of uninfected Vero
E6 cell RNA (negative control). Lane 2 is a marker lane with purified
MBG virion proteins and the position to which the virion NP migrates
is identified at the left edge of the figure.
the N-terminal third and concentrations of proline and
acidic residues in the C-terminal half of the molecule.
The MBG NP has a predicted Mr of 77.9K, much lower
than the value calculated from SDS-PAGE, 94K (Kiley
et al., 1988). This difference is not unusual, since similar
observations have been reported for the NP of EBO and
other related viruses (Sanchez et al., 1989; Galinski et al.,
1986; Sanchez et al., 1986; Gallione et al., 1981). The
larger mass of filovirus NPs, compared to those of other
NNS RNA viruses, may be due to additional sequences
at the C terminus.
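A predicted Mr of this kind is normally obtained by summing residue masses over the deduced sequence. The sketch below illustrates that arithmetic under stated assumptions: it uses commonly tabulated average residue masses and adds one water molecule for the free termini; the function name is hypothetical, and the text does not state which program or mass table produced the 77.9K value.

```python
# Average masses (Da) of amino acid residues, i.e. amino acid minus water.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528


def predicted_mass(protein):
    """Sum of average residue masses plus one water for the chain termini.

    Raises KeyError for characters that are not one of the 20 standard
    one-letter codes.
    """
    return sum(RESIDUE_MASS[residue] for residue in protein.upper()) + WATER_MASS


# Toy dipeptide (hypothetical example): glycyl-alanine.
print(round(predicted_mass("GA"), 2))  # 146.15
```

Such a sum takes no account of post-translational modification or of anomalous migration in SDS-PAGE, which is one reason why calculated and observed values can differ.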
Computer matrix comparisons of the nucleic acid
sequences of the MBG and EBO NP genes (including the
3' leader sequence) indicate that only in the central part
of the coding regions is there any significant homology.
This finding is in contrast to the homology seen in the
predicted amino acid sequences, which show strong
identity or similarity in the 400 residues at the N
terminus, which is achieved through different codon
usage. The greatest degree of identity in these amino acid
sequences is located in the central part of the proteins
(MBG residues 297 to 330) and corresponds to the region
in which the nucleic acid sequences are most similar
(MBG bases 990 to 1090). The varying lengths of the
untranslated regions of the NP genes of MBG and
EBO, together with their different codon usages in
generating very similar NP molecules, suggest that
these agents may have diverged at some point in the
distant past.
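The matrix comparisons referred to here rest on a window and stringency test, which can be sketched as follows. The window of 21 and stringency of 15 are the nucleotide settings given in the legend to Fig. 5; the identity scoring, the placement of each dot at the window start and the function name are assumptions of this illustration. The Compare program itself scores amino acid windows with a comparison table and is not reproduced here.

```python
def dot_matrix(seq_a, seq_b, window=21, stringency=15):
    """List the (i, j) window starts that pass the stringency test.

    A pair of windows passes when at least `stringency` of the `window`
    aligned positions are identical. Identity scoring is an assumption
    suited to nucleotide sequences.
    """
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(1 for k in range(window)
                          if seq_a[i + k] == seq_b[j + k])
            if matches >= stringency:
                hits.append((i, j))
    return hits


# Toy sequence (hypothetical) compared with itself using a short window:
# every diagonal position passes, as expected for identical sequences.
toy = "ACGUACGGUCA"
hits = dot_matrix(toy, toy, window=5, stringency=5)
print(all((i, i) in hits for i in range(len(toy) - 5 + 1)))  # True
```

Runs of hits along a diagonal correspond to the lines seen in the plots; raising the stringency removes isolated dots at the cost of sensitivity to weaker similarity.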
The lack of amino acid sequence homology in the C-terminal half of the NPs of MBG and EBO is analogous
to that seen with certain closely related paramyxoviruses,
in which the last 100 or more residues are very divergent
(Rozenblatt et al., 1985; Galinski et al., 1986; Jambou et
al., 1986; Matsuoka & Ray, 1991; Sakai et al., 1987; Lyn
et al., 1991). The reason for the divergence of the C
termini of these viruses has not been determined, but it
has been suggested that this region is exposed to the
environment and may interact with the viral matrix
protein in the assembly process (Heggeness et al., 1981;
Rima, 1989; Kondo et al., 1990; Barr et al., 1991). If the
size of filovirus NPs relative to the NPs of other NNS
RNA viruses is due to an increase in length at the C
terminus, then one might speculate that interactions of
this region with the matrix protein and/or other
structural proteins during the budding process could
result in the characteristic bacilliform shape of the
filovirus virion. Alternatively, this region might interact
with a putative second NP found in both EBO and MBG
(Kiley et al., 1988; Elliott et al., 1985).
The similarities in hydropathy and primary amino
acid sequence between the MBG and EBO NP proteins
are outwardly striking, but despite these likenesses the
serological evidence indicates that these agents are
antigenically unrelated. Localization of major antigenic
epitopes in the C-terminal half could explain this
phenomenon if cross-reactivity of antibodies to this
region is dependent on a high degree of conservation
(lacking between MBG and EBO). Immunodominance
of the C terminus in stimulating antibody production has
been described for the NPs of paramyxoviruses (Gill et
al., 1988; Tanabayashi et al., 1990) and a similar
condition could exist with filovirus NPs.
Results of matrix comparisons of the NP amino acid
sequences of filoviruses and other NNS RNA viruses
demonstrated that a small region in the central part of
these NPs showed some conservation between MBG and
SEN (Fig. 6a). Alignment of this NP region of MBG and
EBO to that of other NNS RNA viruses identified
certain sites that are conserved in these proteins,
particularly between filoviruses and paramyxoviruses.
Two areas within this region showing the greatest
concentration of similar sequences are underlined in the
consensus line of Fig. 6(c). This region has also been
shown by others to contain sequences that are highly
conserved within the Paramyxoviridae (Galinski et al.,
1986; Sanchez et al., 1986; Elango, 1989; Kondo et al.,
1990; Lyn et al., 1991; Morgan, 1991) and are contained
in the alignment in Fig. 6(c). Recently, Barr et al. (1991)
reported that pneumoviruses, paramyxoviruses, rhabdoviruses
and EBO contain three separate conserved
regions, the largest of which is 50 residues and is contained
within the alignment shown in Fig. 6(c) (box C; MBG
sequence 255 to 304). The other two boxes are both 13
residues in length, one of which is contained in the N-terminal end of the alignment in Fig. 6(c) (box B; MBG
sequence 213 to 225) and the other is positioned in the N-terminal third of the NP (box A; MBG sequence 132 to
144). It should be noted that box A corresponds to the
beginning of the region of strong similarity seen between
MBG and EBO and that box C overlaps the longest run
of identity that ends this region. The authors stated that
this region appears to begin with sequences that contain
a high proportion of α-helices and terminates with a β-sheet
and reverse turn, suggesting a similar conformation. They
hypothesized that the first 350 to 400 residues of the NP
proteins of these viruses may have retained a similar
structure with divergent sequences, and that the regions
of common structure (homology) may represent areas
involved in RNA binding and interactions with other
proteins in the nucleocapsid. Lyn et al. (1991) arrived at a
similar conclusion following cross-reaction studies using
anti-SEN NP monoclonal antibodies to detect epitopes
on the human parainfluenza type 1 virus NP. Our
alignment of the amino acid sequences of the MBG and
EBO NP proteins (Fig. 5) supports the observations of
maintained functional domains in the first 400 N-terminal residues. We have noted that these similarities
in amino acid composition occur within an extremely
hydrophobic domain, as seen in the hydropathy plots in
Fig. 4. The bar above the plots identifies the conserved
region aligned in Fig. 6(c) and the two arrows point to
two hydrophobic peaks that contain the two most related
sites. The peak nearest to the C terminus also corresponds to the sequence previously identified by Elango
(1989) as a region highly conserved in paramyxoviruses.
Galinski et al. (1986) also observed that conserved
regions, identified in alignments of paramyxovirus NP
sequences, resided in hydrophobic domains. N-terminal
to the conserved region in the MBG, EBO and SEN
hydropathy plots in Fig. 4 there is a prominent
hydrophobic peak, identified by an asterisk, positioned
approximately 80 amino acids from this region. Alignments of the sequences from these regions (MBG or EBO
with SEN) failed to show any significant relationship,
but they could represent a region whose structure/function is less sequence-dependent. The significance of
these hydrophobic sequences in the biology of the NP
proteins is unclear, but a role in either folding of the NP
and/or binding to the vRNA has been postulated
(Morgan, 1991). If these areas represent functional
domains, then their characterization would contribute
greatly to the basic understanding of the biology of the
ribonucleoprotein complex of NNS RNA viruses.
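Hydropathy plots of the kind discussed above are moving averages of per-residue values. The following minimal sketch uses the hydropathy index of Kyte & Doolittle (1982); the window length of 9 and the function name are assumptions made for illustration, since the settings used for Fig. 4 are not given in this section.

```python
# Hydropathy index of Kyte & Doolittle (1982); positive values are hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Mean hydropathy of each window along the sequence.

    Returns len(protein) - window + 1 values; value k covers residues
    k to k + window - 1 (0-based). The default window is an assumption.
    """
    protein = protein.upper()
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[residue] for residue in protein]
    return [sum(values[start:start + window]) / window
            for start in range(len(values) - window + 1)]


# Toy peptide (hypothetical): an acidic stretch followed by a hydrophobic one.
profile = hydropathy_profile("DDDIII", window=3)
print([round(value, 2) for value in profile])  # [-3.5, -0.83, 1.83, 4.5]
```

Peaks in such a profile mark hydrophobic stretches; their exact shape depends on the window length chosen.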
Multiple amino acid sequence alignments of the NPs
of representative members of the three families of NNS
RNA viruses (seen in the form of dendrograms in Fig. 7)
revealed three things. First, the alignment of the NP
sequences, save that of RSV, segregated them into their
respective families. Second, filoviruses appear more
closely related to paramyxoviruses than to rhabdoviruses. And finally, the segregation of RSV with filoviruses
may indicate that a closer evolutionary relationship
exists between these agents. Using a different alignment
procedure, Pringle (1991) generated a similar dendrogram and noted the uniqueness of pneumoviruses. In a
paper that compared L amino acid sequences, Stec et al.
(1991) concluded that RSV represents a distinct lineage
from other paramyxoviruses and showed a dendrogram
that is similar to our results. Comparisons of other
filovirus genes and gene products should define further
the relationship of filoviruses to other NNS RNA
viruses, particularly in polymerase amino acid sequences
which have been found to be highly conserved (Blumberg et al., 1988; Galinski et al., 1988; Tordo et al., 1988;
Barik et al., 1990; Stec et al., 1991) and could provide a
better means of gauging genetic relatedness.
In conclusion, sequence analysis of the 3' end of the
MBG genome, including the entire NP gene, has shown
it to be organized and structured in a manner that is
consistent with those of EBO and other NNS RNA
viruses. The similarity of filovirus NP genes and gene
products to those of paramyxoviruses implies a closer
biological and phylogenetic relationship to these agents
than to rhabdoviruses.
References
BANERJEE, A. K., RHODES, D. P. & GILL, S. S. (1984). Complete
nucleotide sequence of the mRNA coding for the N protein of
vesicular stomatitis virus (New Jersey serotype). Virology 137, 432-438.
BARIK, S., RUD, E. W., LUK, D., BANERJEE, A. K. & KANG, C. Y.
(1990). Nucleotide sequence analysis of the L gene of vesicular
stomatitis virus (New Jersey serotype): identification of conserved
domains in L proteins of nonsegmented negative-strand RNA
viruses. Virology 175, 332-337.
BARR, J., CHAMBERS, P., PRINGLE, C. R. & EASTON, A. J. (1991).
Sequence of the major nucleocapsid protein gene of pneumonia virus
of mice: sequence comparisons suggest structural homology between
nucleocapsid proteins of pneumoviruses, paramyxoviruses, rhabdoviruses
and filoviruses. Journal of General Virology 72, 677-685.
BLUMBERG, B. M., CROWLEY, J. C., SILVERMAN, J. I., MENONNA, J.,
COOK, S. D. & DOWLING, P. C. (1988). Measles virus L protein
evidences elements of ancestral RNA polymerase. Virology 164, 487-497.
CENTERS FOR DISEASE CONTROL (1989). Ebola virus infection in
imported primates - Virginia 1989. Morbidity and Mortality Weekly
Report 38, 831-838.
COLLINS, P. L., ANDERSON, K., LANGER, S. J. & WERTZ, G. W. (1985).
Correct sequence for the major nucleocapsid protein mRNA of
respiratory syncytial virus. Virology 146, 69-77.
DEVEREUX, J., HAEBERLI, P. & SMITHIES, O. (1984). A comprehensive
set of sequence analysis programs for the VAX. Nucleic Acids
Research 12, 387-395.
ELANGO, N. (1989). The mumps virus nucleocapsid mRNA sequence
and homology among the paramyxoviridae proteins. Virus Research
12, 77-86.
ELLIOTT, L. H., KILEY, M. P. & McCORMICK, J. B. (1985). Descriptive
analysis of Ebola virus proteins. Virology 147, 169-176.
FELDMANN, H., WILL, C., SCHIKORE, M., SLENCZKA, W. & KLENK,
H.-D. (1991). Glycosylation and oligomerization of the spike protein
of Marburg virus. Virology 182, 353-356.
GALINSKI, M. S., MINK, T. A., LAMBERT, D. M., WECHSLER, S. L. &
PONS, M. W. (1986). Molecular cloning and sequence analysis of the
human parainfluenza 3 virus RNA encoding the nucleocapsid
protein. Virology 149, 139-151.
GALINSKI, M. S., MINK, M. A. & PONS, M. W. (1988). Molecular
cloning and sequence analysis of the human parainfluenza 3 virus
gene encoding the L protein. Virology 165, 499-510.
GALLIONE, C. J., GREEN, J. R., IVERSON, L. E. & ROSE, J. K. (1981).
Nucleotide sequences of the mRNA's encoding the vesicular
stomatitis virus N and NS proteins. Journal of Virology 39, 529-535.
GEAR, J. S. S., CASSEL, G. A., GEAR, A. J., TRAPPLER, B., CLAUSEN, L.,
MEYERS, A. M., KEW, M. C., BOTHWELL, T. H., SHER, R., MILLER,
G. B., SCHNEIDER, J., KOORNHOFF, H. J., GOMPERTS, E. D.,
ISAACSON, M. & GEAR, J. H. S. (1975). Outbreak of Marburg virus
disease in Johannesburg. British Medical Journal 4, 489-493.
GILL, D. S., TAKAI, S., PORTNER, A. & KINGSBURY, D. W. (1988).
Mapping of antigenic domains of Sendai virus nucleocapsid protein
expressed in Escherichia coli. Journal of Virology 62, 4805-4808.
GIORGI, C., BLUMBERG, B. & KOLAKOFSKY, D. (1983). Sequence
determination of the (+) leader RNA regions of the vesicular
stomatitis virus Chandipura, Cocal, and Piry serotype genomes.
Journal of Virology 46, 125-130.
GUBLER, U. & HOFFMAN, B. J. (1983). A simple and very efficient
method for generating cDNA libraries. Gene 25, 263-269.
GUPTA, K. C. & KINGSBURY, D. W. (1982). Conserved polyadenylation
signals in two negative-strand RNA virus families. Virology 120,
518-523.
HEGGENESS, M. H., SCHEID, A. & CHOPPIN, P. W. (1981). The
relationship of conformational changes in the Sendai virus nucleocapsid to proteolytic cleavage of the NP polypeptide. Virology 114,
555-562.
ISHIDA, N., TAIRA, H., OMATA, T., MIZUMOTO, K., HATTORI, S.,
IWASAKI, K. & KAWAKITA, M. (1986). Sequence of 2617 nucleotides
from the 3' end of Newcastle disease virus genome RNA and the
predicted amino acid sequence of viral NP protein. Nucleic Acids
Research 14, 6551-6564.
JAHRLING, P. B., GEISBERT, T. W., DALGARD, D. W., JOHNSON, E. D.,
KSIAZEK, T. G., HALL, W. C. & PETERS, C. J. (1990). Preliminary
report: isolation of Ebola virus from monkeys imported to USA.
Lancet (i) or (ii), 502-505.
JAMBOU, R. C., ELANGO, N., VENKATESAN, S. & COLLINS, P. L. (1986).
Complete sequence of the major nucleocapsid protein gene of human
parainfluenza type 3 virus: comparison with other negative strand
viruses. Journal of General Virology 67, 2543-2548.
KILEY, M. P., WILUSZ, J., McCORMICK, J. B. & KEENE, J. D. (1986).
Conservation of the 3' terminal nucleotide sequences of Ebola and
Marburg virus. Virology 149, 251-254.
KILEY, M. P., COX, N. J., ELLIOTT, L. H., SANCHEZ, A., DEFRIES, R.,
BUCHMEIER, M. J., RICHMAN, D. D. & McCORMICK, J. B. (1988).
Physicochemical properties of Marburg virus: evidence for three
distinct virus strains and their relationship to Ebola virus. Journal of
General Virology 69, 1957-1967.
KONDO, K., BANDO, H., KAWANO, M., TSURUDOME, M., KOMADA, H.,
NISHIO, M. & ITO, Y. (1990). Sequencing analysis and comparison of
parainfluenza virus type 4A and 4B NP protein genes. Virology 174,
1-8.
KOZAK, M. (1986). Point mutations define a sequence flanking the
AUG initiator codon that modulates translation by eukaryotic
ribosomes. Cell 44, 283-292.
KURILLA, M. G., STONE, H. O. & KEENE, J. D. (1985). RNA sequence
and transcriptional properties of the 3' end of the Newcastle disease
virus genome. Virology 145, 203-212.
KYTE, J. & DOOLITTLE, R. F. (1982). A simple method for displaying
the hydropathic character of a protein. Journal of Molecular Biology
157, 105-132.
LYN, D., GILL, D. S., SCROGGS, R. A. & PORTNER, A. (1991). The
nucleoproteins of human parainfluenza virus type 1 and Sendai virus
share amino acid sequences and antigenic and structural determinants. Journal of General Virology 72, 983-987.
MARTINI, G. & SIEGERT, R. (editors) (1971). Marburg Virus Disease.
Wien & New York: Springer-Verlag.
MATSUOKA, Y. & RAY, R. (1991). Sequence analysis and expression of
the human parainfluenza type 1 virus nucleoprotein gene. Virology
181, 403-407.
MAXAM, A. M. & GILBERT, W. (1980). Sequencing end-labeled DNA
with base-specific chemical cleavages. Methods in Enzymology 65,
499-560.
MORGAN, E. M. (1991). Evolutionary relationships of paramyxovirus
nucleocapsid-associated proteins. In The Paramyxoviruses, pp. 163-179. Edited by D. W. Kingsbury. New York & London: Plenum
Press.
MORGAN, E. M., RE, G. G. & KINGSBURY, D. W. (1984). Complete
sequence of the Sendai virus NP gene from a cloned insert. Virology
135, 279-287.
NEEDLEMAN, S. B. & WUNSCH, C. D. (1970). A general method
applicable to the search for similarities in the amino acid sequence of
two proteins. Journal of Molecular Biology 48, 443-453.
PRINGLE, C. R. (1991). The order Mononegavirales. Archives of Virology
117, 137-140.
RICHARDSON, J. H. & BARKLEY, W. E. (1988). Biosafety in Microbiological and Biomedical Laboratories. USPH, CDC. HHS Publication no.
88-8395.
RIMA, B. K. (1989). Comparison of amino acid sequences of the major
structural proteins of the paramyxo- and morbilliviruses. In Genetics
and Pathogenicity of Negative-strand Viruses, pp. 254-263. Edited by
B. W. J. Mahy & D. Kolakofsky. Amsterdam: Elsevier.
ROSEN, J. M., Woo, S. L. C., HOLDER, J. W., MEANS, A. T. &
O'MALLEY, B. (1975). Preparation and preliminary characterization
of purified ovalbumin messenger RNA from the hen oviduct.
Biochemistry 14, 69-78.
ROZENBLATT, S., EISENBERG, O., BEN-LEVY, R., LAVIE, V. & BELLINI,
W. J. (1985). Sequence homology within the morbilliviruses. Journal
of Virology 53, 684-690.
SAKAI, Y., Suzu, S., SHIODA, T. & SHIBUTA, H. (1987). Nucleotide
sequence of the bovine parainfluenza 3 virus genome: its 3' end and
the genes of NP, P, C and M proteins. Nucleic Acids Research 15,
2927-2944.
SANCHEZ, A. & KILEY, M. P. (1987). Identification and analysis of
Ebola virus messenger RNA. Virology 157, 414-420.
SANCHEZ, A., BANERJEE, A. K., FURUICHI, Y. & RICHARDSON, M. A.
(1986). Conserved structures among the nucleocapsid proteins of the
Paramyxoviridae: complete nucleotide sequence of human parainfluenza virus type 3 NP mRNA. Virology 152, 171-180.
SANCHEZ, A., KILEY, M. P., HOLLOWAY, B. P., McCORMICK, J. B. &
AUPERIN, D. D. (1989). The nucleoprotein gene of Ebola virus:
cloning, sequencing, and in vitro expression. Virology 170, 81-91.
SANGER, F., NICKLEN, S. & COULSON, A. R. (1977). DNA sequencing
with chain-terminating inhibitors. Proceedings of the National
Academy of Sciences, U.S.A. 74, 5463-5467.
SMITH, D. H., JOHNSON, B. K., ISAACSON, M., SWANAPOEL, R.,
JOHNSON, K. M., KILEY, M., BAGSHAWE, A., SIONGOK, T. &
KERUGA, W. K. (1982). Marburg-virus disease in Kenya. Lancet i,
816-820.
STEC, D. S., HILL, M. G. & COLLINS, P. L. (1991). Sequence analysis of
the polymerase L gene of human respiratory syncytial virus and
predicted phylogeny of nonsegmented negative-strand viruses.
Virology 183, 273-287.
TANABAYASHI, K., TAKEUCHI, K., HISHIYAMA, M., YAMADA, A.,
TSURUDOME, M., ITO, Y. & SUGIURA, A. (1990). Nucleotide sequence
of the leader and nucleocapsid protein gene of mumps virus and
epitope mapping with the in vitro expressed nucleocapsid protein.
Virology 177, 124-130.
TORDO, N., POCH, O., ERMINE, A. & KEITH, G. (1986). Primary
structure of leader RNA and nucleoprotein genes of the rabies
genome: segmented homology with VSV. Nucleic Acids Research 14,
2671-2683.
TORDO, N., POCH, O., ERMINE, A., KEITH, G. & ROUGEON, F. (1988).
Completion of the rabies virus genome sequence determination:
highly conserved domains among the L (polymerase) proteins of
unsegmented negative-strand RNA viruses. Virology 165, 565-576.
VIDAL, S. & KOLAKOFSKY, D. (1989). Modified model for the switch
from Sendai virus transcription to replication. Journal of Virology 63,
1951-1958.
ZIMMERN, D. & KAESBERG, P. (1978). 3'-Terminal nucleotide sequence
of encephalomyocarditis virus RNA determined by reverse transcriptase and chain-terminating inhibitors. Proceedings of the
National Academy of Sciences, U.S.A. 75, 4257-4261.
(Received 10 July 1991; Accepted 2 October 1991)