* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Sequence analysis of the Marburg virus nucleoprotein gene
Community fingerprinting wikipedia , lookup
RNA interference wikipedia , lookup
RNA silencing wikipedia , lookup
Messenger RNA wikipedia , lookup
Polyadenylation wikipedia , lookup
Transcriptional regulation wikipedia , lookup
Promoter (genetics) wikipedia , lookup
Proteolysis wikipedia , lookup
Endogenous retrovirus wikipedia , lookup
Two-hybrid screening wikipedia , lookup
Amino acid synthesis wikipedia , lookup
Ancestral sequence reconstruction wikipedia , lookup
Nucleic acid analogue wikipedia , lookup
Vectors in gene therapy wikipedia , lookup
Epitranscriptome wikipedia , lookup
Homology modeling wikipedia , lookup
Protein structure prediction wikipedia , lookup
Gene expression wikipedia , lookup
Biochemistry wikipedia , lookup
Plant virus wikipedia , lookup
Point mutation wikipedia , lookup
Biosynthesis wikipedia , lookup
Silencer (genetics) wikipedia , lookup
347 Journal of General Virology (1992), 73, 347-357. Printed in Great Britain Sequence analysis of the Marburg virus nucleoprotein gene: comparison to Ebola virus and other non-segmented negative-strand RNA viruses Anthony Sanchez,l*~f Michael P. Kiley, 2 Hans-Dieter Kienk 3 and Heinz Feldmann 3 1Department of Biology and Laboratory for Microbial and Biochemical Sciences, Georgia State University, Atlanta, Georgia 30302-4010, 2Research and Development Program, The Salk Institute, Government Services Division, Swiftwater, Pennsylvania 18370, U.S.A. and 3Institut j~r Virologie, Philipps-Universiti~t Marburg, 3550 Marburg, Germany The first 3000 nucleotides from the 3' end of the Marburg virus (MBG) genome were determined from cDNA clones produced from genomic RNA and mRNA. Identified in the sequence was a short putative leader sequence at the extreme 3' end, followed by the complete nucleoprotein (NP) gene. The 5' end of the NP mRNA was determined as was the polyadenylation site for the NP gene. The transcriptional start (3' U U C U U C U U A U A A U U . . ) and termination (3' ..UAAUUCUUUUU) signals of the MBG NP gene are very similar to those seen with Ebola virus (EBO). In comparison to other non-segmented negative-strand RNA viruses, filovirus transcriptional signals are most similar to members of the Paramyxovirus and Morbillivirus genera. In vitro translation of a run-off transcript containing the entire MBG NP coding region produced an authentic NP. Sequence comparisons of the 3' end of the MBG and EBO genomes revealed weak nucleotide sequence similarity, but the predicted sequence of the first 400 amino acids of these viruses showed a high degree. This homology is encoded in divergent nucleotide sequences through different codon usages and substitutions of similar amino acids. A small region in the middle of the MBG and EBO NP sequences was found to contain a significant amino acid homology with NPs of paramyxoviruses and to a lesser extent with rhabdoviruses. Specific sites of conserved sequence are contained in hydrophobic domains and may have a common function. Alignments of the entire NP amino acid sequences of these viruses also suggest that filoviruses are more closely related to paramyxoviruses than to rhabdoviruses. Introduction MBG, Ebola virus (EBO) and Reston virus (RES) (Ebola-like monkey filovirus; Centers for Disease Control, 1989; Jahrling et al., 1990) are non-segmented negative-strand (NNS) RNA viruses and are members of the family Filoviridae. Together with the Paramyxoviridae and Rhabdoviridae, these families make up the order Mononegavirales, a status accorded to them in 1990 by the International Committee on Taxonomy of Viruses (Pringle, 1991). The filovirus virion is bacilliform in morphology and is composed of a helical nucleocapsid surrounded by a lipid envelope. Virions contain at least seven structural proteins and for MBG these proteins are an RNA-dependent RNA polymerase (L protein; Mr 267K; unpublished data), a single surface glycoprotein (GP; Mr 170K; Feldmann et al., 1991), a nucleoprotein (NP; Mr 94K) and four proteins ranging in Mr from 24K to 38K (Kiley et al., 1988). The genetic features of filoviruses are similar to other NNS viruses in that (i) transcription of the infecting ribonucleoprotein complex, which contains a single negative-sense genomic RNA template, yields monocistronic polyadenylated mRNA species and (ii) for EBO Marburg virus (MBG) is a 'Biosafety Level 4' agent (Richardson & Barkley, 1988) that was first identified in 1967 following human outbreaks of acute haemorrhagic fever in the cities of Marburg and Frankfurt, Germany, and Belgrade, Yugoslavia (Martini & Siegert, 1971). Initial infections occurred in persons working with blood, organs or cell culturing of tissues from infected African green monkeys (Cercopithecus aethiops) imported from Uganda. This pathogen received its name from the city of Marburg, where most of the cases occurred and where much of the initial work on the virus was performed. Three subsequent human outbreaks have been attributed to MBG (Gear et al., 1975; Smith et al., 1982; Kiley et al., 1988). t Present address: Special Pathogens Branch, Division of Viral and Rickettsial Diseases, National Center for Infectious Diseases, Centers for Disease Control, 1600 Clifton Road, Mail Stop GI4, Atlanta, Georgia 30333, U.S.A. The nucleotide sequence data reported in this paper have been assigned GenBank accession number M72714. 0001-0487 © 1992 SGM Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Sun, 30 Apr 2017 04:58:55 348 A. S a n c h e z and others (Zaire subtype), the g e n o m e is organized such that the N P gene is e n c o d e d at the extreme 3' e n d of the g e n o m e a n d (iii) the E B O N P gene c o n t a i n s similar transcriptional signals t h a t delineate the genes (Kiley et al., 1986, 1988; Sanchez & Kiley, 1987; Sanchez et al., 1989). T o define the genetic r e l a t i o n s h i p of M B G to EBO, R E S a n d other N N S R N A viruses more fully, we have u n d e r t a k e n a project of c l o n i n g a n d s e q u e n c i n g the entire g e n o m e of a 1980 isolate of M B G ( M u s o k e strain). I n this report we p r e s e n t sequence data for the first 3000 nucleotides from the 3' e n d of the g e n o m i c R N A a n d c o m p a r i s o n s of the nucleic acid a n d predicted a m i n o acid sequences of the M B G N P gene to EBO, p a r a m y x o viruses a n d rhabdoviruses. Methods Cells and viruses. Vero E6 cells were used to culture viruses as previously described (Sanchez & Kiley, 1987). The Musoke strain of MBG was used throughout this study and was derived from the serum of a fatal human infection in Nairobi, Kenya in 1980 (Smith et al., 1982). The virus was isolated and plaque-purified three times on Veto E6 cells, and then large seed stocks were prepared in the same cell line. For comparisons, a Zaire subtype of EBO (Mayinga strain) was used; its passage history is described elsewhere (Sanchez et al., 1989). Preparation of viral RNAs, molecular cloning and sequencing. Preparation of MBG genomic RNA (vRNA) and mRNA, synthesis of cDNA, molecular cloning in pUC18, identification of virus-specific clones, and chemical and dideoxynucleotide sequencing were performed as previously described (Gubler & Hoffman, 1983; Sanchez et al., 1989; Maxam & Gilbert, 1980; Sanger et al., 1977; Zimmern & Kaesberg, 1978). A synthetic oligodeoxynucleotideprimer complementary to the first 19 bases from the 3' end of the genome (5' AGACACACAAAAACAAGAGATG) was used to generate first-strand cDNA (vRNA template) in the production of one cDNA library, and also to probe cDNA libraries generated from vRNA and poly(A)-tailed vRNA. The 5" end of the MBG NP mRNA was sequenced by primer extension, using the primer 5" CCAACAAACTGTGTAAATCCAT (vRNA sense; bases 125 to 104), which was radiolabelled at the 5' end with [~,-32p]ATP and used in a first-strand reaction. The extension product was isolated from a sequencing gel and chemically sequenced as previously described (Sanchez et al., 1989). Agarose gel electrophoresis and Northern blot hybridization. Acidurea-agarose (1.5% w/v) gel electrophoresis, blotting and hybridizations were performed as described elsewhere (Rosen et al., 1975; Sanchez et al., 1989), except that GeneScreen Plus (New England Nuclear) was used as the hybridization transfer membrane and blotted RNA was not baked onto the membrane. Computer-aided sequence analysis. The Sequence Analysis Software Package developed by the Genetics Computer Group (University of Wisconsin BiotechnologyCenter; Version 7.0) was used in analysing sequence compositions, sequence comparisons, manipulations, and graphic output (Devereux et at., 1984). Construction and in vitro expression of the MBG NP gene coding region. The entire MBG NP gene open reading frame (ORF) was synthesized by the polymerase chain reaction (PCR) technique using a commercial kit (Perkin-Elmer Cetus). Primer pairs used in amplifying the MBG NP coding region from vRNA template are as follows: 5' CA- Leader ~1 Nucleoprotein gene I I 3,11 I I Start site ~5' Poly(A) site MV-881 ] MV-17 MV-39 / MV-34[~ I ii i i i i i 0 i i i 1000 i i i i iq i ; I ; 2000 ~ : : : : :: I i I I I I 3000 Fig. 1. The 3' end of the MBG genome and cDNA clones in sequence analysis. A schematic representation of the 3' end of the MBG genome is shown at the top of this figure. From left to right are the putative leader sequence, the transcriptional start site, the non-coding 3' end of the NP gene, the NP ORF, the 5' non-coding region and the poly(A) site. Below this drawing are shown the principal cDNA clones (generated from vRNA) used in sequencing studies and their positions as they align on the MBG genome. The nucleotide sequences for the cloned inserts are as follows: MV-88, 1 to 1744; MV-17, 954 to 3593; MV-39, 1 to 941; MV-34, 1 to 257. At the bottom is a scale as a reference for nucleotide sequence lengths. GGGTACCGTGTATCATATAAATAAAGAAGAATATTAAC (mRNA sense; bases 31 to 61) and 5' CAGGGTACCGCTGCATGTATGATGAGTCCCACATTGTGA(vRNA sense; bases 2969 to 2940). These primers each contain a KpnI restriction endonuclease site at the 5' end to facilitate cloning. First-strand DNA synthesis was performed using vRNA and 0.3 ktg RNA sense primer (Sanchez et al., 1989), then the products were incorporated into a PCR assay using 0.3 ~tg of each primer in a 100 ktl reaction. The amplified DNA was digested with KpnI and ligated into the KpnI site of the pGEM3Zf(+) transcription vector (Promega). A plasmid was isolated that contained the initiating AUG codon positioned downstream of the T7 RNA polymerase promoter. The DNA from this construct was purified by banding in CsC1 gradients, then uncapped run-off RNA transcripts were produced from this DNA after it had been linearized with XbaI. Transcription was performed using T7 RNA polymerase in a largescale transcription reaction (Promega protocol). The resulting transcript was translated in vitro and labelled with pS]methionine (New England Nuclear) in a rabbit reticulocyte lysate system (Promega). Translation products were immunoprecipitated with a human antiMBG serum, subjected to SDS-PAGE and processed for fluorography as previously described (Sanchez & Kiley, 1987). Results Cloning and sequence analysis o f the 3" end o f the M B G genome A schematic r e p r e s e n t a t i o n of the p r i m a r y clones used i n g e n e r a t i n g the Y e n d sequence d a t a for the M B G g e n o m e is s h o w n i n Fig. 1. T h e sequences o b t a i n e d were d o u b l e - s t r a n d d a t a o b t a i n e d by either c h e m i c a l or d i d e o x y n u c l e o t i d e s e q u e n c i n g of cloned inserts. T h e clones MV-88 a n d MV-39 c o n t a i n the exact 3' e n d of the g e n o m e a n d were generated by p r i m i n g first-strand Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Sun, 30 Apr 2017 04:58:55 Marburg virus NP gene 349 (c) T G + + CCAG (a) 1 N N C T I C T T A T 28S,~1~ . A A T T 18S,~ G T A A Fig. 2. Specificity of cDNA clones, MBG mRNAs and sequencing of the 5' end of the MBG NP mRNA. (a) Northern blot hybridization of 32P-labelled probes (nick translation) generated from eDNA clones MV-88 and MV-17 (see Fig. 1) to lanes of RNA resolved by electrophoresisin an acid-urea-agarose (1-5~) gel. Lanes include preparations of purified vRNA (lanes 1, 4 and 7), a crude preparation of MBG mRNA (lanes 2, 5 and 8) and uninfected Vero E6 total cell RNA (lanes 3, 6 and 9). Lanes 1 to 3 show a set of lanes that were stained with ethidium bromide prior to blotting. Hybridization was performed under stringent conditions (50% formamide; 42 °C) overnight. The locations of the 28S and 18S ribosomal RNA bands are identified at the left edge of the figure. (b) Fluorographyof an acid-urea-agarose gel containing lanes of [3H]uridine-labelled RNA from Vero E6 cells infected with either MBG (lane 1) or EBO (lane 2) (treated with actinomycin D prior to labelling). Lanes were aligned with Northern blots and an asterisk identifiesthe position of the MBG NP mRNA. (c) Autoradiograph of a 6% sequencing gel that shows the sequence for the 5' end of the MBG NP mRNA. An mRNA-complementaryprimer, labelled at the 5' end with [y-32p]ATP,was annealed close to the 5' end of the genome, extended with reverse transcriptase, and the extension products were chemically sequenced (Maxam & Gilbert, 1980). c D N A synthesis with a 3' complementary synthetic oligonucleotide plus random priming, using v R N A as template. Clone MV-34 contains the 3' end plus a poly(A) tail added to v R N A with poly(A) polymerase (prior to first-strand c D N A synthesis) and primed with oligo(dT). In addition to sequencing of cloned inserts, the sequence from nucleotides 1650 to 3000 was verified through direct dideoxynucleotide sequencing of purified vRNA. Hybridization of clones MV-88 and MV-17 to Northern blots of M B G v R N A and m R N A transcripts demonstrated their specificity for M B G sequences and identified the transcripts recognized by these clones (Fig. 2a). Clone MV-88 hybridizes to a single transcript, which corresponds to a large M B G m R N A species seen in Fig. 2(b), indicated by an asterisk, and is comparable in size to the EBO N P m R N A . As shown in Fig. 1, the MV-17 clone contains sequences that overlap the first and second genes, and hybridization analysis showed that this clone anneals to the N P transcript and to a second m R N A species that is transcribed from the next (adjacent) gene. The agarose electrophoresis pattern of oligo(dT)-selected M B G m R N A transcripts is very similar to that seen for EBO (Fig. 2b) and also shows them to be polyadenylated and monocistronic. The 5' end of the N P m R N A was sequenced by primer extension and chemically sequencing the extension product, the results of which are shown in Fig. 2 (c). The last band of this sequencing ladder is weaker than the next smaller band and m a y be due to (i) variability in the exact site in which transcription of the N P m R N A occurs, (ii) copying of a 5' cap structure by reverse transcriptase, (iii) premature termination of the extension product caused by a cap structure or (iv) an artefact of the sequencing chemistry. In any event, from the results obtained from 5' end sequence analysis, the transcriptional start site shown in Fig. 3 represents the longest possible transcript. Fig. 3 shows the viral complementary sequence ( m R N A sense) of the M B G genome, including the entire N P gene. The transcriptional start site and stop or poly(A) site are identified on the sequence and delineate the N P gene. The start site was determined by sequencing the 5' end of the N P m R N A , and the poly(A) Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Sun, 30 Apr 2017 04:58:55 A. Sanchez and others 350 Leader ...... > 5' A G A C A C A C A A A A A C A A G A G A U G A U G A U U U U G U G U A U C A U A U A A A U A Start S i t e - - - > AAGAAGAAUAUUAACAUUGACAUUGAGACUUGUCAGUCGUGUAAUAUUCUUGAAGAU 103 CCC A C U GCC C C U C A U G U C C G U A A U A A G A A A G U G A U A U U A U U U G A C A C A A A U C A U Pro T h r A l a Pro His Val A r g A s n Lys Lys Val Ile L e u Phe A s p T h r A s h His 193 CAG GUU AGU AUC UGU AAU CAG AUA AUA GAU GCA AUA AAC UCA GGG AUU GAU CUU GGA GAU CUC CUA GAA GGG GGU UUG CUC ACG UUG UGU 31 G l n V a l Ser !!e C y s A S h G i n lle lle A s p A l a Ile A s n Ser G l y Ile A s p Leu Gly A s p L e U L e u G l u G l y G l y L e u L e u T h r L e u Cys 283 G U U G A G C A U U A C U A U A A U U C U G A U A A G G A U A A A U U C A A C A C A A G U C C U G U C G C G A A G U A C U U A C G U G A U G C G G G C U A U G A A U U U G A U GUC 61 V a l G l u His T y r T y r A s n S e r A s p L y s A s p L y s P h e A s n T h r S e r Pro Val A l a Lys T y r L e u A r g A s p A l a G l y T y r G l u Phe A s p V a l 373 A U C A A G A A U G C A G A U G C A A C C C G C U U U C U G G A U GUG A G U C C U A A U 91 Ile Lys A s n A l a A s p A l a T h r A r g P h e Leu A s p Val Ser Pro A s n 463 AUG GAU UUA CAC AGU UUG UUG GAG UUG GGU ACA AAA 1 Met A s p L e u His S e r L e u L e u G l u Leu G l y T h r Lys G A A C C U C A U UAC A G C C C U U U A A U U C U A G C C C U U A A G A C A U U G G A A Glu Pro His T y r S e r Pro L e u Ile L e u A l a L e u L y s T h r L e u G l u GGA GAC CGA GCU AGU G l y A s p A r g A l a Ser 553 CUU ACC ACA GGC CAC AUG AAA L e u T h E T h r G l y His M e t Lys 643 GUA AUU UUC GGG AUU UUG AGG UCC AGC UUC AUU UUA AAG UUU GUG UUG AUU CAU CAA GGA GUA AAU 181 Val Ile Phe G l y Ile Leu A r g Ser Ser Phe Ile Leu Lys Phe Val L e u Ile His Gln G l y V a l A S h UUG GUG ACA GGU CAU GAU GCC UAU L e u V a l T h r G l y His A s p A l a T y r 733 GAC AGU AUC AUU AGU AAU 211 A s p S e t Ile Ile S e r A s n UCA GUA GGU CAA ACU AGA UUC UCA GGA CUU CUU AUC GUG AAA ACA GUU S e r V a l G l y G l n T h r A r g P h e Ser G l y L e u Leu Ile V a l Lys T h r Val CUC GAG UUC AUC UUG CAA AAA ACU L e u G l u Phe Ile L e u G l n L y s T h r 823 GAU UCA GGG GUG ACA CUA 241 A s p S e r G l y V a l T h r Leu CAU CCU UUG GUG CGG ACC UCC AAA GUA AAA AAU His P r o L e u V a l A r g T h r Ser Lys Val Lys A s n 121 AGU ACU GAA UCU CAG AGG Ser T h r G l u Ser G l n A r g 151 A U C G A A A A G G C U U U A A G A C A A G U A A C A G U G CAU Ile G l u Lys A l a L e u A r g G l n Val T h r V a l His GGG AGA AUU GGG CUC UUU UUA UCA UUU UGC AGU CUU UUC CUC CCAAAA G l y A r g Ile G l y Leu Phe L e u Ser Phe Cya Ser Leu Phe Leu Pro Lys CAA GAA CAG GGG AUC GUC ACA UAC CCU AAU G l n G l u G l n G l y Ile Val T h r T y r Pro A s h CGA CAU GGG GAA UAC GCA CCA UUU GCA CGG GUU CUG AAU 271 A r g His G l y G l u T y r A l a P r o Phe A l a A r g Val L e u A s n 301 CUU GUC GUC L e u Val V a l CAU UGG His T r p G A A G U U GCU A G U U U C A A G Glu Val A l a Set Phe Lys UUA UCA GGG AUU AAC AAC L e u Set G l y Ile A s h A s n CUC GAA CAU L e u G l u His CAG GCG UUG AGC AAC Gln Ala Leu Ser Asn G G A CUC U A U C C U C A G Gly Leu T y r P r o G l n C A A A G G C G A C A U G A A C A U C A G G A A A U U C A A GCU A U U GCC G A G G A U GAC G l n A r g A r g H i s G l u His G l n G l u Ile G i n Ala Ile A l a G l u A s p A s p C U U U C A G C A 1003 Leu Ser Ala CUA CGA GAG GCG GCA CAU L e u A r g G l u A l a A l a His 1093 GAG GAA AGG AAG AUA UUA GAA CAA Glu G l u A r g Lys Ile L e u G I u G l n 1183 A U U G C G C U G G G U G U G G C A A C A G C A C A C GGC A G U A C A U U G G C U G G U GUC A A U G U U GGC G A A C A A U A U C A A C A A Ile A l a L e u G l y V a l A l a T h r A l a His G l y Ser T h r Leu A l a G l y Val A s n Val G l y G l u G l n T y r G l n G l n GAU GCG GAA GUA AAA CUA 331 A s p A l a G l u Val Lys Leu 913 CUA GCC Leu Ala U U C C A C C U U C A G A A A A C U G A A A U C A C A C A C A G U C A G A C A C U A G C C G U C C U C A G C C A G A A A C G A G A A A A A U U A G C U C G U C U C G C U G C A G A A 1273 361 P h e His L e u G l n L y s T h r G l u lle T h r H i s Set G l n T h r L e u A l a V a l L e u Set Gln Lys A r g G l u Lys Leu Ala A r g Leu A l a A l a Glu 391 AUU G A A A A C A A U Ile G l u A s n A s n AUU GUG lle V a l G A A G A U C A G G G A U U U A A G C A A U C A C A G A A U C G G GUG U C A C A G U C G U U U G l u A s p G l n G l y Phe Lys G l n Ser G l n A s h A r g Val Ser G l n S e t Phe U U G A A U GAC Leu A s n A s p C C U A C A C C U G U G G A A 2363 Pro T h r Pro V a l Glu CAU GAA UCU ACU GAA GAU AGC His G l u S e r T h r G l u A s p Ser 1453 U C U U C U U C A A G U A G C U U U G U U G A C U U G A A U G A U C C A U U U G C A C U G C U G A A U GAG GAC G A G G A U A C U C U U G A U G A C A G U G U C A U G A U C C C G 451 S e r S e r S e r Ser S e r Phe Val A s p L e u A s n A s p P r o Phe A l a L e u Leu A s n G l u A s p G l u A s p T h r L e u A s p A s p S e r V a l M e t Ile Pro 1543 GGC A C A A C A U C G A G A G A A U U U C A A G G G A U U C C U G A A C C G C C A A G A C A A U C C C A A GAC 481 G l y T h r T h r S e r A r g G l u Phe G l n G l y Ile Pro G l u Pro Pro A r g G l n Ser G l n A s p 1633 GUA ACG GUU 421 Val T h r V a l 511 CAA GCC AGG Gln Ala Arg U C C A C A A A U C G G A U U A A G A A A C A G U U U C U G A G A U A U C A A G A A U U G CCU Set T h r A s n A r g Ile Lye Lys G l n P h e Leu A r g T y r G l n G l u L e u Pro CAA GAA AGC AUC GAC CAA 541 G l n G l u S e t Ile A s p G l n CCA AUA CAG CAC 571 P r o Ile G l n His 601 C C C A U G A A U C G A C C A A C U G C U C U G C C U C C C C C A G U U GAC GAC A A G A U U GAG Pro M e t A s n A r g Pro T h r A l a Leu Pro Pro Pro Val A s p A s p Lys Ile G l u CCA GGA UCC GAC AAU Pro Gly S e r A s p A s n CCA GCA GCA AAC Pro A l a A l a A s n CUC A A U A A C A G C L e u A s h A s h Ser CCU G U U C A A G A G GAU GAU Pro Val Gln G l u A s p A s p GAA UCG Glu Ser CAG GAA GAU GAA Gln Glu Asp Glu G A A U A C A C A A C U G A C U C U 1723 G l u T y r T h r T h r A s p Set G A A C A A G G A G U U G A U C U U C C A C C U C C U C C G U U G U A C G C U C A G G A A A A A A G A C A G GAC Glu Gln G l y Val A s p Leu Pro Pro Pro Pro L e u T y r A l a G l n G l u Lys A r g G l n A s p 1813 C C U CAG G A U CCC U U C G G C A G U A U U G G U G A U G U A A A U G G U G A U A U C U U A G A A C C U A U A A G A U C A C C U P r o G l n A s p Pro P h e G l y Ser Ile Gly A s p Val A s n G l y A s p Ile L e u G l u P r o Ile A r g Ser Pro 1903 UCU UCA CCA UCU GCU CCU CAG GAA GAC ACA AGG AUG AGG Ser S e r P r o Ser A l a Pro G l n G l u A s p T h r A r g Met A r g G A A GCC U A U G A A U U G U C G C C U G A U U U C A C A A A U G A U G A G G A U A A U G l u A l a T y r G l u L e u Ser Pro A s p Phe T h r A s n A s p G l u A s p A s n A A U U G G C C A C A A A G A G U G G U G A C A A A G A A G G G U A G A A C U U U C C U U U A U C C U A A U GAU C U U 631 A s n T r p Pro G l n A r g V a l Val T h r Lys L y s G l y A r g T h r Phe Leu T y r Pro A s n A s p Leu A C A GCC 661 T h r A l a CAA GGA AAG G l n G l y Lys CUC GUU GAG GAA UAC CAA AAU Leu Val Glu GIu Tyr Gln Asn CCU GUC UCA GCU AAG P r o Val Ser A l a Lys CUG CAA ACA AAU Leu Gln Thr Asn GAG CUU C A A G C A G A U U G G C C C G A C A U G G l u Leu G l n A l a A s p T r p Pro A s p M e t C A G CAG G l n Gln C C U C C A G A G U C A C U U A U A 2083 P r o P r o G l u Set L e u Ile U C A U U U GAU Ser Phe A s p GAA AGG AGA CAU G l u A r g A r g His GUU GCG AUG AAC UUG UAG UCCAGAUAACACAGCACGGUUACUCACUUAUCUAUUUUGAUA•GACUCAUCCUCAGAUCACAGCAAUCAAAUUUAUU•GAAUAUUUGAACCACCU 691 Val A l a M e t A s n L e u T E R UUUAG UAUC CUAUUACUUG U UAC UAUU GUGUGAGACAACAUAAGC UUUUUUCAAUUGCUAUAAUUAUACAAC 2173 2286 C A U C A A A U A A C A A U C A CG G G C A A G G A C U G G G C A U A C U A U G G U GG UC U U A G A G C A U U G U C C A G U G C U A C A A A U U C 2405 U A C A A A C C UC C A U A C A U U U G C C G C A A C A C U G U A A U C A A C A CU GC UG U A U C U C U U C U U C A A G C C A U C U G A U U U A A C U U A A U A A A C A U G A C U U G 2524 AUU CAAAGAAUAUAC U GACAAUGUUACUGUUUGAAUUUC GUAUUCUUUAUAAAUCACU 1993 U C A A G U G G U G C A C U A U CC UAC U GU UUUG CU C A G C U U A G U A U A U U G U A A U A U G U A A G U G G A C U CU CC CC U U CU CC U C U C GO 2643 UACUUGAUAGAAUGUC GAGUCUACUGGUUUGGAGUUUCCU UACUCUAAUGGAUGUAAUAAUUAACUGUUGGCCUAGAUGAUAACAGAUAU GAGGUUAUAU 2762 P o l y (A) S i t e AAUUA••CAUAG•GUAAAGUAUAAUUCUUACCUCUG•UUCUUCUGUUUUCCCUUUCUUUUAUAAUAUGCCAAUUAAGAAAAA•UAAAAAUCGAAGAAUAUUAAAGAu•U••U•UAAUA• 2881 U CAGAAAAGGCUUUUUAUU 3000 CUAUUCUUUCUUUU UACAAACGUAUUGAAAUAGUAAUU CU C A C A A U GU GG G A C U C A U C A U A C A U G C A G C A A G U C A G C G A A G G G U U G A U G A C U G G A A A A G U Fig. 3. Viral complementary sequence (mRNA sense) of the 3' end of the MBG genome. Identified in the figure are the putative leader sequence, the transcriptional start site, the NP coding region and the poly(A) site. The exact start of the gene (5' end of mRNA) has not been determined, but primer extension using MBG mRNA as a template and chemical sequencing of the extension product have identified it to within one base. Thus the maximum length of the putative leader sequence and the beginning of the NP gene in this figure are preliminary and have not been confirmed. As shown, the gene is 2798 nucleotides in length and encodes a protein that contains 695 amino acids. site was identified by the isolation and sequencing o f three m R N A clones containing the poly(A) tail region o f the N P transcript (data not shown). The M B G N P O R F encodes a protein that is 695 amino acids in length and has a predicted Mr o f 77"86K. The O R F is initiated by the first A U G from the 5' end o f the m R N A and is flanked by sequences that place it in a favourable context (Kozak, 1986). The N P contains a large amount o f negatively charged amino acids, resulting in a highly acidic protein (net charge - 28). A s shown with the EBO N P , the majority o f the acidic amino acids and proline residues are found in the C- Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Sun, 30 Apr 2017 04:58:55 Marburg virus NP gene 3 z ° 0I .r1 •r a.r• Ll -10 . . an . . . . . . .~ I I . 100 200 300 100 I ' 200 300 '-I-I-I I l t . s I/ ~ 400 500 400 ' .~ 500 600 700 o ~z ~v 3200(~)1 100 200 300 400 Amino acid sequence number Fig. 4. Hydropathy plots of the predicted amino acid sequencesof NP proteins. Hydropathicprofilesof the NP proteinsof MBG (a), EBO (b), SEN (c) and VSV (d) weregeneratedby the method of Kyte & Doolittle (1982) using a window size of seven residues. Solid bars above the hydropathy plots identify correspondingregions of which the amino acid sequences are shown aligned in Fig. 6. Asterisks identify a prominent hydrophobic peak that is seen in identical positions with respect to the sequencesmarked by the bars for MBG, EBO and SEN. terminal half of the MBG N P (68~o and 7 5 ~ , respectively, between residues 348 and 695), and all three cysteine residues are found in the N-terminal third of the NP. The M B G N P is also similar to the EBO N P in its hydropathy plot, seen in Fig. 4, which shows that the N P can be divided into a hydrophobic N-terminal half and a hydrophilic C-terminal half. For comparison, the hydropathy plots of the N P proteins of Sendai virus (SEN) and vesicular stomatitis (VSV) are also shown in Fig. 4, and it appears that the SEN N P profile is closer to those of the M B G and EBO NPs than that of the VSV NP. This similarity in hydropathic profiles is particularly prominent in the hydrophobic sequences from residue 130 to approximately 320 (160 to 350 for SEN). Sequence comparisons of the MBG and EBO NP genes Computer-aided matrix (dot plot) comparisons of the nucleotide and amino acid sequence homologies between the MBG and EBO N P genes are shown in Fig. 5(a). Some scattered similarity is seen in the nucleotide sequences, corresponding to bases 450 to 1100 for M B G and 950 to 1600 for EBO. In contrast, matrix comparison of amino acid sequences reveals a close resemblance between these two proteins in the N-terminal 400 351 (approximate) residues, with only a small break around residues 120 to 140. An alignment of the N P amino acid sequences of MBG and EBO is shown in Fig. 5 (c). The alignment shows that the region from positions 130 to 392 of the MBG sequence has very strong similarity, and is highlighted by a run of 34 identical amino acids (MBG sequence 296 to 329). The strongest identity seen in the nucleic acid matrix comparison corresponds to the strongest regions of identity seen in the amino acid alignment. It should also be noted that two of three cysteines in this alignment are conserved and are nearest to the N terminus. Comparisons of the NP amino acid sequences of filoviruses and other NNS RNA viruses Comparisons of the predicted N P amino acid sequences of filoviruses and other N N S R N A viruses were performed to determine whether any conserved regions were present. These analyses revealed that in both the MBG and EBO NPs there is a short region which has a significant degree of identity with paramyxovirus NPs, and to a lesser extent with those of rhabdoviruses. This sequence in the M B G and EBO N P corresponds to the highly conserved region in the central part of the protein described above. These findings are illustrated in Fig. 6 (a) and (b), which show matrix comparisons of the N P amino acid sequences of MBG and SEN, and M B G and VSV, respectively. The MBG amino acid sequence around the short region of similarity seen in the centre of the M B G / S E N matrix plot (see arrow, Fig. 6a) was aligned to the N P amino acid sequences of six paramyxoviruses and two rhabdoviruses (Fig. 6c). This alignment was initially computer-generated, and then a manual alignment was made to minimize the introduction of gaps. A consensus sequence was derived and asterisks beneath the consensus line mark the locations where both MBG and EBO sequences are either identical or have amino acids similar to those of the consensus sequence. At these locations M B G and EBO sequences occured in 79 ~ of the consensus positions and showed a significantly greater likeness to paramyxoviruses than to VSV or rabies virus. This region of similarity contains the sequences previously identified by Elango (1989) as highly conserved within the family Paramyxoviridae (underlined positions in line with asterisks in Fig. 6c). Fig. 7 shows a set of computer-generated dendrograms which schematically show the relatedness of the amino acid sequences of the entire and conserved region of the NPs of the viruses seen in Fig. 6(c). The homology of these viral N P sequences can be seen in the clustering of sequences and can be measured by the lengths of the branches in the horizontal axis, which is proportional to Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Sun, 30 Apr 2017 04:58:55 352 A. Sanchez and others (a) Nucleotides o I000 ' 2000 . . . . ' . . . . -3000 - 2000 Z (c) 1 .................. MDLHSLLELGTKFTA~VRNKKV~LFDTN~QVS~CNQIZDAINSGIDLGDLLEGGLLTLCVEHYYNSDKDKFNTSPVAKYLR82 ii.i.:i- / i .... ii-: : :::-i: .i[, iJ:i::.l:1:-: :: ir iI:.i i.:i . i .i...ili 1 MD•RPQ•IWMAP•LTESDMDYHKILTAGL•vQQGI•RQRVIPVYQ•NNLEEI•QLIIQAFEAGV•FQESADSFLLMLCLHHAYQGDYKLFLESGAVKYLE 1000 - 83 : i: ]:I I..:..I: :: I. ' ' i . . . . i . . . . -0 • . : .1 .:,--I. :. I ]lii.i]lilliIli::i:: i. ii: il,iil:: 182 I[. I ..Ill.If 101 GHGFRFEVKKRDGVKRLEELLPAV••GKNIKRTLAAM•EEETTEANAGQFLSFASLFLPKLVVGEKACLRKVQRQIQVHAEQGLIQYPTAWQsVGHMMVI 183 FGILR••FILKFvLIHQGVNL•TGHDAYDSII•NSvGQTRF•GLLIVKTVLEFILQKTDSGVTLHPLVRTSK•KNE•ASFKQALSNLARHGEYAPFAR•L 201 FRLMRTNFLI~FLLIHQGMHMVAG~DANDA~I~N~AQARFSGLLIVKT~L~HILQKTERGVRLHPLARTAKVKNE~NSL~AALSSLAKHGEYAPFARLL 283 NLSGINNLEHGLYPQLSAIALG~ATA~GSTLAGVN~GEQYQQLREAAHDAEVKLQRRHEHQEIQAIAEDDEERKILEQFHLQKTEITHSQTLA~L~QKRE 301 NL•GVNNLEHGLFPQLSAIALG•ATAHGSTLAGVNVGEQYQQLREAATEAEKQLQQYAESRELDHLGLDDQEKKILMNFHQKKNEISFQQTNAMVTLRKE 400 383 KLARLAAEIENNI~EDQG.~FKQSQNRvSQSFLNDPTPVEVTVQARPMNRETALPP?VDDKIEHE$TEDSSSSSSFV~LND~FALLNEDEDTLDDSVMIP 480 401 ~LAKLTEAITAASLPKTSG~YDDDDDI~FPGPINDDDNPGHQDDD~TDSQD~TIPDVV~D~DDG~GEYQ~YSENGMNAPDDLVLFDLDEDD-EDTKPVP 499 481 GTTSREFQGIPEPPRQSQDLNN~QGKQEDE~TNRIKKQFLRYQELP~VQEDDE$EYTTD$Q..ESIDQPGSDNEQGVDLPPPPLYAQEKRQD~IQHPAAN 578 500 • ..:: I .... i : ...]::. :: ... :. ..i.. : :i.. .F . . . . . [ : : . : . . ::: i ... :.. NRSTKGGQQKNSQKGQHIEGRQTQSR~IQN~PGPHRTI~HASA~LTDNDRRNEpSGSTSPRMLTPZNEEAD~LDDADDETSSL~LESDDEEQDRDGTSN 579 pQD•FGSIGDVNGDILEPIRSPSSPSAPQEDTRMREAYELSPDFTNDEDNQQ•WPQR\rqTKKGRTFLYPNDLLQTN••ESLITALVEEYQNPVSA•ELQA 600 . . . . :. :.I 1 I : ....... i::.: .li : ,.I I . . i . . :: .::: ..I. : .:.I : i. ..ii RTPTVAPPA~YRDHSE..KKELPQDEQQDQDHTQE~RNQDSDNTQSEHSFEEMYRHILRSQGPFDAVLYYHMMKDEFwFSTSDGKEYTY~D~LEEEYP 679 DW...PDMSFDERRHVAMNL 698 PWLTEKEAMNEENRFVTLDGQQFYWPVMNHKNKFMAILQHHQ r ::[..l::li:llill:::l.illl , 100 DAGYEFDVIKNADATRFLDVSFNEPHYS~LILALKTLE$TE~QRGRIGLFLSFCSLFLPKLwGDp`ASIEKALRQVT~QEQGIvTY~NHWLTTGHMKVI F.:Ill[l:i.lilliiiliilr: ilIil:.li 200 282 [ill~ll.iilill.i:l.lli.il:iilrillii:i 300 MBG N P lllr:iillill:lllilll]llilIllil[]lillilIilillili (b) Amino acids 0 E 200 . . . . . . 400 . i 600 . . . . " i/ I , i / O i 600 400 % - 200 .... I .... E .... .if. i .I:: :: li:l:ili .li .l.il. 382 II i::. ::J , / e., Z :il ,I .: :I-I ...................... :: . . . . . . i 599 678 i ] .i : 697 695 i.:: 739 I ' --0 MBG N P Fig. 5. Computer-aided comparisons of the filovirus nucleotide and amino acid sequences. The Compare, DotPlot and GAP programs of the GCG Sequence Analysis Software Package (Devereux et al., 1984) were used in analyses and graphic output. (a) Matrix comparisons of vcRNA sequences of the 3' ends of the MBG and EBO genomes. Sequences start at the extreme 3' end of the genome and proceed to the ends of the respective NP genes. (b) Matrix comparisons of the predicted amino acid sequences encoded in the MBG and EBO NP genes. For the nucleotide sequence comparison a window setting of 21 and a stringency setting of 15 were used with the Compare program. For the amino acid sequence comparison a window setting of 30 and a stringency setting of 15 were used. (c) Alignment of the predicted amino acid sequences for the NP proteins of MBG (top) and EBO (bottom) using the method of Needleman & Wunsch (1970). For amino acid comparisons a gap weight setting of 3.0 and a length weight setting of 0.5 were used with the GAP program and the NWSGapPep. Cmp comparison file. Similarities between the sequences are identified with a vertical bar ([) for identity, two dots (:) for a comparison value greater than or equal to 0-5 and one dot (.) for a comparison value greater than or equal to 0.1. the difference between sequences. These dendrograms are comparable in that the same viruses segregate into the same positions in the alignments. They differ in that the conserved region alignment shows a greater similarity (shorter horizontal lengths) than the entire NP sequence. As expected, rhabdoviruses, paramyxoviruses and filoviruses segregated into family groups, with the sole exception of respiratory syncytial virus (RSV) which surprisingly split into the filovirus branch of each dendrogram and may indicate a closer relationship to these viruses. In vitro expression of the MBG NP Fig. 8 shows the SDS-PAGE of immunoprecipitated in vitro translation products compared with purified MBG virion proteins. Lanes 1 and 3 contain translation products; their synthesis was primed by a T7 R N A polymerase-generated run-off transcript containing the MBG NP coding region. Comparison of these products with MBG virion proteins (lane 2) and MBG mRNAprimed and uninfected cell RNA-primed translation products (lanes 4 and 5, respectively) demonstrates that the most prominent translation product comigrates with the MBG NP seen in virions and produced from m R N A translation. These data confirm that the ORF in Fig. 3 encodes the MBG NP, and in vitro expression of this region results in the synthesis of an authentic NP. In this in vitro system the NP transcript also primes the synthesis of many smaller proteins (which we have also seen with a similar EBO NP transcript) and may arise from the internal initiation of translation. Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Sun, 30 Apr 2017 04:58:55 Marburg virus N P gene (b) (a) o 200 , , I 400 . . . . I 600 . . . . 353 0 , , I 400 200 . . . . I 600 . . . . . . . . - 400 400 ~/ Z Z Z . > • ud - 200 - /, "" > 200 _ / ,i-o MBG . . . . . I . . . . I . . . . ! ' ' NP 0 MBG NP (c) MBG E•O SEN HP3 MUM NDV MEA RSV RAB VSV 214 232 247 246 247 245 247 235 223 212 isnsvgqtrF isnsvaqArF itt leknXqI IttieknIql snryyamVgd tstyynlVgd kpriaemIcd ggsrvegl fA airvgt vVtA pirygtiVsr + + sgllIvktvL sqllIvktvL VgnyIrdagL VgnylrdagL IgkyIensgL VdsylrntgL IdtyIveagL glfmnaygag yedcsglvsF FkdcAalatF + I L efILqktdsG dhXLqkterG asFMntikyG asFFntiryG taFFltlkya taFFltlkyG asFIltikfG qvMLrwgvla tgFIkqInlt ghLskvsgls F+ G VtlH.pLVrt VrlH.pLArt VetKmaALtl IetRmaALsl LgtKwspLsl IntKtsALal IetmypALgl ksvKniMLgh AreailyFfb IedlttwVln + + +L skVknEVasF akVknEVnsL snLrpDInkL stLrpDInrL aaFtgELtkL ssLsgDIqkM heFagELstL asVqaEMeqV knFeeEIrrM reVadELcqM + E+ + KqALsnlarh KaALsslakh RsLIdtylsk KaLMelylsk RsLMmlyrdi KqLMrlyrmk esLMnlyqqm veVyeyaqkl fepgqetavp mypgqeidka + ++ GeyAPFArVL GeyAPFArLL GprAPFIcIL GprAPFIcIL GeqArMLaLL GdnAPYMtLL GkpAPYMvnL GgeAgFyhIL hsyFihFrsL dsyMPYMidF G AP++ +L 283 301 317 316 317 315 317 305 293 282 nlSginnLeh nlSgvnnLeh kdpvhgeFap rdpihgeFap eapqimdFap gdSdqmsFap enSiqnkFsa nnpkaslL$1 glSgkspyss qlSqkspyss GIYPqLsaiA GIFPqLsaiA GnYPaLwSyA GnYPaIwSyA GgYPIIfSyA aeYaqLySfA GsYPILwSyA tqFPhFsSvV navghVfnlI vknPaFhfwg LGVAtAhgst LGVAtAhgst MGVAVVqnka MGVAVVqnra MGVqsVLdvq MGMAsVLdkg MGVgVeLens LGnAAgLgim hfVgCyMgqv qiAALLLrst LagvnvgeqY LagvnvgeqY MqqyvtgRtY MqqyvtgRsY MrnytyaRpF tgkyqfaRdF MgglnfgRsY geyrgtpRnq rslnatviaa raknarqpdd GYP +GVA+++ + qqlreAahda qqlreAatea LdmemFiLgq LdidmFqLgq LngyyYqXgv MstsfwrLgv FdpayFrLgq dlydaAkAya CapheMsVlg IeytsLtCas + + + Evklqrrheh Ekqlqqyaes avAkdaeski avApdaeaqm EtArrqqgtv EyAqaqgssi EmVrrsagkv EqLkengvin gyLgeeffgk llLsfavgss E + qelqaiaedd reLdhlgLdd ssALedeLgv sstLedeLgv dnrVaddLgl nedMaaeLkl sstLaseLgi ysVLdltAee gtFerrfFrd adIeqqfyig ++ L Cons. MBG EBO SEN HP3 MUM NDV MEA RSV RAB VSV Cons. s + + S A R + Fig. 6. Computer-aided comparisons of amino acid sequences of selected N N S RNA viruses. (a) Matrix comparison of the MBG and SEN NPs. (b) Matrix comparison of the MBG and VSV NPs. The parameters used in matrix comparisons of amino acids are the same as those used in Fig. 5. (c) Alignment of amino acid sequences from the NP regions of filoviruses and other NNS RNA viruses that correspond to the short region of identity seen between MBG and SEN in (a). Alignment was such that gaps were minimized, with only one space introduced in the MBG and EBO NP sequence [seen as a dot (.)]. Bold capital letters indicate positions where six identical or similar residues occur. Similar amino acids: charged, (D + E), (R + K + H); phenyl group (F + Y); hydrophobic, (A + C + F + I + L + M + V). Hydrophobic amino acids were identified by values assigned in the study by Kyte & Doolittle (1982). A consensus sequence was derived from positions that contained at least six identical amino acids (amino acid letter) or six similar amino acids (+). Asterisks under the consensus line indicate positions where both MBG and EBO contribute identical and/or similar amino acids to the consensus (Cons.). Sources of sequence data: SEN (Morgan et al., 1984); HP3, human parainfluenza type 3 virus (Galinski et al., 1986); MUM, mumps virus (Elango, 1989); NDV, Newcastle disease virus (Ishida et al., 1986); MEA, measles virus (Rozenblatt et al., 1985); RSV (Collins et aL, 1985); RAB, rabies virus (Tordo et at., 1986); VSV (Banerjee et al., 1984). Discussion The results obtained from sequence data and expression studies show that, as for most NNS RNA viruses, the MBG NP gene is positioned at the extreme 3' end of the genome and is preceded by a short leader RNA sequence. The putative leader sequences for MBG and EBO have yet to be demonstrated, but we assume that one is synthesized during transcription as described for related viruses (Giorgi et al., 1983; Kurilla et al., 1985; Vidal & Kolakofsky, 1989). The structure of the MBG NP gene conforms to that of other NNS RNA viruses in that it is delineated by transcriptional signals that act to initiate transcription at the 3' end and terminate transcription [with the concomitant addition of a poly(A) tail] at the 5' end of the gene. The transcriptional signals of the MBG N P gene show a high degree of sequence relatedness with those of the EBO NP gene; the signals differ in three of the 14 bases in the transcriptional start site and are shorter by one U at the end of the poly(A) site (Sanchez et al., 1989). The start sites for the NP genes of MBG and EBO can be related to those of paramyxoviruses (except RSV) in the first and third bases from the 3' end (U and C, respectively) and the presence of the sequence CUU or UUC. The poly(A) sites of the NP genes of filoviruses are very similar to those of viruses in the Paramyxovirus and Morbillivirus genera. The transcriptional signals of the NP genes of filoviruses are distinct from those of these other viruses in that they contain a common sequence, 3' U A A U U , that is positioned at the 5' end of the start site and the 3' end of the poly(A) site. This sequence is also seen in the other genes of MBG and EBO (unpublished data), and its significance in interactions with the viral polymerase is Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Sun, 30 Apr 2017 04:58:55 354 A. Sanchez and others (a) (b) 1 2 3 4 5 Conserved region Entire NP RAB VSV SEN HP3 MUM NDV RAB ~ VSV SEN HP3 - - MUM - - NDV MEA MEA MBG MBG EBO RSV I NP..I~ EBO RSV Fig. 7. Dendrograms showing the relatedness of the NP proteins of NNS viruses. Plots were generated using a multiple sequence alignment program (PileUp) that employs a modification of the method of Needleman & Wunsch (1970) to calculate pairwise alignments of sequence clusters (Devereux et al., 1984). The entire amino acid sequences are represented in (a) and the conserved regions of the NP proteins shown in Fig. 6 (c), in (b). A gap weight of 3.0 and a gap length weight of 0.5 were the settings used in the analysis. The PileUpPep. Cmp comparison table file for peptides was used to assess relatedness in pairwise alignments. unknown. The MBG NP ORF is located 56 bases from the 5' end of its mRNA, similar to those described for other NNS RNA viruses, but this is far shorter than the 415 bases present in the EBO NP mRNA. Both the MBG and EBO NP transcripts do, however, have long untranslated regions of 656 and 341 nucleotides [exclusive of added poly(A) tail sequences] at their 3' ends, respectively. The function of these long regions of untranslated sequences is also unknown, but they may influence the level of NP expression or some other viral process. The 3~end 0fthe MBG genome (first 3000 nucleotides) is AU-rich (59.7~), which has an effect on the codon usage in the ORF (56-4~ AU content). An A or a U is present in 43-8~ of the first base positions of codons, compared to 63.9 ~ of the second and 62-8 ~o of the third. This bias towards these bases results in an increase in aspartic acid, glutamic acid, asparagine and glutamine, which are abundant in the MBG NP as well as the EBO NP. The ORF for the EBO NP, however, is not as AUrich and utilizes a somewhat more balanced codon usage. The composition of the predicted amino acid sequence of the MBG NP is very similar to that of the EBO NP. They can be divided into an N-terminal half which is • hydrophobic and a C-terminal half which is decidedly hydrophilic (Fig. 4) and very acidic, a feature that is also characteristic of the NPs of certain paramyxoviruses, such as SEN and human parainfluenza virus type 3 (S~nchez et al., 1986). As seen with the EBO NP, the MBG NP has three cysteine residues that are localized in Fig. 8. Comparison of in vitro expressed and authentic MBG NP. Fluorogram of [35S]methionine-labelled proteins subjected to SDSPAGE. Lanes 1, 3, 4 and 5 contain proteins immunoprecipitated from in vitro translation reactions using a human anti-MBG serum. Synthesis of translation products shown in lanes 1 and 3 was primed with a runoff transcript that contains the MBG NP coding region, in lane 4 with a crude mRNA preparation of MBG-infected Vero E6 cell RNA (positive control) and in lane 5 with a preparation of uninfected Veto E6 cell RNA (negative control). Lane 2 is a marker lane with purified MBG virion proteins and the position to which the virion NP migrates is identified at the left edge of the figure. the N-terminal third and concentrations of proline and acidic residues in the C-terminal half of the molecule. The MBG NP has a predicted M~ of 77.9K, much lower than the value calculated from SDS-PAGE, 94 K (Kiley et al., 1988). This difference is not unusual, since similar observations has been reported for the NP of EBO and other related viruses (Sanchez et al., 1989; Galinski et al., 1986; S~inchez et al., 1986; Gallione et al., 1981). The larger mass of filovirus NPs, compared to those of other NNS RNA viruses, may be due to additional sequences at the C terminus. Computer matrix comparisons of the nucleic acid sequences of the MBG and EBO NP genes (including the 3' leader sequence) indicate that only in the central part of the coding regions is there any significant homology. This finding is in contrast to the homology seen in the predicted amino acid sequences, which show strong identity or similarity in the 400 residues at the N terminus, which is achieved through different codon usage. The greatest degree of identity in these amino acid sequences is located in the central part of the proteins (MBG residues 297 to 330) and corresponds to the region in which the nucleic acid sequences are most similar (MBG bases 990 to 1090). The varying lengths of the untranslated regions of the NP genes of MBG and EBO, together with their different codon usages in Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Sun, 30 Apr 2017 04:58:55 Marburg virus N P gene generating very similar NP molecules, suggests that these agents may have diverged at some point in the distant past. The lack of amino acid sequence homology in the Cterminal half of the NPs of MBG and EBO is analogous to that seen with certain closely related paramyxoviruses, in which the last 100 or more residues are very divergent (Rozenblatt et al., 1985; Galinski et al., 1986; Jambou et al., 1986; Matsuoka & Ray, 1991 ; Sakai et al., 1987; Lyn et al., 1991). The reason for the divergence of the C termini of these viruses has not been determined, but it has been suggested that this region is exposed to the environment and may interact with the viral matrix protein in the assembly process (Heggeness et al., 1981; Rima, 1989; Kondo et al., 1990; Barr et al., 1991). If the size of filovirus NPs relative to the NPs of other NNS R N A viruses is due to an increase in length at the C terminus, then one might speculate that interactions of this region with the matrix protein and/or other structural proteins during the budding process could result in the characteristic bacilliform shape of the filovirus virion. Alternatively, this region might interact with a putative second NP found in both EBO and MBG (Kiley et al., 1988; Elliott et al., 1985). The similarities in hydropathy and primary amino acid sequence between the MBG and EBO NP proteins are outwardly striking, but despite these likenesses the serological evidence indicates that these agents are antigenically unrelated. Localization of major antigenic epitopes in the C-terminal half could explain this phenomenon if cross-reactivity of antibodies to this region is dependent on a high degree of conservation (lacking between MBG and EBO). Immunodominance of the C terminus in stimulating antibody production has been described for the NPs of paramyxoviruses (Gill et al., 1988; Tanabayashi et al., 1990) and a similar condition could exist with filovirus NPs. Results of matrix comparisons of the NP amino acid sequences of filoviruses and other NNS R N A viruses demonstrated that a small region in the central part of these NPs showed some conservation between MBG and SEN (Fig. 6a). Alignment of this N P region of MBG and EBO to that of other NNS R N A viruses identified certain sites that are conserved in these proteins, particularly between filoviruses and paramyxoviruses. Two areas within this region showing the greatest concentration of similar sequences are underlined in the consensus line of Fig. 6(c). This region has also been shown by others to contain sequences that are highly conserved within the Paramyxoviridae (Galinski et al., 1986; S~nchez et al., 1986; Elango, 1989; Kondo et al., 1990; Lyn et al., 1991 ; Morgan, 1991) and are contained in the alignment in Fig. 6(c). Recently, Barr et al. (1991) reported that pneumoviruses, paramyxoviruses, rhabdo- 355 viruses and EBO contain three separate conserved regions, the largest of which is 50 bases and is contained within the alignment shown in Fig. 6(c) (box C; MBG sequence 255 to 304). The other two boxes are both 13 residues in length, one of which is contained in the Nterminal end of the alignment in Fig. 6(c) (box B; MBG sequence 213 to 225) and the other is positioned in the Nterminal third of the NP (box A; MBG sequence 132 to 144). It should be noted that box A corresponds to the beginning of the region of strong similarity seen between MBG and EBO and that box C overlaps the longest run of identity that ends this region. The authors stated that this region appears to begin with sequences that contain a high degree of a-helices and terminates with a fl-sheet and reverse turn suggesting a similar conformation. They hypothesized that the first 350 to 400 residues of the NP proteins of these viruses may have retained a similar structure with divergent sequences, and that the regions of common structure (homology) may represent areas involved in R N A binding and interactions with other proteins in the nucleocapsid. Lyn et al. (1991) arrived at a similar conclusion following cross-reaction studies using anti-SEN NP monoclonal antibodies to detect epitopes on the human parainfluenza type 1 virus NP. Our alignment of the amino acid sequences of the MBG and EBO NP proteins (Fig. 5) supports the observations of maintained functional domains in the first 400 Nterminal residues. We have noted that these similarities in amino acid composition occur within an extremely hydrophobic domain, as seen in the hydropathy plots in Fig. 4. The bar above the plots identifies the conserved region aligned in Fig. 6 (¢) and the two arrows point to two hydrophobic peaks that contain the two most related sites. The peak nearest to the C terminus also corresponds to the sequence previously identified by Elango (1989) as a region highly conserved in paramyxoviruses. Galinski et al. (1986) also observed that conserved regions, identified in alignments of paramyxovirus NP sequences, resided in hydrophobic domains. N-terminal to the conserved region in the MBG, EBO and SEN hydropathy plots in Fig. 4 there is a prominent hydrophobic peak, identified by an asterisk, positioned approximately 80 amino acids from this region. Alignments of the sequences from these regions (MBG or EBO with SEN) failed to show any significant relationship, but they could represent a region whose structure/function is less sequence-dependent. The significance of these hydrophobic sequences in the biology of the NP proteins is unclear, but a role in either folding of the NP and/or binding to the vRNA has been postulated (Morgan, 1991). If these areas represent functional domains, then their characterization would contribute greatly to the basic understanding of the biology of the ribonucleoprotein complex of NNS R N A viruses. Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Sun, 30 Apr 2017 04:58:55 356 A. Sanchez and others Multiple amino acid sequence alignments of the NPs of representative members of the three families of NNS R N A viruses (seen in the form of dendrograms in Fig. 7) revealed three things. First, the alignment of the NP sequences, save that of RSV, segregated them into their respective families. Second, filoviruses appear more closely related to paramyxoviruses than to rhabdoviruses. And finally, the segregation of RSV with filoviruses may indicate that a closer evolutionary relationship exists between these agents. Using a different alignment procedure, Pringle (1991) generated a similar dendrogram and noted the uniqueness of pneumoviruses. In a paper that compared L amino acid sequences, Stec et al. (1991) concluded that RSV represents a distinct lineage from other paramyxoviruses and showed a dendrogram that is similar to our results. Comparisons of other filovirus genes and gene products should define further the relationship of filoviruses to other NNS R N A viruses, particularly in polymerase amino acid sequences which have been found to be highly conserved (Blumberg et al., 1988; Galinski et al., 1988; Tordo et al., 1988, Barik et al., 1990; Stec et al., 1991) and could provide a better means of gauging genetic relatedness. In conclusion, sequence analysis of the 3' end of the MBG genome, including the entire NP gene, has shown it to be organized and structured in a manner that is consistent with those of EBO and other NNS RNA viruses. The similarity of filovirus NP genes and gene products to those of paramyxoviruses implies a closer biological and phylogenetic relationship to these agents than to rhabdoviruses. DEVEREUX,J., HAEBERLI,P. & SMITHIES,O. (1984). A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Research 12, 387-395. ELANOO, N. (1989). The mumps virus nucleocapsid mRNA sequence and homology among the paramyxoviridae proteins. Virus Research 12, 77-86. ELLIOT'r,L. H., KILEY, M. P. & McCORMICK,J. B. (1985). Descriptive analysis of Ebola virus proteins. Virology 147, 169-176. FELDMANN, H., WILL, C., SCHIKORE,M., SLENCZKA,W. & KLENK, H.-D. (1991). Glycosylation and oligomerization of the spike protein of Marburg virus. Virology 182, 353-356. GALINSKI, M. S., MINK, T. A., LAMBERT,D. M., WECHSLER,S. L. & PONS, M. W. (1986). Molecular cloning and sequence analysis of the human parainfluenza 3 virus RNA encoding the nucleocapsid protein. Virology 149, 139-151. GALINSKI, M. S., MINK, i . A. & PONS, i . W. (1988). Molecular cloning and sequence analysis of the human parainfluenza 3 virus gene encoding the L protein. Virology 165, 499-510. GALLIONE, C. J., GREEN, J. R., IVERSON,L. E. & ROSE, J. K. (1981). Nucleotide sequences of the mRNA's encoding the vesicular stomatitis virus N and NS proteins. Journal of Virology 39, 529-535. GEAR, J. S. S., CASSEL,G. A., GEAR,A. J., TRAPPLER,B., CLAUSEN,L., MEYERS,A. M., KEW, M. C., BOTHWELL,T. H., SHER, R., MILLER, G. B., SCHNEIDER, J., KOORNHOFF, H. J., GOMPERTS, E. D., ISSACSON,i . & GEAR, J. H. S. (1975). Outbreak of Marburg virus disease in Johannesburg. British Medical Journal 4, 489-493. GILL, D. S., TAKAI, S., PORTNER, A. & KINGSBURY, D. W. (I988). Mapping of antigenic domains of Sendai virus nucleocapsid protein expressed in Escherichia coli. Journal of Virology 62, 4805-4808. GIORGI, C., BLUMBERG, B. & KOLAKOFSKY, D. (1983). Sequence determination of the (+) leader RNA regions of the vesicular stomatitis virus Chandipura, Cocal, and Piry serotype genomes. Journal of Virology 46, 125-130. GUBLER, O. & HOFFMAN,B. J. (1983). A simple and very efficient method for generating cDNA libraries. Gene 25, 263-269. GUPTA, K. C. & KINGSBURY,D. W. (1982). Conserved polyadenylation signals in two negative-strand RNA virus families. Virology 120, 518-523. HEGGENESS, M. H., SeHEID, A. & CHOPVIN, P. W. (1981). The relationship of conformational changes in the Sendai virus nucleocapsid to proteolytic cleavage of the NP polypeptide. Virology 114, 555-562. ISHIDA, N., TAIRA, H., OMATA, T., MIZUMOTO, K., HATTORI, S., IWASAKI,K. & KAWAKITA,M. (1986). Sequence of 2617 nucleotides from the 3' end of Newcastle disease virus genome RNA and the predicted amino acid sequence of viral NP protein. Nucleic Acids Research 14, 6551-6564. References JAHRLING,R. B., GEISBERT,T. W., DALGARD,D. W., JOHNSON,E. D., KSIAZEK, T. G., HALL, W. C. & PETERS, C. J. (1990). Preliminary BANERJEE, A. K., RHODES, D. P. & GILL, S. S. (1984). Complete report: isolation of Ebola virus from monkeys imported to USA. nucleotide sequence of the mRNA coding for the N protein of Lancet (i) or (ii), 502-505. vesicular stomatitis virus (New Jersey serotype). Virology 137, 432JAMBOU,R. C., ELANGO,N., VENKATESAN,S. & COLLINS,P. L. (1986). 438. Complete sequence of the major nucleocapsid protein gene of human BARIK, S., RUD, E. W., LUK, D., BANERJEE,A. K. & KANG, C. Y. parainfluenza type 3 virus: comparison with other negative strand (1990). Nucleotide sequence analysis of the L gene of vesicular viruses. Journal of General Virology 67, 2543-2548. stomatitis virus (New Jersey serotype): identification of conserved KILEY, M. P., WILUSZ, J., McCORMICK,J. B. & KEENE, J. D. (1986). domains in L proteins of nonsegmented negative-strand RNA Conservation of the 3' terminal nucleotide sequences of Ebola and viruses. Virology 175, 332-337. Marburg virus. Virology 149, 251-254. BARR, J., CHAMBERS, P., PRINGLE, C. R. & EASTON, A. J. (1991). KILEY, M. P., Cox, N. J., ELLIOTT,L. H., SANCHEZ,A., DEFRIES, R., Sequence of the major nucleocapsid protein gene of pneumonia virus BUCHMEIER, M. J., RICHMAN,D. D. & McCORMICK, J. B. (1988). of mice: sequence comparisons suggest structural homology between Physicochemical properties of Marburg virus: evidence for three nucleocapsid proteins of pneumoviruses, paramyxoviruses, rhabdodistinct virus strains and their relationship to Ebola virus. Journal of viruses and filoviruses. Journal of General Virology 72, 677-685. General Virology 69, 1957-1967. BLUMBERG, B. M., CROWLEY, J. C., SILVERMAN,J. I., MENONNA,J., KONDO, K., BANDO,H., KAWANO,M., TSURUDOME,M., KOMADA,H., COOK, S. D. & DOWLING, P. C. (1988). Measles virus L protein N ISHIO,M. & ITO, Y. (1990). Sequencing analysis and comparison of evidences elements of ancestral RNA polymerase. Virology164, 487parainfluenza virus type 4A and 4B NP protein genes. Virology 174, 497. 1-8. CENTERS FOR DISEASE CONTROL (1989). Ebola virus infection in KOZAK, M. (1986). Point mutations define a sequence flanking the imported primates - Virginia 1989. Morbidity and Mortality Weekly AUG initiator codon that modulates translation by eukaryotic Report 38, 831-838. ribosomes. Cell 44, 283-292. COLLINS,P. L., ANDERSON,K., LANGER,S. J. & WERTZ, G. W. (1985). KURILLA,i . G., STONE,H. O. & KEENE, J. D. (1985). RNA sequence Correct sequence for the major nucleocapsid protein mRNA of and transcriptional properties of the 3' end of the Newcastle disease respiratory syncytial virus. Virology 146, 69-77. virus genome. Virology 145, 203-212. Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Sun, 30 Apr 2017 04:58:55 Marburg virus N P gene KYTE, J. & DOOLITrLE, R. F. (1982). A simple method for displaying the hydropathic character of a protein. Journal of Molecular Biology 157, 105-132. LYN, D., GILL, D. S., SCROGGS, R. A. & PORTNER, A. (1991). The nucleoproteins of human parainfluenza virus type 1 and Sendal virus share amino acid sequences and antigenic and structural determinants. Journal of General Virology 72, 983-987. MARTINI, G. & SIEGERT, R. (editors) (1971). Marburg Virus Disease. Wien & New York: Springer-Verlag. MATSUOKA,Y. & RAY, R. (1991). Sequence analysis and expression of the human parainfluenza type 1 virus nucleoprotein gene. Virology 181, 403-407. MAXAM,A. M. & GILBERT,W. (1980). Sequencing end-labeled DNA with base-specific chemical cleavages. Methods in Enzymology 65, 499-560. MORGAN, E. i . (1991). Evolutionary relationships of paramyxovirus nucleocapsid-associated proteins. In The Paramyxoviruses, pp. 163179. Edited by D. W. Kingsbury. New York & London: Plenum Press. MORGAN, E. M., RE, G. G. & KINGSBURY,D. W. (1984). Complete sequence of the Sendal virus NP gene from a cloned insert. Virology 135, 279-287. NEEDLE~a~N, S. B. & WUNSCH, C. D. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48, 443-453. PRINGLE,C. R. ( 1991). The order Mononegavirales. Archives of Virology 117, 137-140. RIC~OSON, J. H. & BARKLEY,W. E. (1988). Biosafety in Microbiological and Biomedical Laboratories. USPH, CDC. HHS Publication no. 88-8395. RIMA, B. K. (1989). Comparison of amino acid sequences of the major structural proteins of the paramyxo- and morbilliviruses. In Genetics and Pathogenicity of Negative-strand Viruses, pp. 254-263. Edited by B. W. J. Malay & D. Kolakofsky. Amsterdam: Elsevier. ROSEN, J. M., Woo, S. L. C., HOLDER, J. W., MEANS, A. T. & O'MALLEY, B. (1975). Preparation and preliminary characterization of purified ovalbumin messenger RNA from the hen oviduct. Biochemistry 14, 69-78. ROZENBLATT,S., EISENBERG,O., BEN-LEVY,R., LAVIE,V. & BELLINI, W. J. (1985). Sequence homology within the morbilliviruses. Journal of Virology 53, 684-690. SAKAI, Y., Suzu, S., SHIODA, T. & SHIBUTA, H. (1987). Nucleotide sequence of the bovine parainfluenza 3 virus genome: its 3' end and the genes of NP, P, C and M proteins. Nucleic Acids Research 15, 2927-2944. 357 SANCHEZ, A. & KILEY, M. P. (1987). Identification and analysis of Ebola virus messenger RNA. Virology 157, 414-420. SANCHEZ,A., BANERJEE,A. K., FURUICHI,Y. & RICHARDSON,M. A. (1986). Conserved structures among the nucleocapsid proteins of the Paramyxoviridae: complete nucleotide sequence of human parainfluenza virus type 3 NP mRNA. Virology 152, 171-180. SANCHEZ, A., KILEY, i . P., HOLLOWAY,B. P., McCORMICK,J. B. & AUPERIN, D. D. (1989). The nucleoprotein gene of Ebola virus: cloning, sequencing, and in vitro expression. Virology 170, 81-91. SANGER,R., NICKLEN,S. & COULSON,A. R. (1977). DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences, U.S.A. 74, 5463-5467. SMITH, D. H., JOHNSON, B. K., ISAAC'SON,M., SWANAPOEL, R., JOHNSON, K. M., KILEY, M., BAGSHAWE, A., SIONGOK, T. & KERUGA, W. K. (1982). Marburg-virus disease in Kenya. Lancet i, 816-820. STEC, D. S., HILL, i . G. & COLLINS,P. L. (1991). Sequence analysis of the polymerase L gene of human respiratory syncytial virus and predicted phylogeny of nonsegmented negative-strand viruses. Virology 183, 273-287. TANABAYASHI, K., TAKEUCHI, K., HISHIYAMA, i . , YAMADA, A., TSURUDOME,M., ITO, Y. & SUGIURA,A. (1990). Nucleotide sequence of the leader and nucleocapsid protein gene of mumps virus and epitope mapping with the in vitro expressed nucleocapsid protein. Virology 177, 124-130. TORDO, N., POCH, O., ERMINE, A. & KEITH, G. (1986). Primary structure of leader RNA and nucleoprotein genes of the rabies genome: segmented homology with VSV. Nucleic Acids Research 14, 2671-2683. TORDO, N., POCH, O., ERMINE,A., KEITH, G. & ROUGEON,F. (1988). Completion of the rabies virus genome sequence determination: highly conserved domains among the L (polymerase) proteins of unsegmented negative-strand RNA viruses. Virology 165, 565-576. VIDAL, S. & KOLAKOFSKY,D. (1989). Modified model for the switch from Sendal virus transcription to replication. Journalof Virology 63, 1951-1958. ZIMMERN,n. & KAESBERG,P. (1978). T-Terminal nucleotide sequence of encephalomyocarditis virus RNA determined by reverse transcriptase and chain-terminating inhibitors. Proceedings of the National Academy of Sciences, U.S.A. 75, 4275-4261. (Received 10 July 1991; Accepted 2 October 1991) Downloaded from www.microbiologyresearch.org by IP: 88.99.165.207 On: Sun, 30 Apr 2017 04:58:55