Download The complete nucleotide sequence of the RNA coding for the

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Gene wikipedia , lookup

RNA wikipedia , lookup

Protein wikipedia , lookup

Silencer (genetics) wikipedia , lookup

Endogenous retrovirus wikipedia , lookup

Point mutation wikipedia , lookup

Epitranscriptome wikipedia , lookup

Metabolism wikipedia , lookup

Ancestral sequence reconstruction wikipedia , lookup

Gene expression wikipedia , lookup

Two-hybrid screening wikipedia , lookup

Deoxyribozyme wikipedia , lookup

Whole genome sequencing wikipedia , lookup

Protein structure prediction wikipedia , lookup

Proteolysis wikipedia , lookup

Homology modeling wikipedia , lookup

Biosynthesis wikipedia , lookup

Artificial gene synthesis wikipedia , lookup

Genomic library wikipedia , lookup

Biochemistry wikipedia , lookup

RNA-Seq wikipedia , lookup

Polyadenylation wikipedia , lookup

Genetic code wikipedia , lookup

Nucleic acid analogue wikipedia , lookup

Transcript
Volume 12 Number 5 1984
Nucleic Acids Research
The complete nucteotide sequence of the RNA coding for the primary translation product of foot
and mouth disease virus
A.R.CarroU*, D.J.Rowlands* and B.E.Clarke*
Animal Virus Research Institute, Ash Road, Pirbright, Woking, Surrey, UK
Received 10 January 1984; Revised and Accepted 16 February 1984
ABSTRACT
The complete nucelotide sequence of the coding region of foot and mouth
disease virus RNA (strain A . Q 6 1 ) is presented. The sequence extends from the
primary initiation site, approximately 1200 nucleotide from the 5' end of the
genome, in an open translational reading frame of 6,999 nucleotides to a
termination codon 93 nucleotides from the 3' terminal poly (A). Available amino
acid sequence data correlates with that predicted from the nucleotide sequence.
The amino acid sequence around cleavage sites in the polyproteln shows no
consistency, although a number of the virus-coded protease cleavage sites are
between glutamate and glycine residues.
INTRODUCTION
Foot and mouth disease (FMD) is a highly contagious viral disease of clovenhooved animals and is of paramount economic importance. The virus (genus:
aphthovirus; family: Picornaviridae) has a single stranded positive sense RNA
genome of approximately 8,500 nucleotides. The genome is 3' polyadenylated, can
serve directly as a messenger and contains a poly (C) tract (100-200 nucleotides
long) located approximately 400 nucleotides from the 5' end (1).
Unlike most eukaryotlc mRNAs, FMDV RNA is not capped at its 51 end but
has a small viral-coded protein (VPg) covalently attached to the terminal 5' uridine
residue (2, 3 i The VPg protein is not required for translation and, by correlation
with poliovirus is probably involved in replication (A, 5). The RNA of FMDV, like
other picornaviruses, appears to have a long 5' untranslated leader sequence.
Removal of the poly C tract of FMDV and all nucleotides to its 5' side has no
effect on the spectrum of proteins generated by in vitro translation. This indicates
that the site for initiation of translation is located to the 3' side of the poly (C)
tract (6X
The initial translational product of FMDV RNA would be a polyprotein of
about 250,000 daltons. However, this protein Is not observed as it is nascently
processed into the viral structural and non-structural proteins as shown in Figure L.
We describe here the complete nucleotide sequence of the region coding for
this poly protein from one strain of FMDV.
© IRL Press Limited, Oxford, England.
2461
Nucleic Acids Research
MATERIALS AND METHODS
Recombinant Plasmids
The construction of recombinant plasmids has been described previously (7).
Briefly double stranded cDNA was synthesised using RNA from FMDV strain A . n
61 and inserted by G:C tailing into the Pst I site of pAT 153. Two of the resulting
recombinanta (pFA76 and pFA206) contained inserts corresponding to =85% of the
FMDV genome. Detailed restriction maps of these recombinants were generated
by standard procedures. pFA206 was subcloned into two more manageable size
clones pFA206a or pFA2O60.
Initially pFA206 was digested with EcoRI,
recirculari8ed using T4 DNA ligase and used to transform E. coli MC1061 by
standard procedures (8, 9). This resulted in the clone pFA206a representing those
sequences to the 3' side of the EcoRI site in pFA206 (see Figure IX pFA206B was
generated by digestion of pFA206 with EcoRI and PstI, gel purification of the
required band and ligation into EcoRI/PstI digested pAT 153.
DNA sequencing
(10).
All DNA sequencing was carried out by the method of Maxam and Gilbert
Recombinant DNA was sequenced using uniquely 51 or 3' 32P labelled
restriction fragments as described previously (11). Primer extension sequencing
32
was carried out as described by Rowlands et al (12). Briefly, a 5'
P-labelled
primer (Cell Tech. Ltd., UK) was used to direct the synthesis of cONA using
reverse transcriptase and viral RNA as a template. The resulting 5' labelled cDNA
was size fractionated and sequenced by the method of Maxam and Gilbert (10).
Analysis of Sequence Data
DNA sequence was analysed using an Apple II microcomputer as described
previously (13).
RESULTS AND DISCUSSION
Sequencing Strategy
The construction of cONA clones from FMDV RNA (serotype A1Q 61) has
been described elsewhere (7). These clones represented in total about 85% of the
viral genome (Figure IX The nucleotide sequence of the region coding for the
structural protein (VPs 1-4) has been previously reported (11). As shown In Figure 1
a large clone (pFA2O6) having a cONA Insert of 5.4Kb represented the major part
of the genome coding for the non-structural proteins. To simplify sequence
analysis this insert was sub-cloned into two halves (pFA206B and pFA2O6a) as
described In methods. Detailed restriction analysis of these clones (see Figure 1)
allowed a sequencing strategy to be carried out so that most regions of the genome
were sequenced in both strands. Those regions which were not sequenced in both
strands are clearly indicated in Figure 1 and these regions were determined from
2462
Nucleic Acids Research
0
FMDV
RNA
1
2
3
SCALE Kb
4
5
7
6
8
VPg pdy(C)
E
potylA)
^
Primary
Product!
L
p20a
P1
P8S
P3
plOO
P2
PE2
Ctowagt
p2(Wp16 VP4 VP2 VP3
VP1
p12
p34
VPgi p20b
pFA76
XX ML
I i
pFA206
Saqutncing
Stringy
| f
EQR
i i i
XTTPDHHPBVFP
i i i
I
I
i
i
11
i
'
i
primtf
FIGURE 1 Organisation of the FMDV genome. The positions of the gene products
is based on previously published work (1). The naming of the polypeptides has been
done using both existing names and the recommendations agreed at the 3rd
European Study Group on moleculear biology of picornaviruses, Urbino, Italy,
September, 1983. The new nomenclature is indicated above the corresponding
polypeptides and the old nomenclature is indicated below. The alignment of two
cDNA clones with respect to the viral genome la shown. The Eco R l site was used
to sub-clone pFA206 into a and B as indicated. The sequence of 2820 nucleotides
from the 5' end of the clone pFA76 has been reported previously (7X The sequence
strategy is illustrated by the bars and arrows indicating the restriction sites used
and the extent of the sequence obtained. Primer extension sequencing was used to
obtain sequence to the 5' side of pFA76 and the location of the primer and
sequence obtained is shown. The abbreviations used for restriction enzymes are: B,
Bam HI; D, Hind III: E, Dde I; F, Hin FI; G, Bgl II; H, Hpa II; L, Bgl I; M, Sma I; P,
Pst I; R, Eco RI; S, Sal I; T, Taq I; V, Pvu II; X, Xho I.
several independent sequencing experiments.
Previous mapping studies using defined T, oligonucleotide probes had
indicated that the clone pFA206 did not include sequences extending to the poly (A)
at the 3' end of the genome (7). The sequence data show however that this clone
does in fact contain the whole of the 31 end of the genome including 33 bases of the
poly (A) tract.
None of the cDNA clones hybridised with T. oligonucleotides derived from
the 5' end of the genome suggesting that this region was not represented in our
clone banks. In order to obtain 5' end sequence beyond the region represented in
the clones a primer extension method was employed. A synthetic oligonucleotide
(Cell Tech Ltd., UK) whose sequence was derived from the 51 proximal region of
pFA76 (see Figure 1) was used to prime cDNA transcripts which were sequenced as
described in methods. Approximately 450 bases were sequenced using this primer.
2463
Nucleic Acids Research
35 £5-
3 >,
is
35
& 35 §3
§3 85 m
SI
35
Sfc
OH
=3tfl
o c
OO
o i_
UH
O
o >,
83
35
Is
§3 SI
83
O «}
P3
§3 fe §n
55 35
s3 a£
mm
53 83 u h
OO
o L
O. *
aCcfl
<>»
QH
Oo
O 3
U Q.
3.3 33
3fc
35
o c
IM
35 3.3
g35 °H
<
1 35
35
3
o*e
O-5!
=i^
S? %Z
5C
5l=
^ 5 e> rnD u
orD«
Ua
o t>
•< *
g£
5i
83
33
Ux
1
3^_J
OC
OttT
«
35
U C
S 3 5^3
S3
C
55 55
O c>
(J O
•—OO
^4*
O DO
"33
Hi
li
ACA
Thr
§5
O >,
s3
33 S3
35 83
35 OO OO
*!3
O«
55
00
§5 ^£
S5
<n
J<
as
33
«5
00
u x no
-Si
O C
00
mun
!S3
U 3
= -1
OO
O °-
S> 3
*Z
0 u
55 ^3
83
33
oct
^
83
00
00
35
8"
83
OO
«"
oV
SS3 SS5
5f3 o^f
a& H
i
o>
o3
Q3
y a.
30
33
35 l a
isi
n? t- ino
§3
§1 S3 81 §3 $5 gf
o a.
o>-
o^i
=>>v
Oo
O g>
u<
^l
"g-3 ~B?
as S5
Oi
U 83
Ss 35
31.
o c
OO
O l_
o •
Or-J
If 33
t^U
gl
o n
B6 1
o « *~Q n
o e
^ §5
3S Is
IJ-H
§5 33
O
^-1
84
o>
81
o£
II 35 35
2464
o>
Si?
8P 335 3-**«
a 5^ U
5
00
od:
3_
oV
Dt-
O-H
Oo
§3
00
31 si
35
15 £
O>,
o-S
^ ^
oa
o<
O tl
DC
I3C?5
35 ye
as- 35
O Q.
OrH
35 §3 35
35
00
§£*• Or-,
St O3^
35
t
oo
2 n
O DO
84
O 3
•< 3
8i
O C
O C
So
§5 Si
OjOo.
35 3£
55 §3 SS3
000
oo
JO.
3.3
CTCKO
S3 13 33 II ^ f»5 >> f^O -^ y ^ i o L,
on
§ta. ye
eg n
o c
o «»
3£ 33
35 33 §£
o Q.
on
Ju
84
gj
33 55
O C
83
if
on
oc
3S 33
§3 33
Nucleic Acids Research
it 1*3 ps m
£5
§3 ^3 §5
§3 ^ §3 83
15 33
i!
3S 83
*i-t
(JO
85
a
t
o3
o3
<->o.
U L .
oo
S-J
o^
=>t-
o£
<££
£$Z £8£ £S£ £33
3*P ^ 3 3 £j"<P T«F Jpoa
"u
CM
<M
°J
"J
gs a.2 85 a "
5E O I O 3 5 5
afc
3H
3S
" o ,
o o
3.3
8t
»3 s i
84 SfT
81 3a &
00 c
£3hH
if
^^3
f'T'Jj
SS 83
3^ 8?
2 i & op«
o-< z
"33 SS^
ss| ^ 3
33
as 35
s-a
83
84 §3 I 33
ga aJi 35 Sa 8 t3o Si?
i^ 33
1$ 85
"•3
SS
si
U'H
<-> 3
On
UC
3j
Go
•Kin
*I-J
SS SS
n
_> CD
BS 83
:^i
=3 CJ
O E
II
^
5l
00.
oc
(_>M
O1*
^-*s
Ui
ac
^n
2*
133
"i-H
00
C4,C
*P
t
3 CD
U L,
QI-.
O O
O3
O 0
5S 8-fe
o^^nj
-'^i
u<
Oo <P
u^t
£c3£ gap
3 4)
33
3 1)
o a.
IIL
* ; *)
(j 3
^<—1
8fe §S
hj
ry
< n
31 § 3 . 1 !
83 8i=
o n
Or-4
=J5 * • *
t_)ra
UL.
O t_
3£ aji
P3
< 3
3i 53
3^ at
S3 3^
(_) C
Q3
iS 3a
5S 3i
O 3
C7^3O
°§*> ^ o a ™=,« ^ y -
as
3n
O-3 35
jc
•< aj
31—
§
Is 3S SS
g3
SI S3 81 §3
§3 as 3S 83
3>3
•< L
n a
3co
ua.
^F-
<-\
Q3
L) L.
(J 3
a
>
%-<
00
no
Cn
3«
O_j
r-.O
J
UQ.
3L
OC
3D.
as i
Stv
*<>*£ 3£
s ^ 35
< » 35
<->«
^^ ^
13
i s s Ri^a
a& as "as "^«
is as 35
»
SS
B5
§1 S3
33 53
Ojc
4 ^ P
33i^
^
SI 33
§4§5 §1
3<-t
^ - ^
<»H
UX
S5
35 13 33 Si
<<|-i
O£
U3
y a OH o 3
O"C
' ^
^ff
Sid
00
oi!
5*B|
IDID
3u
PO
"5 «
(J >.
3JT 8 3
^ ^
UP
83 33 83 3i
35 8£
s pa,~
§3 3a 2e 1 ! 35
Ss S^ §S SS as§1 8£
S3 3i 8E 33 as33 §3
m
85 S3
as §3
13 SS S3 33 8S
3S as §1 SS 33
5_S
2465
Nucleic Acids Research
31
2466
5115
5130
5415
5160
5175
5190
5505
5520
CUC LRAJ GCC UAC AAJ GCC GCC ACC AGC G&J CCC UAC X U CCA CCA CCC CUU CUU GCC AAC CAC CCC CCU CAC ACA UUC AUC CUC CCC ACU CAC UCU CCA CCU CGC AAU CCA GUU CCA DAC
Leu Phe A l l T y r Lys A l l A l l Thr Arg A l l Gly T y r Cys C l y C l y A l l V i l Leu Ala Lys Asp Gly A l l Aap Thr Phe I^e V i l Cly Thr H i s Ser A l l Cly C l y Asn C l y V i l C l y Tyr
UGC UCA UGC CUll O K AGC UCC A X CUU 1 ) 8 AAC A X AAC C O ) 5 ^ GUC CAC CCU CAA5^C? CAC CAC CAC GGG5u8c AUU GUU GAU A C C ^ S GAU G X CAA G A Q 5 ^ CUC CAC GUC AUC5?^?
Cys Ser Cys V i l Ser Arg Ser I € T Leu G i n Lys HE? Lys A l l H i s V a l top Pro C l u Pro H i s H i s C l u C l y Leu H e v i T top Thr Arg Asp V i l G l u G l u Arg V i l H i s V i l MET Arg
AAA ACA AAG CUU ^ U CCC AC: GUI) CCA UAC CCC GUG UUC AACCCU GAC UUU CCC CCU CCC GCC UUG UCA AAC AAC CAC COG CCC C X AAC CAC GGA CUU CUU Cut GAC G AC GUC AUU UUC
Lys T h r Lys L . o A l a Pro Thr V i l A l l T y r Gly V i l Phe ton P r o ' C l u Phe Gly Pro A l l A l l Leu Ser Asn Lys top Pro Arg Leu Asn G l u C l y V i l V i l Leu top top V i l H e Phe
U K AAA CAC AAA S i CAC CCA AAC AUC At? CAA CAC GAC AAA5Gffi CUC UUC CCC C C C l S ? CCC CCU CAC UAC5Gffi UCA OGC CUC CAC 5 !?? GUC C X CCU A O C 5 ^ AAU GCC CCA UUG 5 *!?
S«r Lys H i s Lys C l y top A l a Lys MET T h r G l u G l u top Lys A l l Leu Phe Arg Arg Cys Ala Ala top T y r A l l Ser Arg Leu H i s Ser v i l Leu Gly Thr Ala ton A l l Pro Leu S e r
5895
5910
5925
5910
5955
5970
5985
6QOQ
AUC UAC GAG GCA AOC AAC GGC GUU GAU GGA CUC GAC GCC A X GAC CCG CAC ACU GCA CCU GCC CUC CCC UGG CCC QJC CAG GCA AAA CGC OGC CCU GCG CUC AUC GAC UUC GAG AAC CCC
l i e T y r G l u A l l H e Lys G l y V a l top C l y Leu top A l a HET G l u Pro top Thr Ala Pro C l y Leu Pro T r p A l l Leu G i n Gly Lys Arg Arg G l y A l l Leu H e top Phe G l u ton Gly
6015
6030
6015
6060
6075
6090
6105
6120
ACC GUC GCA CCC CAA GUU GAC GCU GCC UUG AAC CUC A X GAG AAA AGA GAA UAC AAC UUU GCU X U CAC ACC UUC C X AAC CAC CAG AUU CGA CCC A X GAG AAA CUA CCC GCC GGC AAG
Thr V i l Gly Pro C l u V a l G l u Ala Ala Leu Lys Leu HET G l u Lys Arg G l u T y r Lys Phe Ala Cys G i n Thr Phe Leu Lys top G l u l i e Arg Pro HET G l u Lys V i l Arg A l a C l y Lys
6135
6150
6165
6180
6195
6210
6225
6240
ACU OGC AUC CUC CAD GUU U X CCU GUU GAA CAC AUU CUU UAC ACC AAG A X A X AUU GGC AGC UUC X U GCA CAA A X CAC UCA AAC AAC GGA CCA CAA AUU CCC UQJ GOG GUC GCU UGC
Thr Arg l i e V a l top V a l Leu Pro V i l G l u H i s H e Leu T y r Thr Lys HET HET H e Gly Arg Phe Cys Ala G i n HET H i s Ser ton Asn Gly Pro C l n H e Gly Ser A l l V a l Gly Cys
6255
6270
6285
6300
6315
6330
6315
6360
AAC CCU GAC GUU (jAD UGC CAA AGA UUU 5CC ACA CAU UUC G C C C A l UAC AGA AAC G X X G GAU G X GAC UAC OCC GCC UUU GAU CCA AAC CAC X C ACU GAC CCU A X AAC AUC A X WJU
Asn Pro top V a l top T r p G i n Arg Phe C l y Thr H i s Phe Ala G i n T y r Arg Asn V a l T r p top V a l top T y r Ser Ala Phe top A l l Asn H i s Cys Ser top Ala HET Asn H e HET Phe
6375
6390
64Q5
6420
6435
6450
6165
6480
CAG GAG GUG UUC CGC ACA GAC UUU GCC UUC CAC CCA AAU GCU GlG X G AUC C X AAA ACC CUC GUG AAC ACG GAA CAC GCC UAU GAG AAC AAG CCC AUC ACU GUU GAA CCC CGC A X CCA
G l u C l u V a l Phe Arg Thr Asp Phe G l y Phe H i s Pro Asn A l a G l u T r p H e Leu Lys Thr Leu V a l Asn Thr G l u H i s A l l Tyr G l u Asn Lys Arg H e Thr V a l G l u G l y C l y HET Pro
6495
6510
6525
6510
6S55
6570
6585
66O0
UCU GCU UGU UCC GCA ACA AGC AUC AUU AAC ACA AUU U X AAC AAC AUC UAC GUG CUC UAC GCU U X CGU AGA CAC UAU GAG GGA GUU GAG CUG CAC ACU UAC ACt A X AUC UCC UAC GGA
Ser C l y Cys Ser A l l Thr Ser H e H e Asn Thr H e Leu Asn Asn H e T y r V i l Leu T y r A l l Leu Arg Arg H i s T y r G l u Gly V a l G l u Leu top Thr T y r Thr HET H e Ser T y r Gly
6615
6630
6645
6660
6675
6690
6705
6720
GAC GAC AUC G X GUG GCA ACU GAU UAC CAU C X CAC UUU GJC CCU CUC AAC CCC CAC UUC AAA UCU CUU GCC CAA ACC AUC ACU CCA GCU GAC AAA ACC GAC AAA GCU UUU CUU CUU CCU
top top H e V i l V a l A l a Ser top T y r top Leu top Pht C l u Ala Leu Lys Pro H i s Phe Lys Ser Leu Gly Gin Thr H e Thr Pro Ala Asp Lys Ser Asp Lys Gly Phe V i l Leu C l y
6735
6750
6765
6780
6795
6810
6825
6840
CAC UCC AUU ACC GAU GUC ACU UUC CUC AAA AGA CAC UUC CAC A X GAU UAU GCA ACU CGC UUU UAC AAA CCU G X A X GCC UCA AAG ACC CUU G C GCU AUC CUC UCC UUU CCA CGC CGU
C l n Ser l i e T h r top V i l Thr Phe Leu Lys Arg H i s Phe H i s HET top T y r G l y Thr C l y Phe T y r Lys Pro V a l HET A l l Ser Lys Thr Leu G l u A l a H e Leu Ser Phe Ala Arg Arg
6955
6870
6885
6900
6915
6930
6945
6960
GGC ACC AUA CAC GAG AAG U X AUC UCC C X GCA GGA CUC GCC GUC CAC UCU GCA CCA GAC GAG UAC COG OGU CUC UUU GAC CCC UUC CAC GGC CUC UUU CAG AUC CCA AGC UAC AGA OCA
Cly Thr l i e G i n G l u Lys Leu H e Ser V a l A l l C l y Leu Ale V a l H i s Ser Cly Pro top G l u T y r Arg Arg Leu Phe G l u Pro Phe Gin Gly Leu Phe G l u H e Pro Ser T y r Arg Ser
6975
6990
7005
7020
7035
7050
7065
7080
CUU UAC C X OGU
CGU OCC
OGC GUG AAC GCC G X OGC
GU GAC GCA UAA UCC CCC AGA GAU GAC AAU X C CAC UUC GCC UCU GGG GCC OGC GAC GCC GUA GGA G X AAA K C UCC AAA GAG CUU DOC
OGC CGU
Leu T y r Leu Arg T r p V a l Asn Ala V a l Cys Gly top Ala , , ,
C
CCGCUUCCUCAAUOCAAAAAAAAAAAA
FIGURE 2 The complete nucleotide sequence of the coding region of FMDV type A,.. 61 and the corresponding amino acid
sequence from the major open reading frame. Putative cleavage sites for some of the viral proteins are indicated and
those for which there is supportive amino acid sequence data are indicated with a superfix referring to the following
references: 1) 11, and references therein; 2) 2, 2L 3) 14.
</>
~"
Nucleic Acids Research
Nucleotide Sequence
The nucleotide sequence determined from the recombinant clones and by
primer extension sequencing is shown in Figure 2. The sequence includes an open
translational reading frame of 2,333 codons terminating at a UAA codon 93 bases
from the 3' poly (A) sequence. Two lines of evidence suggest that this represents
the sequence coding for the FMDV polyprotein. Firstly, no other reading frame of
significant length is found in the sequence. Secondly, protein sequence data is
available for several viral proteins (11 and refs. therein, 14) and these sequences
correlate with the major open reading frame. Further sequence towards the 51 end
from the major open reading frame has termination codone in all three reading
frames (15).
Examination of our sequence gels allows ua to estimate the poly (C) tract to
be at least 500 nucleotides from the initiation site for translation. This would
mean that FMDV RNA has a 5' untranslated region in excess of 1,000 nucleotides
long and a total genome length of at least 8,100 nucleotides.
The nucleotide composition and dlnucleotide frequency of the coding region
are consistent with the data previously obtained for the region of the genome
coding for the structural proteins (11). In particular the base composition shows a
bias In favour of C+G over A+U. This is also reflected in chemical analysis of the
RNA (16). The relatively high frequency of CpG is also maintained throughout the
genome being 83% of the expected frequency compared to 37% for eukaryotes in
general (17) and 47% for poliovirus (18).
Predicted Amino Acid Sequence
The processing of picornaviral proteins is a complex process and appears
highly variable between different members of the family. However, the general
genomic organisation and the primary cleavage products derived from the
polyprotein are similar, (1, 19) with one important exception. The region to the 5'
side of the genome coding for the structural genes in FMDV and encephalomyocarditis virus code for additional proteins which have no identifiable
counterparts in poliovirus. It has been shown that two proteins are coded for in
this part of the genome in FMDV; namely p20a (Lab) and pl6 (Lb) (for polypeptide
nomenclature see Figure 1). Analysis of these proteins has shown them to have
similar tryptic peptide maps and limited proteolysls patterns (6, 15). Both of these
proteins can be labelled in vitro using N-formyl ( -S) methionine t RNAf" 181 (6)
and studies involving a protease inhibitor suggest that the C-terrnini of these
proteins are the same (20). These data suggest that there are two initiation sites in
FMDV RNA. Examination of the sequence data presented here shows the presence
of a second AUG codon located at nucleotide position 85 and in the same reading
2468
Nucleic Acids Research
frame as the initial AUG at position 1 (see Figure 2). Initiation at these two sites
would result in two similar proteins with a molecular weight difference of 3,800,
i.e. the difference in molecular weight between p20a (Lab) and pl6 (Lb). Beck et al
(21) have also recently reported the presence of these two putative initiation sites
in two other serotypes of the virua.
Although no protein sequence data are available to confirm the predicted
sequence of the three VPg proteins, their charge, amino acid composition and
tryptic peptide composition (2) are entirely consistent with the sequences
presented here. These data also agree with those of Forss and Schaller (22) on the
predicted sequence of the VPgs from FMDV serotypes O and C. Although in
general FMDV and polio show a high degree of similarity in genome organisation
the VPg genes are an example where they differ significantly. Polio has a single
VPg gene whereas FMDV has three, which, moreover are highly conserved in three
serotypes (22). All three VPgs from FMDV are equally represented in RNA
extracted from virus particles (2).
The poliovirus VPg has been found in a precursor molecule of molecular
weight 12,000 of which the VPg protein itself is contained in the C-terminal
portion. The non-VPg sequences In this precursor contain a hydrophobic region of
22 amino acids situated 7 amino acids to the N-terminal side of VPg. This
hydrophobic area is thought to be responsible for the association of the VPg
precursor with cellular membranes, since the VPg precursor is found in membrane
bound complexes (23). Comparison of hydrophilicity plots representing the VPg
region from FMDV and polio show no hydrophobic region in FMDV comparable to
that in polio. The functional precursor, (if any), containing the FMDV VPgs is at
present unknown.
Comparison of our sequence with the recently published sequence of the
FMDV A._ polymera8e (p56a; 3d) (14) show them to be very similar. There are 16
amino acid changes (out of 470 total) which are distributed throughout the protein.
Cleavage sites
The proteolytic cleavage sites involved in the processing of the poliovirus
polyprotein into the final products show a high degree of uniformity not seen with
FMDV. In polio eight of the identified cleavage sites are between gln-gly pairs; of
the remaining cleavages one is between tySgly and one is between asn-ser (24).
The latter cleavage is between VP4 and VP2 and occurs during the final stages of
virus maturation. With FMDV, on the other hand, little homology is apparent in the
sequences around the known cleavage sites. The possible cleavage sites involved in
processing the polyprotein of FMDV are indicated in Figure 2. Those sites for
which some amino acid sequence data are available are also indicated. Both host
2469
Nucleic Acids Research
and viral specified proteases are involved in the processing of FMDV proteins (20).
The cleavages thought to be caused by a viral enzyme (secondary cleavages)
indicate a preference for glu-gly bonds, for example between VP2 and VP3, at the
N termini of the VPgs and between p2Ob (3c) and p56a (3d, polymerase). Of the
twelve putative cleavage sites (both primary and secondary) five occur between
glutamate and glycine.
The putative host specified cleavage site (primary cleavage) between
p20a/pl6 and p88 (L and PI) has been revised from previous reports (7). This
cleavage is now proposed to occur after amino acid 204 between gly and gin on the
basis of homology between FMDV and EMC VP4 proteins (A. Palmenberg, personal
communication), and our recent observation that FMDV VP4 contains proline (15).
With the revised cleavage site VP4 contains a single proline at amino acid position
208 (see Figure 2\ The N-terminus of VP4 is known to be refractory to Edman
degradation in all picornaviruses which have been examined (11, 19, 24, 25) and
with FMDV and EMC this blockage may be due to deamination and cyclisation of a
glutamine.
CONCLUDING REMARKS
At present we do not have clones which span the poly (C) tract and so it has
not been possible to analyse the sequence of the 5' non coding region of the FMDV
genome. The reason for the lack of 51 end clones is not clear. It could be simply
due to the low frequency of cDNA transcripts which extend to the extreme 51 end,
or alternatively the unusual structure of the poly (C) tract may be unstable in E.
coli and eliminated during plasmid replication. Direct RNA sequencing of the,
extreme 5' end of the genome has been reported (26) and shows that considerable
homology exists between the first 27 nucleotides from all seven serotypes of
FMDV. This suggests that this region has a critical role in virus replication.
Sequence data from the 3' untranslated region of FMDV have also been
reported (14, 27). Porter et al (27) analysed the nucleotide sequence adjacent to
the poly (A) for five serotypes and found =75% homology in the first 20 nucleotides
and a highly conserved sequence of eleven nucleotides at position -7 to -17 from
the poly (A) tract. One of these serotypes was identical to the virus used in this
study and the sequence of 42 nucleotides reported agrees exactly with that
reported here. Recently the 3' sequence of a second type A virus (A._) has been
reported (14) and maintains the sequence of the eleven highly conserved
nucleotides, however the remainder of the sequence up to the poly (A) is
considerably different and includes three additional nucleotides. In the A._ virus
there is a single UAA termination codon located 96 nucleotides from the poly (A)
2470
Nucleic Acids Research
tract in contrast to the single UAA termination codon located 93 nucleotides from
the poly (A) reported here from the A._ virus.
A knowledge of the coding sequence of a virus is a necessary pre-requisite
for the full understanding of the functions of the encoded proteins in virus
structure and replication. Moreover, comparisons of the complete sequences of
.different viruses from within the picornavirus family will indicate evolutionary
relationships between the virus groups. Finally the sequence data is essential for
experiments involving site specific mutagenesis of an infectious clone of virus RNA
(28) in order to modify specific regions of the genome.
ACKNOWLEDGEMENTS
We would like to thank Dr A Makoff for computer analysis and Dr F Brown
for helpful discussion and critical reading of the manuscript. We would like to
thank Dr M Eaton (Cell Tech Ltd, UK) for kindly providing the synthetic
oligonucleotide primer.
•Present address: WeUcome Biotechnology Limited, Ash Road, Pirbright, Woking, Surrey, UK
REFERENCES
L
Sangar DV. (1979) J. Gen. Virol. 45, 1-13.
2.
King AMQ, Sangar DV, Harris TJR and Brown F. (1980) J. Virol. 34, 627-634.
3.
Sangar DV, Rowlands DJ, Harris TJR and Brown F. (1977) Nature 268, 64865a
4.
Flanegan JB, Petersson RF, Ambros V, Hewlett MJ and Baltimore D. (1977)
PNAS USA 74, 961-965.
5.
Nomoto A, Detjen B, Pozzatti R and Wimmer E. (1977) Nature 268, 208-213.
6.
Sangar DV, Black DN, Rowlands DJ, Harris TJR and Brown F. (1980) J.
Virol. 33, 59-68.
7.
Boothroyd JC, Highfleld PE, Cross GAM, Rowlands DJ, Lowe PA, Brown F
and Harris TJR. (1981) Nature 290, 800-802.
8.
Wensink PC, Finnegan DJ, Donelson JE and Hogness DS. (1974) Cell 3, 315325.
9.
Woods DE, Crampton JM, Clarke BE and Williamson R. (1980) Nucleic Acids
Research 8, 5157-5168.
10.
Maxam A and Gilbert W. (1980) in Methods in Enzymology VoL 65, pp499560, Grossman L and Moldave K (Eds), Academic Press, N.Y.
11.
Boothroyd JC, Harris TJR, Rowlands DJ and Lowe PA. (1982) Gene 17, 153161.
12.
Rowlands DJ, Clarke BE, Carroll AR, Brown F, Nicholson BH, Bittle JL,
Houghten RA and Lerner RA. (1983) Nature 306, 694-697.
13.
Makoff AJ, Paynter CA, Rowlands DJ and Boothroyd JC. (1982) Nucleic
Acids Research 10, 8285-8295.
14.
Robertson BH, Morgan DO, Moore DM, Grubman MJ, Card J, Fischer T,
Weddell G, Dowbenko D and Yansura D. (1983) Virology 126, 614-623.
15.
Clarke BE (1984) Manuscript in preparation.
16.
Newman JFE, Rowlands DJ and Brown F. (1973) J. Gen. ViroL 18, 171-180.
17.
Nussinov R. (1981) J. Mol. BioL 149, 125-13L
18.
RacanieUo VR and Baltimore D. (1981) PNAS USA 78, 4887-489L
19.
Rueckert RR. (1976) in Comprehensive Virology, Fraenkel-Conrat H and
2471
Nucleic Acids Research
20.
21.
22.
23.
24.
25.
26.
27.
28.
2472
Wagner RR (Eds) Vol. 6 ppl31-213, Plenum Press, New York.
Burroughs JN, Sangar DV, Clarke BE and Rowlands DJ. (1984) J. ViroL In
press.
Beck E, Forsa S, Strebel K, Cattaneo R and Fell G. (1983) Nucleic Acids
Research 11, 7873-7885.
Forss S and Schaller H. (1982) Nucleic Acids Research 10, 6441-6450.
Semler BL, Anderson CW, Kitamura N, Rotherberg PG, Wishart WL and
Wimmer E. (1981) PNAS USA 78, 3463-3468.
Kitamura N, Semler BL, Rothberg PG, Larsen GR, Adler CJ, Dorner AJ,
Emini EA, Hanecak R, Lee JJ, van der Werf S, Anderson CW and Wimmer E.
(1981) Nature 291, 547-553.
Scraba DG. (1979) in The Molecular Biology of Picornaviruses, PerezBercoff R. (Ed) ppl-23, Plenum Press, New York.
Harris TJR. (1980) J. Virol. 36, 659-664.
Porter AG, Fellner P, Black DN, Rowlands DJ, Harris TJR and Brown F.
(1978) Nature 276, 298-301.
Racaniello VR and Baltimore D. (1981) Science 214, 916-919.