J. gen. Virol. (1967), 1, 101-108
Printed in Great Britain
Nearest Neighbour Base Sequence Analysis of the Deoxyribonucleic Acids of a Further
Three Mammalian Viruses: Simian Virus 40, Human
Papilloma Virus and Adenovirus Type 2
By J. M. MORRISON AND H. M. KEIR
Institute of Biochemistry
AND H. SUBAK-SHARPE AND L. V. CRAWFORD
Medical Research Council Unit for Experimental Virus Research,
Institute of Virology, University of Glasgow, Glasgow, Scotland
(Accepted 17 September 1966)
SUMMARY
The nearest neighbour frequency analyses of the DNAs of Simian virus 40,
human papilloma virus and adenovirus type 2 are reported. The two small
oncogenic viruses have DNA closely resembling that of the host cells, which
confirms and extends the previous findings for such viruses. The DNA of
adenovirus type 2 shows only limited resemblance to that of the host cells.
The experimental findings are discussed in the context of previous analyses
of the DNAs of polyoma virus, Shope papilloma virus, herpes simplex
virus, pseudorabies virus, equine rhinopneumonitis virus and vaccinia virus.
INTRODUCTION
Josse, Kaiser & Kornberg (1961)* and Swartz, Trautner & Kornberg (1962)
initiated and developed the technique of frequency analysis of doublets (nearest
neighbour base sequences) in DNA. Basically, the technique involves isolation of
polydeoxyribonucleotide synthesized enzymically on a supplied DNA template, from
four reaction mixtures each containing the four deoxyribonucleoside 5'-triphosphates.
Only one of these triphosphates (a different one for each reaction) is labelled with
32P in the α-phosphate position. The 32P-labelled phosphate therefore enters the
reaction attached at the 5'-position of the deoxyribose moiety in the labelled nucleotide.
The synthesized polydeoxyribonucleotide is then degraded to deoxyribonucleoside
3'-monophosphates by the consecutive action of micrococcal nuclease (EC 3.1.4.7)
and spleen phosphodiesterase (EC 3.1.4.1) and the four nucleotides from each reaction
are separated by electrophoresis on paper and their 32P contents measured. The
32P label is thus transferred from the 5'-position of the ingoing nucleotide to the
3'-position of the nearest neighbour deoxyribonucleoside in the newly synthesized
* In keeping with the original paper of Josse et al. (1961), dinucleotide sequences derived from
nearest neighbour frequency analysis are denoted by ApC [deoxyadenylyl-(3'-5')-deoxycytidine],
GpT [deoxyguanylyl-(3'-5')-deoxythymidine], etc.; the molar proportions of adenine, thymine,
guanine and cytosine are denoted by 'a', 't', 'g' and 'c', and are expressed as a fraction of 1.000.
% (G + C) is the percentage of guanine plus cytosine in the DNA.
Downloaded from www.microbiologyresearch.org by
IP: 88.99.165.207
On: Wed, 09 Aug 2017 23:30:02
polydeoxyribonucleotide. For example, if [α-32P]deoxyadenosine 5'-triphosphate were
employed, the 32P in the four 3'-monophosphates isolated would describe the frequency
distribution of the four doublets ApA, CpA, GpA and TpA. In this way, for any
given DNA template, the frequency of occurrence of the nearest neighbour nucleotide
to each base in turn is determined. The complete analysis is given as the frequencies
of occurrence of all sixteen doublets.
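As an illustrative aside, the bookkeeping behind a table of sixteen doublets can be sketched in a few lines of Python. The function names and the toy sequence below are hypothetical and form no part of the procedure described, which measures the frequencies radiochemically; the sketch simply counts doublets along a single strand read in the 5' to 3' direction, and lists the four doublets that would be read out when one particular triphosphate carries the label.

```python
from collections import Counter

BASES = "ACGT"

def doublet_frequencies(seq):
    """Return the sixteen doublet frequencies (parts per thousand) of one strand.

    The sequence is read 5' -> 3'; the doublet 'XpY' means X is the 5' neighbour of Y.
    """
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("need at least two bases to form a doublet")
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts.values())
    return {x + "p" + y: 1000.0 * counts[x + y] / total
            for x in BASES for y in BASES}

def labelled_doublets(labelled_base):
    """Doublets described by one reaction mixture.

    The label enters on the 5'-phosphate of `labelled_base` and ends up on its
    5' neighbour, so the reaction reports XpN for each possible neighbour X.
    """
    return [x + "p" + labelled_base for x in BASES]

if __name__ == "__main__":
    toy = "ACGTTGCA"  # hypothetical sequence, for illustration only
    freqs = doublet_frequencies(toy)
    for name in labelled_doublets("A"):
        print(name, round(freqs[name], 1))
```

With the label on deoxyadenosine 5'-triphosphate, `labelled_doublets("A")` lists ApA, CpA, GpA and TpA, the four doublets named in the text. A real analysis reflects both strands of the synthesized product, which this single-strand sketch does not attempt to model.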
Josse et al. (1961) and Swartz et al. (1962) investigated DNAs from many different
sources, and showed that the DNAs of different organisms have highly characteristic
and significantly non-random doublet frequencies. Their data showed a striking
anomaly in the occurrence of the CpG doublet, there being close correspondence to
random expectation of its frequency in the DNAs of bacteria and bacteriophages,
a substantial deficiency (two-thirds of random expectation is present) in echinoderms,
and an extreme deficiency (less than one-third of expectation) in vertebrates.
Assuming that the major part of an organism's DNA is concerned with the specification of polypeptides, then the extreme infrequency of the CpG doublet in the
vertebrates must reflect rarity of use of this doublet in programming protein synthesis.
Translation of nucleic acids into proteins is mediated by codon-specific species of
transfer-RNA molecules, and it seems reasonable to assume that the population of
transfer-RNA species in the cells of an organism will be optimally adapted (as a
consequence of natural selection) to the translation requirements of the DNA of that
organism. These considerations imply that there should be a severe shortage in
mammalian cells of those transfer-RNA species that recognize CpG-containing
codons.
The DNAs of viruses which use the pre-existing translation apparatus of the host
cells would have to be adapted to be translated by the transfer-RNA population of
the host cells. The doublet pattern of such virus DNA should therefore resemble that
of the host DNA. Only viruses that modify the codon recognition of the cell's transfer-RNA population would escape this restriction, and the doublet pattern of the nucleic
acid of such viruses would be independent of that of the host cells.
To test this hypothesis, we have analysed the DNAs of six mammalian viruses and
their host cells (Subak-Sharpe et al. 1966a). It was found that the doublet pattern of
the DNA of the two small oncogenic viruses tested (polyoma and Shope papilloma)
did indeed closely resemble that of mammalian cell DNA, whereas the DNAs of
four large viruses (herpes simplex, pseudorabies, equine rhinopneumonitis and
vaccinia) gave different patterns. These latter patterns conformed more closely to
random expectations although differing from each other.
The present communication deals with the analysis of a further three DNA viruses
which were studied to ascertain whether the conclusions tentatively drawn from the
earlier investigation were more generally applicable. Three of the DNAs previously
tested were included to serve as controls (vaccinia and equine rhinopneumonitis DNAs;
and BHK21/C 13 cell DNA).
METHODS
Preparation of DNA. DNAs from equine rhinopneumonitis virus, vaccinia virus
and BHK21/C 13 (hamster) cells were prepared as previously described (Subak-Sharpe
et al. 1966a). The DNAs of human papilloma virus and Simian virus 40 (sv40) were
prepared according to Crawford & Crawford (1963) and Crawford & Black (1964).
Adenovirus type 2 was grown in monolayer cultures of HeLa cells and extracted from
the infected cells by three cycles of freezing and thawing followed by a 3 hr treatment
at 37° with 0.25 % sodium deoxycholate. The resulting virus suspension was purified
by density gradient centrifugation and the virus DNA extracted by the method of
Green & Piña (1964) and freed from protein by sedimentation through CsCl (density =
1.45 g./ml.).
All DNAs were checked for purity by centrifugation to equilibrium in CsCl in the
Model E Spinco Ultracentrifuge (Table 1).
Nearest neighbour frequency analysis. The technical details and procedure adopted
were as presented by Subak-Sharpe et al. (1966a), which contained only slight modifications
of the procedure described by Josse et al. (1961) and Swartz et al. (1962).
Table 1. Experimentally obtained values from nearest neighbour frequency analyses of
the DNAs of BHK21/C13 cells and five mammalian viruses

                                            Human          Adenovirus     Equine rhino-
DNA           BHK21/C13      sv40           papilloma      type 2         pneumonitis    Vaccinia
ApT           82             74             79             48             48             124
TpA           73             68             72             44             50             111
ApA, TpT      98, 108        105, 116       91, 96         64, 68         58, 57         106, 112
GpT, ApC      60, 52         58, 48         57, 54         59, 56         63, 58         53, 49
TpG, CpA      79, 68         77, 72         73, 69         71, 66         67, 58         57, 52
GpA, TpC      62, 57         54, 50         57, 55         55, 55         60, 57         65, 61
ApG, CpT      69, 68         73, 62         65, 64         62, 64         62, 62         55, 53
GpG, CpC      44, 40         49, 44         50, 45         72, 72         79, 73         28, 26
GpC           35             44             48             82             77             22
CpG           8              6              24             62             72             28
(G + C) % from frequency analysis
              38.2           39.0           41.4           53.2           54.4           32.5
(G + C) % from buoyant density determination
              42             41             41             57             55             35

Each DNA was analysed at least twice; the average values are given. The values of the doublet frequencies are expressed in parts per thousand. Where two doublets share a row, the two values are given in the same order as the doublets.
RESULTS
The nearest neighbour frequencies of the six DNAs analysed are presented in
Table 1. In five of the six analyses, the percentage (G + C) of the synthesized DNAs
was slightly lower than would be expected from published values or from the buoyant
density determinations on the template DNA. This was also found in our previous
studies (Subak-Sharpe et al. 1966a) and is at present under investigation. It must
introduce a small error into the frequency patterns, but we consider that the doublet
patterns of the DNAs examined are sufficiently precise for our present purposes.
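The size of this discrepancy can be read directly from the last two rows of Table 1. The short Python sketch below is only an illustration (the variable and function names are hypothetical); it takes the (G + C) percentages as transcribed from Table 1 and subtracts the buoyant density value from the frequency analysis value for each DNA.

```python
# (G + C) percentages transcribed from Table 1:
# (value from frequency analysis, value from buoyant density determination)
GC_PERCENT = {
    "BHK21/C13": (38.2, 42),
    "sv40": (39.0, 41),
    "Human papilloma": (41.4, 41),
    "Adenovirus type 2": (53.2, 57),
    "Equine rhinopneumonitis": (54.4, 55),
    "Vaccinia": (32.5, 35),
}

def gc_differences(table):
    """Frequency-analysis value minus buoyant-density value, in percentage points."""
    return {name: round(freq - buoyant, 1)
            for name, (freq, buoyant) in table.items()}

differences = gc_differences(GC_PERCENT)
lower = [name for name, diff in differences.items() if diff < 0]

if __name__ == "__main__":
    for name, diff in differences.items():
        print(f"{name:25s} {diff:+.1f}")
    print(len(lower), "of", len(differences), "analyses gave the lower (G + C) value")
```

Only the human papilloma virus DNA gives a marginally higher value from the frequency analysis, in line with the statement that five of the six analyses came out slightly low.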
A series of 'shortage histograms' (Fig. 1) has been drawn to render easier individual
comparisons of doublet frequencies in the viral DNAs and the host cell DNA. The
'shortage histogram' indicates the extent (expressed as a percentage) to which any
doublet frequency in the host DNA falls short of that in the viral DNA. It has been
devised to focus attention on those doublets in the viral DNA which, if included in
codons, might give rise to difficulties at the level of translation. Excess of a doublet
in the host DNA relative to the virus DNA is of little interest here as it is the virus
DNA which parasitizes the host cell. We have arbitrarily assigned four levels of
shortage which are illustrated in Fig. 1, but do not regard shortages of less than two-thirds as constituting a serious problem for translation.
The DNAs of sv40 and human papilloma virus are unlikely to encounter difficulties
[Fig. 1 graphic not reproduced: shortage histograms (vertical scale 0 to 100 %) for the DNAs of sv40, human papilloma virus, adenovirus type 2, equine rhinopneumonitis virus and vaccinia virus, with the doublet frequency pattern of the reference host DNA shown above them.]
Shortage (%) = 100 - [(doublets per 1000 present in host) / (doublets per 1000 present in virus)] × 100
Fig. 1. Doublet frequency patterns of mammalian virus DNAs expressed as shortage
histograms relative to the frequency pattern of human spleen cell DNA. The latter, which is
shown on the figure above the shortage histogram, is used as a reference (data of Swartz et al.
1962), since all mammalian DNAs investigated so far have essentially the same doublet
frequency pattern. The four levels of shortage distinguished by shading are 1-49 %; 50-66 %; 67-89 %; > 90 %.
at the level of the cell's translation apparatus, whereas the DNA of adenovirus type 2
may well do so. The results for equine rhinopneumonitis virus and vaccinia virus
confirm our previous findings (Subak-Sharpe et al. 1966a).
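The shortage calculation can be sketched as follows. This is a minimal illustration under stated assumptions, not the computation behind Fig. 1: the figure uses human spleen DNA (Swartz et al. 1962) as the host reference, whose values are not reproduced here, so the sketch substitutes the BHK21/C13 CpG value from Table 1 as a stand-in host. The function names are hypothetical, and the treatment of values falling between the stated bands (for example between 89 and 90 %) is likewise an assumption.

```python
def shortage_percent(host_per_1000, virus_per_1000):
    """Extent (%) to which the host doublet frequency falls short of the viral one."""
    return 100.0 - (host_per_1000 / virus_per_1000) * 100.0

def shortage_level(shortage):
    """Assign one of the four shortage bands; band edges are an assumption."""
    if shortage < 1:
        return "no shortage"
    if shortage < 50:
        return "1-49 %"
    if shortage < 67:
        return "50-66 %"
    if shortage < 90:
        return "67-89 %"
    return "90 % or more"

# CpG frequencies (parts per thousand) transcribed from Table 1.
HOST_CPG = 8  # BHK21/C13, used here as a stand-in for the human spleen reference
VIRUS_CPG = {
    "sv40": 6,
    "Human papilloma": 24,
    "Adenovirus type 2": 62,
    "Equine rhinopneumonitis": 72,
    "Vaccinia": 28,
}

if __name__ == "__main__":
    for name, value in VIRUS_CPG.items():
        s = shortage_percent(HOST_CPG, value)
        print(f"{name:25s} {s:6.1f}  {shortage_level(s)}")
```

On these stand-in figures the CpG doublet shows no shortage for sv40 and a shortage of roughly 87 % for adenovirus type 2, which is the kind of contrast the histograms are designed to bring out.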
In the case of every DNA, the frequency of each doublet deviates characteristically
from random expectation. To facilitate comparison of these DNAs of widely differing
(G + C) content, the observed doublet frequency values have been normalized to the
values they would have if the DNA contained 50 % (G + C) (Subak-Sharpe et al. 1966a).
The normalized values are listed in Table 2.
Table 2. Normalized nearest neighbour frequencies of the DNAs of
BHK21/C13 cells and of five mammalian viruses

                                            Human          Adenovirus     Equine rhino-
DNA           BHK21/C13      sv40           papilloma      type 2         pneumonitis    Vaccinia
ApT           54             50             58             55             58             68
TpA           48             46             53             50             60             61
ApA, TpT      68, 67         73, 75         68, 69         76, 74         71, 67         60, 60
GpT, ApC      59, 59         57, 54         57, 58         58, 58         61, 61         58, 58
TpG, CpA      78, 77         76, 81         73, 74         69, 68         65, 61         62, 62
GpA, TpC      65, 62         55, 54         58, 57         56, 54         59, 58         73, 71
ApG, CpT      72, 73         74, 67         66, 67         63, 63         61, 64         62, 61
GpG, CpC      69, 75         73, 80         69, 69         63, 64         63, 65         63, 65
GpC           60             72             70             73             65             52
CpG           14             10             35             55             61             66
a             0.3011         0.2989         0.2895         0.2293         0.2258         0.3336
t             0.3168         0.3109         0.2959         0.2390         0.2304         0.3414
g             0.1994         0.2049         0.2124         0.2673         0.2792         0.1672
c             0.1828         0.1853         0.2021         0.2644         0.2645         0.1579
a/t           0.95           0.96           0.98           0.96           0.98           0.98
g/c           1.09           1.11           1.05           1.01           1.06           1.06

The values shown in Table 1 have been normalized to correspond with DNA containing 50 % (G + C).
Normalizing entails dividing the observed doublet frequency, as listed in Table 1, by the product of the
frequencies of the bases that make up the particular doublet, and multiplying by 0.0625 (the random frequency expected for every doublet in DNA of 50 % (G + C) content) (Subak-Sharpe et al. 1966). For
example, for the DNA of BHK21/C13 cells, the normalized frequency of the ApT doublet is
[82/(0.3011 × 0.3168)] × 0.0625.
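The worked example in the note above can be checked numerically. The following Python sketch is illustrative only (the function and variable names are hypothetical); it applies the stated rule to two BHK21/C13 doublets, using the observed frequencies from Table 1 and the base proportions from Table 2.

```python
RANDOM_FREQUENCY = 0.0625  # expected for every doublet in DNA of 50 % (G + C) content

def normalize(observed_per_1000, freq_5prime_base, freq_3prime_base):
    """Normalize an observed doublet frequency (parts per thousand) to 50 % (G + C)."""
    return observed_per_1000 / (freq_5prime_base * freq_3prime_base) * RANDOM_FREQUENCY

# Base proportions of BHK21/C13 DNA, transcribed from Table 2.
BHK = {"a": 0.3011, "t": 0.3168, "g": 0.1994, "c": 0.1828}

apt = normalize(82, BHK["a"], BHK["t"])  # ApT: observed 82 per thousand in Table 1
cpg = normalize(8, BHK["c"], BHK["g"])   # CpG: observed 8 per thousand in Table 1

if __name__ == "__main__":
    print(f"ApT normalized: {apt:.1f}")
    print(f"CpG normalized: {cpg:.1f}")
```

Rounded to whole numbers, the two results agree with the values of 54 and 14 listed for BHK21/C13 in Table 2.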
The direction and extent of deviation from random expectation for all sixteen
doublets are depicted in Fig. 2 to allow ready comparison of the patterns of deviation
found in the different DNAs. The overall doublet patterns of the DNA of sv40, and
to a lesser extent of human papilloma virus, strikingly resemble that of the host DNA.
Thus, in this respect, these two small oncogenic viruses are similar to the two originally
investigated (polyoma virus and Shope papilloma virus). In contrast, the pattern of
deviation from random expectation shown by the DNA of adenovirus type 2 differs
more markedly from that of the host DNA, approaching the more nearly random
patterns of the DNAs of the previously studied large viruses, namely vaccinia, equine
rhinopneumonitis, herpes simplex and pseudorabies.
[Fig. 2 graphic not reproduced: for each DNA, the deviations of the sixteen normalized doublet frequencies from random expectation (vertical scale +20 to -40 parts per thousand). Panels, with (G + C) content: BHK21/C13 (38 %), sv40 (39 %), human papilloma (41 %), adenovirus type 2 (53 %), vaccinia (32 %), equine rhinopneumonitis (54 %).]
Fig. 2. Doublet frequency pattern of DNAs normalized to 50 % (G + C) content and
expressed in terms of deviation in parts per thousand from random expectation. The random
expectation for each doublet is 62.5 parts per thousand. The (G + C) contents given for
each DNA are those calculated from the nearest neighbour analyses. The deviation from
the expectation (of 62.5) was calculated by use of the normalized experimental values, which
are given in Table 2.
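As a small illustration of the quantity plotted in Fig. 2, the deviations can be recomputed from Table 2 by subtracting 62.5 from each normalized value. The Python sketch below does this for the CpG doublet only; the names are hypothetical and the values are those transcribed from the CpG row of Table 2.

```python
RANDOM_EXPECTATION = 62.5  # parts per thousand for every doublet at 50 % (G + C)

# Normalized CpG frequencies (parts per thousand), transcribed from Table 2.
NORMALIZED_CPG = {
    "BHK21/C13": 14,
    "sv40": 10,
    "Human papilloma": 35,
    "Adenovirus type 2": 55,
    "Equine rhinopneumonitis": 61,
    "Vaccinia": 66,
}

def deviation(normalized_value, expectation=RANDOM_EXPECTATION):
    """Deviation from random expectation; negative values indicate a deficiency."""
    return normalized_value - expectation

cpg_deviation = {name: deviation(value) for name, value in NORMALIZED_CPG.items()}

if __name__ == "__main__":
    for name, dev in cpg_deviation.items():
        print(f"{name:25s} {dev:+.1f}")
```

The CpG deviations of the host DNA and of sv40 DNA are both close to -50 parts per thousand, whereas those of equine rhinopneumonitis virus and vaccinia virus DNAs lie within a few parts per thousand of zero.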
DISCUSSION
It is now clear that four viruses (polyoma, sv40, human papilloma and Shope
papilloma) have DNAs whose doublet patterns closely resemble that of the DNA of
their mammalian host cells. These viruses also have in common that they are small,
with information in their DNAs sufficient to specify only in the order of ten polypeptides, that they are oncogenic, and that they contain DNA that is supercoiled
and has a (G + C) content of 41 to 48 %. Previously, we have suggested that small viruses
(a) may have to utilize the pre-existing translation apparatus of the host cells, and
(b) may have evolved from stretches of the DNA of ancestral host cells. The fact that
the four such viruses investigated to date are oncogenic may or may not be significant.
At this stage it is not justifiable to infer in addition to a correlation between doublet
pattern and smallness of the nucleic acid, also a correlation between doublet pattern
and oncogenic potential. To clarify the situation, the doublet patterns of the DNAs of
oncogenic and non-oncogenic members of the adenovirus group will have to be
determined. A beginning has been made with the DNA of the non-oncogenic adenovirus type 2 and it is noteworthy that its doublet pattern exhibits only limited resemblance to that of the host DNA, showing much greater resemblance to the more
nearly random patterns of the DNAs of the large viruses. It is therefore important to
investigate the highly oncogenic adenoviruses types 12, 18 and 31, and the weakly
oncogenic adenoviruses types 3, 7, 11, 14, 16 and 21, and compare their doublet
patterns to those of non-oncogenic adenoviruses (2, 5, etc.). This is now being done.
At this stage, a case could be made for an empirical relationship between the degree
of resemblance of the DNA of viruses to that of the host cells and the size of the viral
genome. (Molecular weights of the DNAs are approximately as follows: polyoma
virus and sv40, 3 × 10^6; Shope and human papilloma viruses, 5 × 10^6; adenovirus
type 2, 23 × 10^6; herpes simplex virus, 70 × 10^6; vaccinia virus, 160 × 10^6.) However,
this empirical approach seems to us to be unprofitable.
We prefer, and have presented, the hypothesis that large animal viruses with a
doublet pattern which shows no resemblance to that of the host cell DNA might
(a) modify the translation apparatus of the host cells by introduction or modification
of transfer-RNA species, and (b) take their origin in an evolutionary sense from the
DNA of organisms not closely related to the host cells. In the case of herpes simplex
virus there already is evidence which strongly suggests that this virus specifies new
arginyl transfer-RNA (Subak-Sharpe, Shepherd & Hay, 1966).
With regard to the peculiar rarity of the CpG doublet in vertebrate DNA, it is
noteworthy that all the 5-methylcytosine which is found in mammalian DNA is
present in the sequence CpG, and apparently that all cytosine in this sequence is
methylated (Doskočil & Šormová, 1965); however, very little, if any, cytosine in the
DNA of polyoma virus grown in mammalian cells is methylated (Winocour, Kaye &
Stollar, 1965). The implications of neither observation are understood. Rarity of
CpG, taken together with the invariable enzymic methylation of only this doublet,
indicates perhaps that its function along the genetic message in mammalian DNA is
unusual. Here, C followed by G might not be used within codons, but only with
C in one codon and G in the next (XXC: GXX: XXX). If transcription and translation
are topographically connected (Stent, 1965) then methylation in mammalian DNA
may affect the fidelity of translation from DNA via RNA into protein. This may be
unnecessary and therefore not apply to invading viral DNA. Alternatively, viral DNA
may be spatially remote from the cell's methylating enzymes, and coating of the
DNA and virus assembly may proceed too promptly for methylation to take place.
We are grateful to Professor J. N. Davidson, F.R.S., and Professor M. G. P. Stoker
for their interest and support. The investigation was aided by a grant from the British
Empire Cancer Campaign. We also acknowledge with thanks the skilled technical
assistance of Miss Helen Moss, Mrs M. Scott and Mr P. Ferry. The Escherichia coli
strain cells from which the DNA polymerase was prepared were a generous gift from
Dr R. Elsworth and colleagues, M.R.E. Porton, England.
REFERENCES
CRAWFORD, L. V. & BLACK, P. H. (1964). The nucleic acid of Simian virus 40. Virology 24, 388.
CRAWFORD, L. V. & CRAWFORD, E. M. (1963). A comparative study of polyoma and papilloma
viruses. Virology 21, 258.
DOSKOČIL, J. & ŠORMOVÁ, Z. (1965). The methylated bases in deoxyribonucleic acids. I. Sequences of
deoxy-5-methyl-cytidylic acid in bacterial DNA. Colln Czech. chem. Commun. 30, 38.
GREEN, M. & PIÑA, M. (1964). Biochemical studies on adenovirus multiplication. VI. Properties
of highly purified tumorigenic human adenoviruses and their DNAs. Proc. natn. Acad. Sci. U.S.A.
51, 1251.
JOSSE, J., KAISER, A. D. & KORNBERG, A. (1961). Enzymatic synthesis of deoxyribonucleic acid.
VIII. Frequencies of nearest neighbor base sequences in deoxyribonucleic acid. J. biol. Chem.
236, 864.
STENT, G. S. (1965). Genetic transcription. Proc. R. Soc. B, 164, 181.
SUBAK-SHARPE, H., BÜRK, R. R., CRAWFORD, L. V., MORRISON, J. M., HAY, J. & KEIR, H. M.
(1966a). An approach to evolutionary relationships of mammalian DNA viruses through analysis
of the pattern of nearest neighbor base sequences. Cold Spr. Harb. Symp. quant. Biol. 31 (in the
Press).
SUBAK-SHARPE, H., SHEPHERD, W. M. & HAY, J. (1966). Studies on s-RNA coded by herpes virus.
Cold Spr. Harb. Symp. quant. Biol. 31 (in the Press).
SWARTZ, M. N., TRAUTNER, T. A. & KORNBERG, A. (1962). Enzymatic synthesis of deoxyribonucleic
acid. XI. Further studies on nearest neighbor base sequences in deoxyribonucleic acids. J. biol.
Chem. 237, 1961.
WINOCOUR, E., KAYE, A. M. & STOLLAR, V. (1965). Synthesis and transmethylation of DNA in
polyoma-infected cultures. Virology 27, 156.
(Received 6 September 1966)