BIOPHYSICAL BULLETIN №497, Visnyk Khark. Univ., 2000
UDC 577.3

DETERMINATIVE DEGREE AND NUCLEOTIDE CONTENT OF DNA STRANDS

Diana Duplij and Steven Duplij
Kharkov National University, Svoboda Sq. 4, Kharkov 61077, Ukraine
E-mail: [email protected]. Internet: http://gluon.physik.uni-kl.de/~duplij
Received November 10, 2000

We introduce the determinative degree, a new characteristic of nucleotides that is connected with doublet sense and numerically describes their “empirical power” in determining the corresponding amino acids. For amino acids an analogous but passive characteristic, “predeterminativity”, is also proposed, and it is shown to correlate with the interaction energy of nitrogenous bases in the corresponding DNA triplets. The purine-pyrimidine content of DNA sequences is considered in terms of the determinative degree, a numerical explanation of CG pressure is proposed, and a classification of DNA sequences is given. Calculations with real sequences show that purine-pyrimidine symmetry within one strand increases with the level of species organization.

KEY WORDS: genetic code, nucleotide, codon, amino acid, determinative degree, Chargaff rules, DNA sequence, strand.

A thorough study of the genetic code in an abstract way is useful for constructing various DNA models and indispensable for a deep understanding of gene organization and expression [1]. In this direction, the study of symmetries [2, 3], the application of group theory [4] and the implications of supersymmetry [5] are the most promising and call for further elaboration. In this paper we carefully investigate the absolute nucleotide content connected with purine-pyrimidine symmetries of DNA sequences in terms of the determinative degree introduced in [6].

DOUBLETS AND DETERMINATIVE DEGREE OF NUCLEOTIDES

Let us denote a triplet of nucleotides by $xyz$, where $x, y, z \in \{C, G, U, A\}$. It is known that the 16 possible doublets $xy$ fall into 2 octets according to their ability to determine an amino acid [7].
Eight doublets CC, CG, GC, CU, UC, GU, AC, GG have more “power”, since each of them encodes the same amino acid independently of the third base $z$. For the other eight (“weak”) doublets UU, UG, CA, AG, GA, AU, UA, AA the third base determines the amino acid. In general, the transition from the “powerful” octet to the “weak” octet is obtained by the exchange [7, 8]

$$C \overset{*}{\Longleftrightarrow} A, \qquad G \overset{*}{\Longleftrightarrow} U,$$

which we name the “star operation (∗)” and call purine-pyrimidine inversion. Moreover, all 4 doublets with $y = C$ completely determine the amino acid, only 2 doublets with $y = G$ and 2 with $y = U$ completely determine it, while doublets with $y = A$ never determine the amino acid. Thus, if in addition we take into account GC pressure in evolution [9, 10] and third-place preferences during codon-anticodon pairing [11], then the 4 nucleotides can be arranged in descending order in the following way:

| Type | Nucleotide | Determinative degree | “Power” | Doublet with this second base determines the amino acid |
|---|---|---|---|---|
| Pyrimidine | C | $d_C = 4$ | very “strong” | completely |
| Purine | G | $d_G = 3$ | “strong” | in 2 cases |
| Pyrimidine | T/U | $d_{T/U} = 2$ | “weak” | in 2 cases |
| Purine | A | $d_A = 1$ | very “weak” | never |

(1)

Here we use the notation T/U because the genetic code is read from mRNA, and in what follows we do not distinguish their determinative ability (“power”). Now we introduce a numerical characteristic of the empirical “power”, the determinative degree $d_x$ of nucleotide $x$, and thereby pass from a qualitative to a quantitative description of the genetic code structure [6]. It is seen from (1) that the determinative degree of a nucleotide takes the values $d_x = 1, 2, 3, 4$ in order of increasing “power”. If we write the determinative degree as an upper index of the nucleotide, then the four bases (1) can be presented as a row vector

$$V = \left( C^{(4)} \;\; G^{(3)} \;\; T^{(2)} \;\; A^{(1)} \right).$$

Then the exterior product $M = V \times V$ represents the doublet matrix $M$ and the corresponding rhombic code [12], and the triple exterior product $K = V \times V \times V$ corresponds to the cubic matrix model of the genetic code, which was described in terms of the determinative degree in [6].
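The octet structure and the star operation described above can be checked with a few lines of Python. This is only an illustrative sketch: the names `D`, `STAR`, `STRONG`, `WEAK`, `star` and `strong_by_second_base` are hypothetical helpers introduced here, not notation from the paper.

```python
# Determinative degrees from table (1); names below are illustrative only.
D = {"C": 4, "G": 3, "U": 2, "A": 1}

# Star operation (purine-pyrimidine inversion): C <-> A, G <-> U.
STAR = {"C": "A", "A": "C", "G": "U", "U": "G"}

# The two octets exactly as listed in the text.
STRONG = {"CC", "CG", "GC", "CU", "UC", "GU", "AC", "GG"}
WEAK = {"UU", "UG", "CA", "AG", "GA", "AU", "UA", "AA"}


def star(seq):
    """Apply the star operation base by base."""
    return "".join(STAR[b] for b in seq)


# Row vector V and the doublet matrix M = V x V (entries are doublet labels).
V = ["C", "G", "U", "A"]
M = [[x + y for y in V] for x in V]

# Number of "strong" doublets for each second base y.
strong_by_second_base = {y: sum(1 for d in STRONG if d[1] == y) for y in V}

# The star operation maps the strong octet onto the weak one.
star_of_strong = {star(d) for d in STRONG}
```

With the values of (1), the star operation acts on the degrees as $d_{x^*} = 5 - d_x$. Counting strong doublets by second base gives 4, 2, 2, 0 for C, G, U, A, so the tie between G and U is broken only by the additional considerations cited in the text.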
To calculate the determinative degree of doublets $xy$ we use the following additivity assumption

$$d_{xy} = d_x + d_y, \tag{2}$$

which is taken to hold for all triplets and for any nucleotide sequence. Then each of the 64 elements (codons) of the cubic matrix $K$ acquires a novel numerical characteristic, the determinative degree of a codon $d_{xyz} = d_{codon} = d_x + d_y + d_z$, which takes values in the range from 3 to 12 (see [6] for details and properties of the cubic matrix $K$).

DETERMINATIVE DEGREE AND AMINO ACIDS

The connection between codon structure and the properties of the corresponding amino acids is a matter of importance [1]. Let us define an analog of the determinative degree for an amino acid, $d_{AA}$, as

$$d_{AA} = \frac{\sum d_{codon}}{n_{deg}}, \tag{3}$$

where the sum runs over the codons encoding the amino acid and $n_{deg}$ is its degeneracy (the number of triplets encoding one amino acid [1]). So $d_{AA}$ can be treated as a new passive characteristic of an amino acid which shows its degree of “predeterminativity”. This allows us to analyze the abstract “predeterminativity” of amino acids in connection with their known biological or physical properties. The values for each of the 20 amino acids are given in Table 1.

Table 1. “Predeterminativity” of amino acids

| Amino acid | $d_{AA}$ | $n_{deg}$ |
|---|---|---|
| Lys | 4 | 2 |
| Asn | 5 | 2 |
| Ile | 5 1/3 | 3 |
| Glu/Met/Tyr | 6 | 2/1/2 |
| Phe/Asp/Gln | 7 | 2/2/2 |
| Val/Thr | 7 1/2 | 4/4 |
| Leu | 7 2/3 | 6 |
| Cys/Trp/Ser/His | 8 | 2/1/6/2 |
| Arg | 8 1/3 | 6 |
| Gly | 8 1/2 | 4 |
| Ala | 9 1/2 | 4 |
| Pro | 11 | 4 |

Note that in [13] all amino acids were divided into two classes by the interaction energy of nitrogenous bases in the corresponding DNA triplets. The first group, Pro, Ala, Gly, Arg, Cys, Trp, Ser, Thr, corresponds to the upper energy level, 150-170 kJ/mole per codon on average, and the second group, Lys, Asn, Ile, Glu, Met, Tyr, Phe, Asp, Gln, Val, Leu, His, corresponds to the low energy level, 88-92 kJ/mole per codon on average. Comparing with Table 1 we observe that the first group has $d_{AA} \geq 8$, while the second group has $d_{AA} < 8$ (His and Thr are exceptional cases).
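Formula (3) can be evaluated mechanically from a codon table. The sketch below assumes the standard genetic code in the conventional TCAG ordering (the paper does not state which table was used), writes T for T/U, and uses hypothetical helper names (`CODE`, `codon_degree`, `predeterminativity`). Exact fractions are used so the results can be compared with Table 1.

```python
from fractions import Fraction
from itertools import product

# Determinative degrees from (1); T stands for T/U.
D = {"C": 4, "G": 3, "T": 2, "A": 1}

# Assumption: standard genetic code, one-letter amino acids, "*" = stop,
# codons enumerated with bases in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_degree(codon):
    """Additive determinative degree d_x + d_y + d_z of a codon."""
    return sum(D[b] for b in codon)


def predeterminativity():
    """d_AA = (sum of codon degrees) / degeneracy, for each amino acid."""
    totals, counts = {}, {}
    for codon, aa in CODE.items():
        if aa == "*":  # stop codons encode no amino acid
            continue
        totals[aa] = totals.get(aa, 0) + codon_degree(codon)
        counts[aa] = counts.get(aa, 0) + 1
    return {aa: Fraction(totals[aa], counts[aa]) for aa in totals}


d_aa = predeterminativity()
degrees = [codon_degree(c) for c in CODE]
```

Under these assumptions the sketch reproduces, for example, $d_{AA} = 4$ for Lys (`K`), $5\tfrac{1}{3}$ for Ile (`I`) and $7\tfrac{2}{3}$ for Leu (`L`). For Pro (`P`) it yields $10\tfrac{1}{2}$, whereas Table 1 lists 11; the origin of that difference is not clear from the text.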
DNA STRANDS AND CHARGAFF RULES

Let us consider a numerical description of an idealized DNA sequence as a double helix of two codon strands connected by complementarity conditions [1]. The strands are described by four numbers each, $(n_C, n_G, n_T, n_A)$ and $(m_C, m_G, m_T, m_A)$, where $n_x$ is the number of nucleotides $x$ in one strand and $m_x$ is the number in the other. In terms of $n_x$ and $m_x$ the complementarity conditions are

$$n_C = m_G, \quad m_C = n_G, \quad n_T = m_A, \quad m_T = n_A. \tag{4}$$

Chargaff's rules [1] for a double-helix DNA sequence read:

1) the total quantities of purines and pyrimidines are equal, $N_A + N_G = N_C + N_T$;
2) the total quantity of adenine and cytosine is equal to the total quantity of guanine and thymine, $N_A + N_C = N_T + N_G$;
3) the total quantity of adenine is equal to the total quantity of thymine, $N_A = N_T$, and the total quantity of cytosine is equal to the total quantity of guanine, $N_C = N_G$;
4) the ratio of adenine and thymine to guanine and cytosine, $v = (N_A + N_T) / (N_C + N_G)$, is approximately constant for each species.

Usually Chargaff's rules are formulated through macroscopic molar fractions, which are proportional to the absolute numbers of nucleotides in DNA [1]. If we consider a DNA double-helix sequence, then $N_x = n_x + m_x$. In terms of $n_x$ and $m_x$ the first three Chargaff's rules lead to equations which are obvious identities if the complementarity (4) holds. From the fourth Chargaff's rule it follows that the specificity coefficient $v_{nm}$ for two given strands is

$$v_{nm} = \frac{n_A + m_A + n_T + m_T}{n_C + m_C + n_G + m_G}. \tag{5}$$

The complementarity (4) leads to the equality of the coefficients $v$ of each strand, $v_{nm} = v_n = v_m \equiv v$, and $v$ is connected with the GC content $p_{CG}$ of the double-helix DNA as $p_{CG} = 1/(1 + v)$. We also consider another important coefficient, the ratio of purines to pyrimidines $k$. For two strands, from the first Chargaff's rule we obviously derive $k_{nm} = 1$.
But for each strand we have

$$k_n = \frac{n_G + n_A}{n_C + n_T}, \qquad k_m = \frac{m_G + m_A}{m_C + m_T}, \tag{6}$$

which satisfy the equation $k_n k_m = 1$ following from the complementarity (4).

BIOLOGICAL SENSE OF DETERMINATIVE DEGREE IN DNA

Let us introduce the determinative degree of each strand, exploiting the additivity assumption (2), as

$$d_n = 4 n_C + 3 n_G + 2 n_T + 1 \cdot n_A, \tag{7}$$

$$d_m = 4 m_C + 3 m_G + 2 m_T + 1 \cdot m_A. \tag{8}$$

The values $d_n$ and $d_m$ can be viewed as characteristics of the empirical “power” of the strands, i.e. a “strand generalization” of (1). Then we define the sum and difference “powers” of a double-helix sequence by

$$d_+ = d_n + d_m, \qquad d_- = d_n - d_m. \tag{9}$$

The first variable $d_+$ can be treated as the total empirical “power” of DNA (or of its fragment). Taking into account the complementarity conditions (4) we obtain $d_+$ through the variables of one strand:

$$d_+ = 7 (n_C + n_G) + 3 (n_T + n_A). \tag{10}$$

We can also present $d_+$ through the macroscopically determined variables $N_x$ as $d_+ = 7 N_C + 3 N_A = 7 N_G + 3 N_T$, or through the GC and AT contents, $N_{C+G} = N_C + N_G$ and $N_{A+T} = N_A + N_T$, as $d_+ = \frac{7}{2} N_{C+G} + \frac{3}{2} N_{A+T}$.

To give sense to the difference $d_-$ we derive

$$d_- = n_C + n_T - n_G - n_A. \tag{11}$$

We see that the star operation acts as $(d_+)^* = 10 - d_+$ (per base pair; for a fragment of $n$ base pairs this reads $10 n - d_+$) and $(d_-)^* = -d_-$. From (10)-(11) the main statement follows. The biological sense of the determinative degree $d$ is contained in the following purine-pyrimidine relations:

1) The sum of the determinative degrees of the template and complementary strands in DNA (or its fragment) equals

$$d_+ = \frac{7}{2} N_{C+G} + \frac{3}{2} N_{A+T}. \tag{12}$$

2) The difference of the determinative degrees of the template and complementary strands in DNA (or its fragment) exactly equals the difference between pyrimidines and purines in one strand,

$$d_- = n_{pyrimidines} - n_{purines}, \tag{13}$$

where $n_{pyrimidines} = n_C + n_T$ and $n_{purines} = n_G + n_A$, or, equivalently, the difference of pyrimidines or purines between the strands,

$$d_- = n_{pyrimidines} - m_{pyrimidines} = m_{purines} - n_{purines}. \tag{14}$$
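The relations above are easy to verify numerically for any single strand. The following is a minimal sketch: the function names `base_counts`, `strand_degree` and `analyse` and the toy strand are assumptions made for illustration. The second strand is built by base-wise complementation, which is sufficient because only nucleotide counts enter the formulas.

```python
from collections import Counter
from fractions import Fraction

# Determinative degrees from (1) and Watson-Crick complements.
D = {"C": 4, "G": 3, "T": 2, "A": 1}
COMPLEMENT = {"C": "G", "G": "C", "T": "A", "A": "T"}


def base_counts(strand):
    """Return the four numbers (n_C, n_G, n_T, n_A) as a dict."""
    c = Counter(strand)
    return {x: c[x] for x in "CGTA"}


def strand_degree(n):
    """Determinative degree of a strand, as in (7)-(8)."""
    return sum(D[x] * n[x] for x in "CGTA")


def analyse(strand):
    """d+, d-, k_n, k_m and v for a strand and its complement.

    Assumes the strand contains at least one purine, one pyrimidine
    and one C or G, so that no ratio has a zero denominator.
    """
    n = base_counts(strand)
    m = base_counts("".join(COMPLEMENT[b] for b in strand))
    d_n, d_m = strand_degree(n), strand_degree(m)
    return {
        "n": n,
        "m": m,
        "d_plus": d_n + d_m,                                 # (9)
        "d_minus": d_n - d_m,                                # (9)
        "k_n": Fraction(n["G"] + n["A"], n["C"] + n["T"]),   # (6)
        "k_m": Fraction(m["G"] + m["A"], m["C"] + m["T"]),   # (6)
        "v": Fraction(n["A"] + n["T"], n["C"] + n["G"]),     # (5), one strand
    }


result = analyse("ACGTTGCACC")  # hypothetical toy strand
```

For this toy strand ($n_C = 4$, $n_G = n_T = n_A = 2$) the sketch gives $d_+ = 54$ and $d_- = 2$, in agreement with (10) and (13), and $k_n k_m = 1$.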
We can also find the connection between $d_+$, $d_-$ and the coefficients $k$ and $v$ as follows:

$$d_+ = \frac{1}{2} N_{C+G} (7 + 3v) = N_{C+G} \left( 2 + \frac{3}{2 \, p_{CG}} \right), \tag{15}$$

$$d_- = n_{pyrimidines} (1 - k_n). \tag{16}$$

If we consider one species for which $v = \mathrm{const}$ (or $p_{CG} = \mathrm{const}$), then we observe that $d_+ \sim N_{C+G}$, which can allow us to connect the determinative degree with the “second level” of genetic information [9]. On the other hand, the ratio $7/3$ of the coefficients in (12) can play a numerical role in explanations of CG pressure [9], and therefore the total empirical power $d_+$ can be considered as some kind of “evolutionary power”.

UNIQUENESS OF DETERMINATIVE DEGREE

Here we prove the uniqueness of the choice of the determinative degree from the sequence (1), provided we wish to preserve the fundamental purine-pyrimidine relation (13). Let us define another generalized “strength” of a DNA strand, $d_n^{(r)}$, by using instead of (7) the formula

$$d_n^{(r)} = r_C n_C + r_G n_G + r_T n_T + r_A n_A \tag{17}$$

with arbitrary $r_x$. Then the difference $d_-^{(r)}$ for the strands, after exploiting the complementarity (4), is

$$d_-^{(r)} = (r_C - r_G)(n_C - n_G) + (r_T - r_A)(n_T - n_A). \tag{18}$$

The purine-pyrimidine relation (13) and the requirement $d_-^{(r)} = d_-$ lead to the system of equations

$$r_C - r_G = 1, \tag{19}$$

$$r_T - r_A = 1. \tag{20}$$

So, if $d_-^{(r)} = d_-$, then we have the two-parameter description of a DNA strand

$$d_n^{(r)} = r_C n_C + (r_C - 1) n_G + r_T n_T + (r_T - 1) n_A, \tag{21}$$

for which $d_+^{(r)} = (2 r_C - 1)(n_C + n_G) + (2 r_T - 1)(n_T + n_A)$. It is interesting that the choice $r_C = 2/3$ and $r_T = 1/3$ resembles the quark structure of hadrons (the charges of quarks are $q_{up} = 2/3$, $q_{down} = -1/3$, $q_{strange} = -1/3$, $q_{charm} = 2/3$, see e.g. [14]), and in that case

$$d_n^{(quark)} = \frac{2}{3} n_C - \frac{1}{3} n_G + \frac{1}{3} n_T - \frac{2}{3} n_A, \tag{22}$$

where the contribution of A and T corresponds to antiquarks. In this case the sum “power” is $d_+^{(quark)} = (n_C + n_G - n_T - n_A)/3$, which reflects the CG/AT content of one strand.
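The role of conditions (19)-(20) can be illustrated numerically. In the sketch below the helper names (`complement_counts`, `d_plus_minus`, `two_parameter_weights`) and the sample counts are hypothetical; exact fractions are used for the quark-like weights.

```python
from fractions import Fraction


def strand_degree(n, r):
    """Generalized strand 'strength' (17) with weights r_x."""
    return sum(r[x] * n[x] for x in "CGTA")


def complement_counts(n):
    """Counts of the complementary strand from conditions (4)."""
    return {"C": n["G"], "G": n["C"], "T": n["A"], "A": n["T"]}


def d_plus_minus(n, r):
    """Sum and difference of the generalized strengths of both strands."""
    m = complement_counts(n)
    d_n, d_m = strand_degree(n, r), strand_degree(m, r)
    return d_n + d_m, d_n - d_m


def two_parameter_weights(r_c, r_t):
    """Weights obeying (19)-(20): r_G = r_C - 1 and r_A = r_T - 1."""
    return {"C": r_c, "G": r_c - 1, "T": r_t, "A": r_t - 1}


# Hypothetical sample strand content.
n = {"C": 4, "G": 2, "T": 2, "A": 2}
pyr_minus_pur = (n["C"] + n["T"]) - (n["G"] + n["A"])

standard = two_parameter_weights(4, 2)                          # choice (1)
quark = two_parameter_weights(Fraction(2, 3), Fraction(1, 3))   # choice (22)
violating = {"C": 4, "G": 2, "T": 2, "A": 1}                    # breaks (19)

plus_std, minus_std = d_plus_minus(n, standard)
plus_q, minus_q = d_plus_minus(n, quark)
plus_bad, minus_bad = d_plus_minus(n, violating)
```

Any weights satisfying (19)-(20) reproduce the same difference $d_-$ (here 2), while $d_+$ depends on the two remaining parameters; weights violating (19) give a different $d_-$ for this sample.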
If we postulate an “equidistant” description, where $r_G - r_T = 1$, then we obtain a one-parameter formula

$$d_n^{(r)} = (r + 3) n_C + (r + 2) n_G + (r + 1) n_T + r \, n_A. \tag{23}$$

Here we stress that, despite the translational symmetry in $r$, two nontrivial possibilities remain if we restrict ourselves to small non-negative integers: $r = 1$ and $r = 0$. The latter case leads to a quaternary description of the genetic vocabulary,

$$d_n^{(4)} = 3 n_C + 2 n_G + n_T, \tag{24}$$

where genetic texts are represented in base-4 notation with the digits (0, 1, 2, 3), which can describe DNA sequences in a more informative way than the binary description by the digits (0, 1) [15]. We consider the most natural case $r = 1$, which coincides with the choice (1) and satisfies all positivity requirements.

CLASSIFICATION OF DNA IN TERMS OF DETERMINATIVE DEGREE

Now we consider the determinative degree of double-helix sequences in various extreme cases and classify them. We call a DNA sequence mononucleotide, dinucleotide, trinucleotide or full if one, two, three or four of the numbers $n_x$, respectively, are different from zero. The properties of mononucleotide double-helix DNA sequences are given in Table 2.

Table 2. Mononucleotide DNA

| $n_x$ | $d_+$ | $d_-$ | Amino acid |
|---|---|---|---|
| $n_C \neq 0$ | $7 n_C$ | $n_C$ | Pro |
| $n_G \neq 0$ | $7 n_G$ | $-n_G$ | Gly |
| $n_T \neq 0$ | $3 n_T$ | $n_T$ | Phe |
| $n_A \neq 0$ | $3 n_A$ | $-n_A$ | Lys |

The mononucleotide sequences which encode the most extended amino acids, Gly and Lys, have a negative difference $d_-$, and the mononucleotide sequences which encode the amino acids Pro and Phe, with a similar chemical type of radicals, have a positive $d_-$. The dinucleotide double-helix DNA sequences (without mononucleotide parts) are described in Table 3.

Table 3. Dinucleotide DNA

| $n_x$ | $d_+$ | $d_-$ | Amino acids |
|---|---|---|---|
| $n_C \neq 0$, $n_G \neq 0$ | $7 (n_C + n_G)$ | $n_C - n_G$ | Pro, Arg, Ala, Gly |
| $n_C \neq 0$, $n_T \neq 0$ | $7 n_C + 3 n_T$ | $n_C + n_T$ | Pro, Phe, Leu, Ser |
| $n_C \neq 0$, $n_A \neq 0$ | $7 n_C + 3 n_A$ | $n_C - n_A$ | Pro, Gly, Asn, Tyr, His |
| $n_G \neq 0$, $n_T \neq 0$ | $7 n_G + 3 n_T$ | $n_T - n_G$ | Gly, Leu, Val, Cys, Trp |
| $n_G \neq 0$, $n_A \neq 0$ | $7 n_G + 3 n_A$ | $-n_G - n_A$ | Gly, Glu, Arg, Lys |
| $n_T \neq 0$, $n_A \neq 0$ | $3 (n_T + n_A)$ | $n_T - n_A$ | Leu, Asn, Tyr, TERM |

The trinucleotide DNA can be listed in a similar, though more cumbersome, way. The full DNA sequences consist of nucleotides of all four types and are described by (10)-(11).

The introduction of the determinative degree allows us to single out a kind of double-helix DNA sequence which has an additional symmetry. We call a DNA sequence purine-pyrimidine symmetric if

$$d_- = 0, \tag{25}$$

i.e. the difference of the strand empirical “powers” vanishes. From (11) it follows that

$$n_C + n_T = n_G + n_A, \tag{26}$$

i.e. $k_n = k_m = 1$, which for one strand can be rewritten as

$$n_{pyrimidines} = n_{purines} \tag{27}$$

or as the equality of purines and of pyrimidines in the two strands,

$$n_{pyrimidines} = m_{pyrimidines}, \tag{28}$$

$$n_{purines} = m_{purines}. \tag{29}$$

The purine-pyrimidine symmetry (26) has two particular cases:

1) $n_C = n_G$, $n_T = n_A$: symmetric DNA, (30)
2) $n_C = n_A$, $n_T = n_G$: antisymmetric DNA. (31)

The first case corresponds to Chargaff's rule applied to a single strand (which holds approximately for long sequences and is called the “second Chargaff's parity rule” [10]). So it would be interesting to compare the transcription and expression properties of symmetric and antisymmetric double-helix sequences.

Another treatment of purine-pyrimidine symmetric DNA sequences can be given using the differences of one nucleotide type between strands, $M_x = n_x - m_x$, which have the properties

$$M_C = -M_G, \qquad M_T = -M_A \tag{32}$$

following from the complementarity (4). In terms of these differences we derive for $d_-$

$$d_- = M_C - M_A = M_T - M_G. \tag{33}$$

Then the purine-pyrimidine symmetry condition (25) leads to the equations

$$M_C = M_A, \qquad M_T = M_G, \tag{34}$$

from which we can give another definition: a DNA sequence is called purine-pyrimidine symmetric if the difference of cytosine $M_C$ between the two strands equals the difference of adenine $M_A$ (or the difference of thymine $M_T$ between the two strands equals the difference of guanine $M_G$). So it is worthwhile to search for and investigate purine-pyrimidine symmetric DNAs using real sequence data.

NUMERICAL ANALYSIS OF DNA SEQUENCES

We have made a preliminary analysis, in terms of the determinative degree, of real sequences of several species taken from GenBank (2000). We considered 10 complete sequences of E. coli (several genes and full genomic DNA 9-12 min.), 12 complete sequences of Drosophila melanogaster (crc genes), 10 complete sequences of Homo sapiens chromosome 22 (various clones) and 10 complete sequences of Homo sapiens chromosome 3 (various clones). We calculated the nucleotide content $N_C$, $N_T$, $N_G$, $N_A$ and the determinative degree characteristics $d_+$, $d_-$, $q = d_-/d_+$, $k_n$ and $v$ for each sequence. Then we averaged their values over the $n$ sequences of each species. The result is presented in Table 4.

Table 4. Mean determinative degree characteristics of real sequences

| Sequence | E. coli | Drosophila | Homo sap. Chr. 22 | Homo sap. Chr. 3 |
|---|---|---|---|---|
| $\frac{1}{n} \sum d_+$ | 90806 | 7325 | 337974 | 806435 |
| $\frac{1}{n} \sum d_-$ | -138 | -70 | 6865 | -1794 |
| $\frac{1}{n} \sum q$ | -0.0068 | -0.0089 | 0.00146 | -0.00229 |
| $\frac{1}{n} \sum k_n$ | 1.07 | 1.09 | 0.987 | 1.021 |
| $\frac{1}{n} \sum v$ | 1.38 | 1.31 | 1.14 | 1.55 |

First of all we observe that all real sequences have high purine-pyrimidine symmetry (smallness of the parameter $q$). We also see that the ratio of purines to pyrimidines in one DNA strand, $k_n$, is very close to unity. Therefore we have a new small parameter in DNA theory, $(k_n - 1)$ (or $q$), which characterizes the breaking of purine-pyrimidine symmetry. This opens the possibility of applying various approximate and perturbative methods.
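The classification by nonzero $n_x$, the symmetry conditions (25)-(31) and the ratio $q$ used in Table 4 can be sketched as follows. The function names and labels are hypothetical, and the sample counts are invented for illustration rather than taken from GenBank.

```python
from fractions import Fraction


def classify(n):
    """Mono-, di-, trinucleotide or full, by the number of nonzero n_x."""
    nonzero = sum(1 for x in "CGTA" if n[x] > 0)
    names = {1: "mononucleotide", 2: "dinucleotide",
             3: "trinucleotide", 4: "full"}
    return names.get(nonzero, "empty")


def d_plus(n):
    """Total 'power' of the double helix, formula (10)."""
    return 7 * (n["C"] + n["G"]) + 3 * (n["T"] + n["A"])


def d_minus(n):
    """Pyrimidines minus purines in one strand, formula (11)."""
    return n["C"] + n["T"] - n["G"] - n["A"]


def symmetry(n):
    """Label the purine-pyrimidine symmetry type of a strand.

    If all four counts are equal both particular cases (30) and (31)
    hold; this sketch then reports "symmetric".
    """
    if d_minus(n) != 0:
        return "asymmetric"
    if n["C"] == n["G"] and n["T"] == n["A"]:
        return "symmetric"        # case (30)
    if n["C"] == n["A"] and n["T"] == n["G"]:
        return "antisymmetric"    # case (31)
    return "purine-pyrimidine symmetric"


def q(n):
    """Relative symmetry breaking q = d- / d+ as an exact fraction."""
    return Fraction(d_minus(n), d_plus(n))


# Invented sample counts, for illustration only.
sample = {"C": 4, "G": 2, "T": 2, "A": 2}
sample_q = q(sample)
```

For the invented sample $d_- = 2$ and $d_+ = 54$, so $q = 1/27$; real sequences in Table 4 show much smaller values of $|q|$.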
Second, we notice from Table 4 that the purine-pyrimidine symmetry increases in the direction from bacteria to mammals and is maximal for the human chromosome. It would be worthwhile to carry out a thorough statistical study of purine-pyrimidine symmetry and codon usage in terms of the introduced determinative degree, which will be done elsewhere.

Acknowledgments. The authors would like to thank G. A. Shepelev for providing original computer programs for DNA sequence analysis. We are grateful to N. A. Chashin, S. V. Gatash, E. A. Gordienko, L. N. Lisetsky, V. Ya. Maleev, V. G. Shakhbazov, Yu. G. Shkorbatov, P. P. Shtefaniuk, V. Yu. Strashnyuk and O. A. Tretyakov for fruitful discussions, and to Jim Bashford, Gary Findley, Peter Jarvis, Sebastian Sachse, David Torney and Chun-Ting Zhang for useful correspondence and reprints.

REFERENCES

1. Singer M., Berg P. Genes and genomes. - Mill Valley: University Science Books, 1991. - 373 p.
2. Findley G. L., Findley A. M., McGlynn S. P. Symmetry characteristics of the genetic code // Proc. Natl. Acad. Sci. USA. - 1982. - V. 79. - № 22. - P. 7061–7065.
3. Zhang C. T. A symmetrical theory of DNA sequences and its applications // J. Theor. Biol. - 1997. - V. 187. - № 3. - P. 297–306.
4. Hornos J. E. M., Hornos Y. M. M. Model for the evolution of the genetic code // Phys. Rev. Lett. - 1993. - V. 71. - P. 4401–4404.
5. Bashford J. D., Tsohantjis I., Jarvis P. D. A supersymmetric model for the evolution of the genetic code // Proc. Natl. Acad. Sci. USA. - 1998. - V. 95. - P. 987–992.
6. Duplij D., Duplij S. Symmetry analysis of genetic code and determinative degree // Biophysical Bull. Kharkov Univ. - 2000. - V. 488. - № 1(6). - P. 60–70.
7. Rumer U. B. Systematics of codons in the genetic code // DAN SSSR. - 1968. - V. 183. - № 1. - P. 225–226.
8. Rumer U. B. On codon systematics in the genetic code // DAN SSSR. - 1966. - V. 167. - № 6. - P. 1393–1394.
9. Forsdyke D. R. Different biological species “broadcast” their DNAs at different (C + G)% “wavelengths” // J. Theor. Biol. - 1996. - V. 178. - P. 405–417.
10. Forsdyke D. R. Relative roles of primary sequence and (C + G)% in determining the hierarchy of frequencies of complementary trinucleotide pairs in DNAs of different species // J. Mol. Biol. - 1995. - V. 41. - P. 573–581.
11. Grantham R., Perrin P., Mouchiroud D. Patterns in codon usage of different kinds of species // Oxford Surv. Evol. Biol. - 1986. - V. 3. - P. 48–81.
12. Karasev V. A. Rhombic version of genetic vocabulary based on complementary of encoding nucleotides // Vest. Leningr. un-ta. - 1976. - V. 1. - № 3. - P. 93–97.
13. Nikolsky Y. K. Tricarboxylic acid cycle and genetic code // Biophysika. - 1983. - V. 28. - № 1. - P. 133–134.
14. Close F. E. An Introduction to Quarks and Partons. - London: Academic Press, 1979. - 438 p.
15. Karasev V. A., Sorokin S. G. Topological structure of the genetic code // Genetika. - 1997. - V. 33. - № 6. - P. 744–751.

DETERMINATIVE DEGREE AND NUCLEOTIDE CONTENT OF DNA STRANDS (summary translated from Russian)

D. R. Duplij, S. A. Duplij
V. N. Karazin Kharkov National University, Svoboda Sq. 4, Kharkov 61077, Ukraine

The paper introduces a new characteristic of a nucleotide, the determinative degree $d$, which numerically expresses the degree of unambiguity of the doublets of the genetic code and their division into “strong” and “weak”, and which corresponds to the “empirical power”: $d_C = 4$, $d_G = 3$, $d_{T/U} = 2$, $d_A = 1$. By analogy, a “passive” characteristic, “predeterminativity”, is introduced for amino acids by the formula $d_{AA} = \sum d_{codon} / n_{deg}$, and it is shown to correlate with the interaction energy of nitrogenous bases in DNA triplets. The uniqueness of the introduced characteristic is discussed. The purine-pyrimidine content of DNA sequences is considered in terms of the determinative degree. The total determinative degree of the two DNA strands, $d_+ = \frac{7}{2} N_{C+G} + \frac{3}{2} N_{A+T}$, allows a numerological interpretation of the “pressure of CG pairs” in the course of evolution, and the difference $d_- = n_{pyrimidines} - n_{purines}$ numerically describes the purine-pyrimidine asymmetry of one strand. A classification of DNA in terms of $d$ is given. Preliminary calculations based on GenBank (2000) data indicate that purine-pyrimidine symmetry increases with the level of organization of species.

KEY WORDS: genetic code, nucleotide, codon, determinative degree, Chargaff rules, DNA sequence, strand.