On the evolution of the genetic codes, represented as attractors of 2-adic functions

Dr. Ekaterina Yurova Axelsson
Linnaeus University, Sweden
September 10, 2015

p-Adic numbers have found numerous applications, e.g., in cognitive models and psychology, and in genetics:

1. A. Khrennikov, Information dynamics in cognitive, psychological, social, and anomalous phenomena. Ser.: Fundamental Theories of Physics, Kluwer, Dordrecht, 2004.
2. A. Yu. Khrennikov, P-adic information space and gene expression. In: Integrative approaches to brain complexity, editors Grant S., Heintz N., Noebels J., Wellcome Trust Publ., p. 14 (2006).
3. B. Dragovich, A. Dragovich, A p-Adic Model of DNA Sequence and Genetic Code, p-Adic Numbers, Ultrametric Analysis and Applications, 1, N 1, 34-41 (2009). arXiv:q-bio/0607018v1
4. A. Khrennikov, Gene expression from polynomial dynamics in the 2-adic information space, Chaos, Solitons, and Fractals, 42, 341-347 (2009).
5. A. Khrennikov and S. Kozyrev, p-Adic numbers in bioinformatics: from genetic code to PAM-matrix; arXiv:0903.0137v3 (2009).
6. B. Dragovich, p-Adic Structure of the Genetic Code, NeuroQuantology, Vol. 9, No. 4, 716-727 (2011). arXiv:1202.2353v1
7. A. Khrennikov, S. V. Kozyrev, Genetic code on the diadic plane, Physica A: Statistical Mechanics and its Applications, 381, 265-272 (2007).

Outline
- Short introduction
- Proposed 2-adic model
- Some observations

Introduction
- Deoxyribonucleic acid (DNA) is a molecule that carries most of the genetic instructions used in the development, functioning and reproduction of all known living organisms and many viruses.
- Within cells, DNA is organized into long structures called chromosomes. During cell division these chromosomes are duplicated in the process of DNA replication, providing each cell with its own complete set of chromosomes.
- Eukaryotic organisms (animals, plants, fungi, and protists) store most of their DNA inside the cell nucleus and some of their DNA in organelles, such as mitochondria or chloroplasts.
- In contrast, prokaryotes (bacteria and archaea) store their DNA only in the cytoplasm. Within the chromosomes, chromatin proteins such as histones compact and organize DNA. These compact structures guide the interactions between DNA and other proteins, helping control which parts of the DNA are transcribed.

Introduction
- Mitochondrial DNA (mtDNA) is the DNA located in organelles called mitochondria, structures within eukaryotic cells that convert chemical energy from food into a form that cells can use, adenosine triphosphate (ATP).
- Mitochondrial DNA is only a small portion of the DNA in a eukaryotic cell; most of the DNA can be found in the cell nucleus and, in plants, in the chloroplast as well.
- Mitochondria are thought to have originated from incorporated α-purple bacteria. During its evolution into the present-day powerhouse of the eukaryotic cell, the endosymbiont transferred many of its essential genes to the nuclear chromosomes. Nevertheless, the mitochondrion still carries hallmarks of its bacterial ancestor.
- Soon after mtDNA sequences became available, comparisons with mitochondrial protein sequences revealed deviations from the standard genetic code, and later even variations in codon usage were found in mitochondria from different species.

Introduction
- The genetic code is the map $g : K \to A$, $|K| = 64$, $|A| = 21$, which gives the correspondence between codons in DNA and amino acids.
- 4 nucleotides: C (cytosine), A (adenine), G (guanine), T (thymine). In ribonucleic acid (RNA, a polymeric molecule implicated in various biological roles in coding, decoding, regulation, and expression of genes) thymine is replaced by U (uracil).
- A codon is an ordered triple of nucleotides.
- 20 amino acids and 1 stop codon (Ter): alanine (Ala), threonine (Thr), glycine (Gly), proline (Pro), serine (Ser), aspartic acid (Asp), asparagine (Asn), glutamic acid (Glu), glutamine (Gln), lysine (Lys), histidine (His), arginine (Arg), tryptophan (Trp), tyrosine (Tyr), phenylalanine (Phe), leucine (Leu), methionine (Met), isoleucine (Ile), valine (Val), cysteine (Cys).

[Table: Standard Nuclear Genetic Code, 64 codons and 21 amino acids]

The origin of the genetic code? The evolutionary history of organisms? Taxonomy?

Preliminaries, p-adic approach
- For every nonzero integer $n$ let $\mathrm{ord}_p(n)$ be the highest power of $p$ which divides $n$, i.e. $n \equiv 0 \pmod{p^{\mathrm{ord}_p(n)}}$ and $n \not\equiv 0 \pmod{p^{\mathrm{ord}_p(n)+1}}$, for any prime $p \ge 2$. Then the p-adic norm is $|n|_p = p^{-\mathrm{ord}_p(n)}$, $|0|_p = 0$. For rationals $\frac{n}{m} \in \mathbb{Q}$ we set $\left|\frac{n}{m}\right|_p = p^{-\mathrm{ord}_p(n) + \mathrm{ord}_p(m)}$.
- The completion of $\mathbb{Q}$ with respect to the p-adic metric $\rho_p(x, y) = |x - y|_p$ is called the field of p-adic numbers $\mathbb{Q}_p$. The norm satisfies the strong triangle inequality $|x \pm y|_p \le \max(|x|_p, |y|_p)$, where equality holds if $|x|_p \ne |y|_p$.
- The set $\mathbb{Z}_p = \{x \in \mathbb{Q}_p : |x|_p \le 1\}$ is called the set of p-adic integers.
- Every $x \in \mathbb{Z}_p$ can be expanded in canonical form, i.e. in a series that converges in the p-adic norm: $x = x_0 + p x_1 + \ldots + p^k x_k + \ldots$, $x_k \in \{0, 1, \ldots, p - 1\}$, $k \ge 0$.
- $\mathbb{Z}_p$ is equipped with the Haar measure $\mu_p$, normalized so that $\mu_p(\mathbb{Z}_p) = 1$.

Proposed model
- We consider a 2-adic dynamical system $\langle \mathbb{Z}_2, \mu_2, f \rangle$, $f : \mathbb{Z}_2 \to \mathbb{Z}_2$.
- An attractor of $\langle \mathbb{Z}_2, \mu_2, f \rangle$ is a subset $A \subseteq \mathbb{Z}_2$ such that:
  1. $A$ is invariant with respect to the action of the function $f$, i.e. $f(A) = A$;
  2. There exists a set $U \subset \mathbb{Z}_2$ which shrinks to $A$ under $f$, i.e. $f^{(k)}(U) \to A$ for $k \to \infty$.
- The representation of the nucleotides C, A, T (U), G can be chosen in 24 variants.
- To obtain the function $f$ in a compact way we set the nucleotides as T (U) ↔ (1, 0), C ↔ (1, 1), A ↔ (0, 0), G ↔ (0, 1).
- Each codon is represented as a binary vector of length 6, or as the corresponding 2-adic number. For example, CAG ↔ (1, 1, 0, 0, 0, 1). This vector defines the 2-adic number $1 + 2 + 2^5 = 35$.

Proposed model
Let us choose the function f in such a way that each of its attractors (as a set of 2-adic integers) coincides with the set of codons coding for one amino acid. For example, the attractors of the function that defines the Standard Nuclear Genetic Code are:

Amino acid  Attractor                  Amino acid  Attractor
Ala         {14, 46, 30, 62}           Arg         {8, 40, 27, 59, 11, 43}
Asn         {16, 48}                   Asp         {18, 50}
Cys         {25, 57}                   Gln         {3, 35}
Glu         {2, 34}                    Gly         {10, 42, 26, 58}
His         {19, 51}                   Ile         {4, 20, 52}
Leu         {5, 37, 23, 55, 7, 39}     Lys         {0, 32}
Met         {36}                       Phe         {21, 53}
Pro         {15, 47, 31, 63}           Ser         {13, 45, 24, 56, 29, 61}
Thr         {12, 44, 28, 60}           Trp         {41}
Tyr         {17, 49}                   Val         {6, 38, 22, 54}
Stop        {1, 33, 9}

Variation of genetic codes
1. The Standard Code
2. The Vertebrate mtCode
3. The Yeast mtCode
4. The Mold, Protozoan, Coelenterate mtCode
5. Mycoplasma, Spiroplasma Code
6. The Invertebrate mtCode
7. The Ciliate, Dasycladacean and Hexamita Nuclear Code
8. The Echinoderm and Flatworm mtCode
9. The Euplotid Nuclear Code
10. The Bacterial, Archaeal and Plant Plastid Code
11. The Alternative Yeast Nuclear Code
12. The Ascidian mtCode
13. The Alternative Flatworm mtCode
14. Chlorophycean mtCode
15. Trematode mtCode
16. Scenedesmus obliquus mtCode
17. Thraustochytrium mtCode
18. Pterobranchia mtCode
19. Candidate Division SR1 and Gracilibacteria Code
20. Blepharisma Nuclear Code

Example of representations
- We represented 20 known genetic codes (National Center for Biotechnology Information) by the attractors of 2-adic functions, using the van der Put and coordinate forms.
- The function that defines the Vertebrate mitochondrial code has the following van der Put representation: $F_m(x) = \sum_{k=0}^{63} M_k \chi_k(x)$.
- The function $F_m$ can be represented in explicit form, depending on the values of the binary digits in the canonical representation of the 2-adic numbers, in the following way:
  $F_m(x_0 + 2x_1 + 2^2 x_2 + 2^3 x_3 + 2^4 x_4 + 2^5 x_5) = \Omega_0 - \Omega_1 - \Omega_2$, where
  $\Omega_0 = x_0 + 2x_1 + 4x_2 + 8x_3 + 16x_4 + 32\bar{x}_5$,
  $\Omega_1 = (x_3 + x_1 x_2 \bar{x}_3)(32x_4 - 16)x_5$,
  $\Omega_2 = x_0 \bar{x}_1 \bar{x}_2 x_3 (16 - 32x_4) x_5 + \bar{x}_0 \bar{x}_1 \bar{x}_2 x_3 (23 - 44x_4) x_5 + x_0 \bar{x}_1 x_2 (23x_3 - 18) \bar{x}_4 x_5 + x_0 (-7\bar{x}_1 \bar{x}_2 + 18 x_1 x_2) x_3 \bar{x}_4 x_5$.

"Universal" function
All considered variations of the genetic code can be obtained using "operations" on the cycles of some "Universal" function (6 variants). For example, the "Universal" function F can be defined by the following cycles (attractors):

{0, 32}            {8, 40}            {16, 48}
{2, 34}            {10, 42, 26, 58}   {18, 50}
{1, 33}            {9, 41}            {17, 49}
{3, 35}            {11, 43, 27, 59}   {19, 51}
{5, 37}            {13, 45, 29, 61}   {21, 53}
{4, 36}            {6, 38, 22, 54}    {7, 39, 23, 55}
{12, 44, 28, 60}   {14, 46, 30, 62}   {15, 47, 31, 63}
{20, 52}           {24, 56}           {25, 57}

"Universal" function
- Analytically, the function F has the following form:
  $F(x) = F(x_0 + 2x_1 + 2^2 x_2 + 2^3 x_3 + 2^4 x_4 + 2^5 x_5) = x + 32(-1)^{x_5} + 16 x_5 (-1)^{x_4} I(x_1 + x_2 + x_3 \ge 2)$, (0.1)
  where $I(x_1 + x_2 + x_3 \ge 2) = 1$ as soon as $x_1 + x_2 + x_3 \ge 2$ is satisfied, otherwise $I = 0$.
- In other words, $I$ is the characteristic function of the event $x_1 + x_2 + x_3 \ge 2$.

"Universal" function
- The "Universal" function F consists of 8 cycles of length 4 and 16 cycles of length 2;
- "Universal" function ≠ genetic code!

"Universal" function, "Operations"
1. Let a(b), where a is the length of the cycle and b is some element of the cycle, denote this cycle of the "Universal" function F.
2. For example, {7, 39, 23, 55} is written as 4(7).
3. We need 3 types of "operations" on such cycles and 1 "iteration" (for the Alternative Yeast nuclear code, Chlorophycean, Scenedesmus obliquus, Thraustochytrium, Pterobranchia) in order to define any of the 20 genetic codes.
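The cycle structure claimed for formula (0.1) can be checked numerically. The following is a minimal sketch, assuming F is applied only to the residues 0..63 (the 6-bit codon numbers); the helper names `universal_f` and `cycles` are hypothetical and not taken from the slides.

```python
def universal_f(x):
    """Formula (0.1) evaluated on a 6-bit codon number x in 0..63."""
    bits = [(x >> i) & 1 for i in range(6)]  # bits[i] is the digit x_i
    # Characteristic function I(x1 + x2 + x3 >= 2).
    indicator = 1 if bits[1] + bits[2] + bits[3] >= 2 else 0
    return x + 32 * (-1) ** bits[5] + 16 * bits[5] * (-1) ** bits[4] * indicator


def cycles(f, domain=range(64)):
    """Split the domain into orbits of f.

    Assumption: f permutes the domain, so every orbit is a cycle.
    """
    seen = set()
    result = []
    for start in domain:
        if start in seen:
            continue
        orbit = []
        y = start
        while y not in seen:
            seen.add(y)
            orbit.append(y)
            y = f(y)
        result.append(orbit)
    return result


if __name__ == "__main__":
    for orbit in cycles(universal_f):
        print(len(orbit), orbit)
```

Each orbit is listed in the order in which F visits its elements, so the orbit of 7 comes out as 7, 39, 23, 55, the cycle written 4(7) in the notation above.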
"Operations" I "Addition": let a1 (b1 ) and a2 (b2 ) be the cycles of the function F . Let us consider new cycle a1 (b1 ) ⊕ a2 (b2 ) = a1 + a2 (b1 ). For example, 4(7) = {7, 39, 23, 55} and 4(12) = {12, 44, 28, 60}, then we get 8(7) = {7, 39, 23, 55, 12, 44, 28, 60}, which corresponds to amino acid Threonine (Thr) in the Yeast mt code. I I "Division": let 2(b1 ) = {b1 , b2 } and 2(c1 ) = {c1 , c2 }. Then 2(b1 ) ∨ 2(c1 ) = {b1 , c1 , c2 } ∪ {b2 }. "Cleavage": for some codes we need to split the cycle of the length 2 into 2 cycles of the length 1 each. For example, ∆2(9) = ∆{9, 41} = {9} ∪ {41}. Proposed model NUCLEAR CODE DNA PROCARYOTA EUKARYOTA Bacterial, Archaeal, PlantPlastid 2(5)+4(7) 2(8)+4(11) 2(24)+4(13) Standart nuclear code 2(5)+4(7) 2(8)+4(11) 2(24)+4(13) 2(4) ∨ 2(20) = {4, 20, 52} + {36} 2(1) ∨ 2(9) = {1, 33, 9} + {41} 2(4) ∨ 2(20) = {4, 20, 52} + {36} 2(1) ∨ 2(9) = {1, 33, 9} + {41} Ciliate, Desycladacean, Hexamita 2(5)+4(7) 2(8)+4(11) 2(24)+4(13) 2(4) ∨ 2(20) = {4, 20, 52} + {36} 2(1)+2(3) ∆2(9) = {9} + {41} Candidate Division, GraciliBacteria 2(5)+4(7) 2(8)+4(11) 2(24)+4(13) 2(4) ∨ 2(20) = {4, 20, 52} + {36} 2(9) ∨ 4(10) = {10, 42, 58, 26, 9} + {41} Euploid 2(5)+4(7) 2(8)+4(11) 2(24)+4(13) 2(4) ∨ 2(20) = {4, 20, 52} + {36} 2(9) ∨ 2(25) = {9, 25, 57} + {41} Blepharisma 2(5)+4(7) 2(8)+4(11) 2(24)+4(13) 2(4) ∨ 2(20) = {4, 20, 52} + {36} ∆2(9) = {9} + {41} 2(1) ∨ 2(3) = {3, 35, 33} + {1} 1{9} + 1{1} = {1, 9} Alternative Yeast nuclear code Mycoplasma, Spiloplasma 2(5) ∨ 4(7) = = {5, 37, 7, 55, 23} + {39} 2(5)+4(7) 2(8)+4(11) 2(24)+4(13) 2(8)+4(11) 2(24)+4(13) 2(1) ∨ 2(9) = {1, 33, 9} + {41} 1(39) + [2(24) + 4(13)] 2(4) ∨ 2(20) = {4, 20, 52} + {36} 1 Proposed model mt CODE DNA Chlorophycean 2(5)+4(7) 2(8)+4(11) 2(24)+4(13) Scenedesmus obliqnus 2(5)+4(7) 2(8)+4(11) 2(4) ∨ 2(20) = {4, 20, 52} + {36} ∆2(9) = {9} + {41} 2(1) ∨ [2(5) + 4(7)] = = {1} + {5, 37, 33, 7, 55, 39, 23} 2(24) ∨ 4(13) = = {24, 56, 61, 45, 29} + {13} 2(4) ∨ 2(20) = {4, 20, 52} + {36} ∆2(9) 
= {9} + {41} 2(1) ∨ [2(5) + 4(7)] = = {5, 37, 33, 7, 55, 39, 23} + {1} 1(1) + 1(9) 1(13) + 1(1) + 1(9) Thrastochytrium Mold, Protozean Coelenterate 2(5)+4(7) 2(8)+4(11) 2(24)+4(13) 2(5) ∨ 4(7) = {37, 7, 39, 55, 23} + {5} 2(8)+4(11) 2(24)+4(13) 2(4) ∨ 2(20) = {4, 20, 52} + {36} 2(1) ∨ 2(9) = {1, 33, 9} + {41} {1, 33, 9} + {5} = {1, 33, 9, 5} 2(4) ∨ 2(20) = {4, 20, 52} + {36} Echinoderm, Flatworm 2(5)+4(7) 2(8)+2(24)+4(13) Alternative Flatworm 2(5)+4(7) 2(8)+2(24)+4(13) 2(4) ∨ 2(20) = {4, 20, 52} + {36} 2(0) ∨ 2(16) = {0, 48, 16} + {32} 2(4) ∨ 2(20) = {4, 20, 52} + {36} 2(0) + 2(16) = {0, 48, 16} + {32} 2(1) ∨ 2(17) = {1, 49, 17} + {33} Trematode 2(5)+4(7) 2(8)+2(24)+4(13) Invertibrate 2(5)+4(7) 2(8)+2(24)+4(13) 2(0) ∨ 2(16) = {0, 48, 16} + {32} Yast 4(7)+4(12) 2(8)+4(11) 2(24)+4(13) Pretobranchia 2(5)+4(7) 2(24)+4(13) 2(4) ∨ 2(20) = {4, 20, 52} + {36} 2(0) ∨ 2(8) = {0, 32, 40} + {8} 1(8) + [2(24) + 4(13)] Ascidian 2(5)+4(7) 2(8)+4(10) 2(24)+4(13) Vertibrate 2(5)+4(7) 2(1)+2(8) 2(24)+4(13) 1 Proposed model, Observations I Presented approach can be seen as a contribution to the discussions about evolutionary systematics and evolutionary origins of the genetic code. I Classication (relationships) of the organisms based on the structure and the method of producing their genetic code from the "universal" function? I Dierence of the genetic codes between (groups of) species that are located at the same branch of the phylogenetic (evolutionary) tree? I Operation of "Cleavage" ∆ appears in the genetic codes of organisms that perform photosynthesis. I Flatworm mtCode vs. Alternative Flatworm mtCode - "shift": 2(5) + 4(7), 2(8) + 2(24) + 4(13), 2(4) ∨ 2(20) = {4, 20, 52} + {36}, 2(0) + 2(16) = {0, 48, 16} + {32} 2(1) ∨ 2(17) = {1, 49, 17} + {33}. Paper E. Yurova Axelsson, On the representation of the genetic code by the attractors of 2-adic function, Physica Scripta, IOP Publishing, September 2015
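The codon encoding and the three cycle "operations" can be tied together in a short sketch. It assumes the nucleotide-to-bits assignment stated in the proposed model and formula (0.1) for F. All function names are hypothetical. The slides do not fix a rule for which element a "Division" splits off, so the sketch takes that element as an explicit `leftover` argument, which is an assumption of this example.

```python
# Bit pairs per nucleotide, as chosen in the proposed model.
NUCLEOTIDE_BITS = {"T": (1, 0), "U": (1, 0), "C": (1, 1), "A": (0, 0), "G": (0, 1)}


def codon_to_int(codon):
    """Map a codon to x0 + 2*x1 + ... + 32*x5 (first nucleotide gives x0, x1)."""
    bits = [b for nt in codon.upper() for b in NUCLEOTIDE_BITS[nt]]
    return sum(bit << i for i, bit in enumerate(bits))


def universal_f(x):
    """Formula (0.1) on a 6-bit codon number x in 0..63."""
    bits = [(x >> i) & 1 for i in range(6)]
    indicator = 1 if bits[1] + bits[2] + bits[3] >= 2 else 0
    return x + 32 * (-1) ** bits[5] + 16 * bits[5] * (-1) ** bits[4] * indicator


def cycle(b):
    """The cycle a(b) of the universal function that contains b, as a set."""
    orbit = {b}
    y = universal_f(b)
    while y not in orbit:
        orbit.add(y)
        y = universal_f(y)
    return frozenset(orbit)


def addition(c1, c2):
    """'Addition': merge two cycles into one set."""
    return c1 | c2


def division(c1, c2, leftover):
    """'Division': merge two cycles, splitting off one element as a singleton.

    Assumption: the caller names the split-off element explicitly.
    """
    if leftover not in c1 | c2:
        raise ValueError("leftover must belong to one of the cycles")
    return (c1 | c2) - {leftover}, frozenset({leftover})


def cleavage(c):
    """'Cleavage': split a cycle of length 2 into two cycles of length 1."""
    if len(c) != 2:
        raise ValueError("cleavage is defined for cycles of length 2")
    return [frozenset({e}) for e in sorted(c)]


if __name__ == "__main__":
    print(codon_to_int("CAG"))                    # the slide's example codon
    print(sorted(addition(cycle(7), cycle(12))))  # Thr in the Yeast mt code
    print(division(cycle(4), cycle(20), leftover=36))
    print(cleavage(cycle(9)))
```

Sets are used instead of ordered cycles because the operations in the slides are written on sets of codon numbers; the order in which a code's own function visits the elements is not modeled here.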