On the evolution of the genetic codes, represented
as attractors of 2-adic functions
Dr. Ekaterina Yurova Axelsson
Linnaeus University, Sweden
September 10, 2015
p-Adic numbers have found numerous applications, e.g., in cognitive models,
psychology, and genetics:
1. A. Khrennikov, Information dynamics in cognitive, psychological,
social, and anomalous phenomena. Ser.: Fundamental Theories of
Physics, Kluwer, Dordrecht, 2004.
2. Khrennikov, A. Yu., 2006, P-adic information space and gene
expression. In: Integrative approaches to brain complexity, editors
Grant S., Heintz N., Noebels J., Wellcome Trust Publ., p.14.
3. B. Dragovich, A. Dragovich, A p-Adic Model of DNA Sequence and
Genetic Code, p-Adic Numbers, Ultrametric Analysis and
Applications, 1, N 1, 34-41 (2009). arXiv:q-bio/0607018v1
4. A. Khrennikov, Gene expression from polynomial dynamics in the
2-adic information space, Chaos, Solitons, and Fractals, 42, 341-347
(2009).
5. A. Khrennikov and S. Kozyrev, p-Adic numbers in bioinformatics:
from genetic code to PAM-matrix; arXiv:0903.0137v3 (2009).
6. Dragovich, B.: p-Adic Structure of the Genetic Code.
NeuroQuantology, Vol. 9, No. 4, 716-727 (2011).
arXiv:1202.2353v1.
7. A. Khrennikov, S. V. Kozyrev, Genetic code on the diadic plane,
Physica A: Statistical Mechanics and its Applications, 381, 265-272
(2007).
Outline
- Short introduction
- Proposed 2-adic model
- Some observations
Introduction
- Deoxyribonucleic acid (DNA) is a molecule that carries most of
  the genetic instructions used in the development, functioning and
  reproduction of all known living organisms and many viruses.
- Within cells, DNA is organized into long structures called
  chromosomes. During cell division these chromosomes are
  duplicated in the process of DNA replication, providing each cell its
  own complete set of chromosomes.
- Eukaryotic organisms (animals, plants, fungi, and protists) store
  most of their DNA inside the cell nucleus and some of their DNA in
  organelles, such as mitochondria or chloroplasts.
- In contrast, prokaryotes (bacteria and archaea) store their DNA
  only in the cytoplasm. Within the chromosomes, chromatin proteins
  such as histones compact and organize DNA. These compact
  structures guide the interactions between DNA and other proteins,
  helping control which parts of the DNA are transcribed.
Introduction
- Mitochondrial DNA (mtDNA) is the DNA located in organelles
  called mitochondria, structures within eukaryotic cells that convert
  chemical energy from food into a form that cells can use, adenosine
  triphosphate (ATP).
- Mitochondrial DNA is only a small portion of the DNA in a
  eukaryotic cell; most of the DNA can be found in the cell nucleus,
  and in plants, the chloroplast as well.
- Mitochondria are thought to have originated from incorporated
  α-purple bacteria. During its evolution into the present-day
  powerhouse of the eukaryotic cell, the endosymbiont transferred
  many of its essential genes to the nuclear chromosomes.
  Nevertheless, the mitochondrion still carries hallmarks of its
  bacterial ancestor.
- Soon after mtDNA sequences became available, comparisons with
  mitochondrial protein sequences revealed deviations from the
  standard genetic code, and later even variations in codon usage were
  found in mitochondria from different species.
Introduction
- The genetic code is the map g : K → A, |K| = 64, |A| = 21, which
  gives the correspondence between codons in DNA and amino
  acids.
- 4 nucleotides: C (Cytosine), A (Adenine), G (Guanine), T
  (Thymine). In ribonucleic acid (RNA, a polymeric molecule implicated in
  various biological roles in coding, decoding, regulation, and
  expression of genes) Thymine is replaced by U (Uracil).
- A codon is an ordered triple of nucleotides.
- 20 amino acids and 1 stop codon (Ter): alanine (Ala), threonine
  (Thr), glycine (Gly), proline (Pro), serine (Ser), aspartic acid (Asp),
  asparagine (Asn), glutamic acid (Glu), glutamine (Gln), lysine (Lys),
  histidine (His), arginine (Arg), tryptophan (Trp), tyrosine (Tyr),
  phenylalanine (Phe), leucine (Leu), methionine (Met), isoleucine
  (Ile), valine (Val), cysteine (Cys).
Table for Standard Nuclear Genetic Code, 64 codons and 21
amino acids
The origin of genetic code? The evolutionary history of
organisms? Taxonomy?
Preliminaries, p-adic approach
- For every nonzero integer $n$ and any prime $p \ge 2$, let $\mathrm{ord}_p(n)$ be the
  exponent of the highest power of $p$ which divides $n$, i.e.
  $n \equiv 0 \pmod{p^{\mathrm{ord}_p(n)}}$, $n \not\equiv 0 \pmod{p^{\mathrm{ord}_p(n)+1}}$.
  Then the p-adic norm is $|n|_p = p^{-\mathrm{ord}_p(n)}$, $|0|_p = 0$.
  For rationals $\frac{n}{m} \in \mathbb{Q}$ we set $\left|\frac{n}{m}\right|_p = p^{-\mathrm{ord}_p(n)+\mathrm{ord}_p(m)}$.
- The completion of $\mathbb{Q}$ with respect to the p-adic metric
  $\rho_p(x, y) = |x - y|_p$ is called the field of p-adic numbers $\mathbb{Q}_p$. The
  norm satisfies the strong triangle inequality $|x \pm y|_p \le \max(|x|_p, |y|_p)$,
  where equality holds if $|x|_p \ne |y|_p$.
- The set $\mathbb{Z}_p = \{x \in \mathbb{Q}_p : |x|_p \le 1\}$ is called the set of p-adic
  integers.
- Every $x \in \mathbb{Z}_p$ can be expanded in canonical form, i.e. in a
  series that converges in the p-adic norm:
  $x = x_0 + p x_1 + \ldots + p^k x_k + \ldots$, $x_k \in \{0, 1, \ldots, p-1\}$, $k \ge 0$.
- $\mathbb{Z}_p$ is equipped with the Haar measure $\mu_p$ normalized so that
  $\mu_p(\mathbb{Z}_p) = 1$.
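The definitions on this slide can be tried out numerically. The following is a minimal Python sketch, not part of the original talk; the function names `ord_p`, `norm_p` and `digits_p` are illustrative choices. It computes the p-adic order and norm of a rational number and the first canonical digits of a non-negative integer.

```python
from fractions import Fraction


def ord_p(n: int, p: int) -> int:
    """Exponent of the highest power of p dividing the nonzero integer n."""
    if n == 0:
        raise ValueError("ord_p is only defined here for nonzero integers")
    n = abs(n)
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k


def norm_p(q, p: int) -> Fraction:
    """p-adic norm of a rational q = n/m: p**(-ord_p(n) + ord_p(m)), with |0|_p = 0."""
    q = Fraction(q)
    if q == 0:
        return Fraction(0)
    exponent = ord_p(q.numerator, p) - ord_p(q.denominator, p)
    return Fraction(1, p) ** exponent


def digits_p(x: int, p: int, count: int) -> list:
    """First `count` canonical digits x_0, x_1, ... of a non-negative integer x."""
    return [(x // p**k) % p for k in range(count)]


if __name__ == "__main__":
    print(norm_p(12, 2), norm_p(Fraction(1, 8), 2), digits_p(35, 2, 6))
```

For instance, 12 = 2^2 * 3 has 2-adic norm 1/4, and |12 + 2|_2 equals max(|12|_2, |2|_2) because the two norms differ, as the strong triangle inequality requires.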
Proposed model
- We consider a 2-adic dynamical system $\langle \mathbb{Z}_2, \mu_2, f \rangle$, $f : \mathbb{Z}_2 \to \mathbb{Z}_2$.
- An attractor of $\langle \mathbb{Z}_2, \mu_2, f \rangle$ is a subset $A \subseteq \mathbb{Z}_2$ such that:
  1. $A$ is invariant with respect to the function $f$, i.e. $f(A) = A$;
  2. There exists a set $U \subset \mathbb{Z}_2$ which shrinks to $A$ under
     the action of $f$, i.e. $f^{(k)}(U) \to A$ for $k \to \infty$.
- The representation of the nucleotides C, A, T (U), G can be chosen
  in 24 variants. To obtain the function f in a compact way we set the
  nucleotides as T (U) ↔ (1, 0), C ↔ (1, 1), A ↔ (0, 0), G ↔ (0, 1).
- Each codon is represented as a binary vector of length 6, or as the
  corresponding 2-adic number. For example, CAG ↔ (1, 1, 0, 0, 0, 1).
  This vector defines the 2-adic number 1 + 2 + 2^5 = 35.
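The encoding just described can be written out in a few lines. This is a small Python sketch under the nucleotide assignment given on the slide; the function names are illustrative and not taken from the talk.

```python
# Nucleotide -> pair of binary digits, as chosen on the slide (U is treated like T).
NUCLEOTIDE_BITS = {"T": (1, 0), "U": (1, 0), "C": (1, 1), "A": (0, 0), "G": (0, 1)}
BITS_TO_NUCLEOTIDE = {(1, 0): "T", (1, 1): "C", (0, 0): "A", (0, 1): "G"}


def codon_to_bits(codon: str) -> tuple:
    """Binary vector (x0, ..., x5) of a three-letter codon."""
    if len(codon) != 3:
        raise ValueError("a codon is an ordered triple of nucleotides")
    bits = []
    for nucleotide in codon.upper():
        bits.extend(NUCLEOTIDE_BITS[nucleotide])
    return tuple(bits)


def codon_to_number(codon: str) -> int:
    """Integer x0 + 2*x1 + ... + 32*x5 whose binary digits are the codon's bits."""
    return sum(bit << k for k, bit in enumerate(codon_to_bits(codon)))


def number_to_codon(x: int) -> str:
    """Inverse map for 0 <= x <= 63 (DNA alphabet, i.e. T rather than U)."""
    digits = [(x >> k) & 1 for k in range(6)]
    return "".join(BITS_TO_NUCLEOTIDE[(digits[i], digits[i + 1])] for i in (0, 2, 4))


if __name__ == "__main__":
    print(codon_to_bits("CAG"), codon_to_number("CAG"), number_to_codon(35))
```

With this assignment the 64 codons correspond one-to-one to the integers 0, ..., 63, which is what allows sets of codons to be written as sets of numbers on the following slides.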
Proposed model
Let us choose the function f in such a way that each of its attractors (as a set
of 2-adic integers) coincides with the set of codons coding for one
amino acid.
For example, the attractors of the function that defines the Standard Nuclear
Genetic Code are:
Amino acid   Attractor
Ala          {14, 46, 30, 62}
Arg          {8, 40, 27, 59, 11, 43}
Asn          {16, 48}
Asp          {18, 50}
Cys          {25, 57}
Gln          {3, 35}
Glu          {2, 34}
Gly          {10, 42, 26, 58}
His          {19, 51}
Ile          {4, 20, 52}
Leu          {5, 37, 23, 55, 7, 39}
Lys          {0, 32}
Met          {36}
Phe          {21, 53}
Pro          {15, 47, 31, 63}
Ser          {13, 45, 24, 56, 29, 61}
Thr          {12, 44, 28, 60}
Trp          {41}
Tyr          {17, 49}
Val          {6, 38, 22, 54}
Stop         {1, 33, 9}
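As a sanity check, the attractors in the table can be decoded back to codons and compared with the standard genetic code. The sketch below is an illustration, not part of the talk: the compact one-letter string for the standard code (codons ordered T, C, A, G at each position) and the three-letter to one-letter map are supplied here as assumptions external to the slides.

```python
from itertools import product

# Pair of binary digits -> nucleotide (inverse of the encoding chosen on the slides).
BITS_TO_NUCLEOTIDE = {(1, 0): "T", (1, 1): "C", (0, 0): "A", (0, 1): "G"}

# Attractors as listed in the table for the Standard Nuclear Genetic Code.
ATTRACTORS = {
    "Ala": {14, 46, 30, 62}, "Arg": {8, 40, 27, 59, 11, 43},
    "Asn": {16, 48}, "Asp": {18, 50}, "Cys": {25, 57}, "Gln": {3, 35},
    "Glu": {2, 34}, "Gly": {10, 42, 26, 58}, "His": {19, 51},
    "Ile": {4, 20, 52}, "Leu": {5, 37, 23, 55, 7, 39}, "Lys": {0, 32},
    "Met": {36}, "Phe": {21, 53}, "Pro": {15, 47, 31, 63},
    "Ser": {13, 45, 24, 56, 29, 61}, "Thr": {12, 44, 28, 60}, "Trp": {41},
    "Tyr": {17, 49}, "Val": {6, 38, 22, 54}, "Stop": {1, 33, 9},
}

# ASSUMPTION (not from the slides): standard code in one-letter form, codon order TCAG.
ONE_LETTER = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), ONE_LETTER)}
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C", "Gln": "Q",
    "Glu": "E", "Gly": "G", "His": "H", "Ile": "I", "Leu": "L", "Lys": "K",
    "Met": "M", "Phe": "F", "Pro": "P", "Ser": "S", "Thr": "T", "Trp": "W",
    "Tyr": "Y", "Val": "V", "Stop": "*",
}


def number_to_codon(x: int) -> str:
    """Decode 0 <= x <= 63 into a DNA codon using the slide's bit assignment."""
    digits = [(x >> k) & 1 for k in range(6)]
    return "".join(BITS_TO_NUCLEOTIDE[(digits[i], digits[i + 1])] for i in (0, 2, 4))


def mismatches() -> list:
    """Entries (name, number, codon) whose decoded codon disagrees with STANDARD."""
    bad = []
    for name, attractor in ATTRACTORS.items():
        for x in sorted(attractor):
            codon = number_to_codon(x)
            if STANDARD[codon] != THREE_TO_ONE[name]:
                bad.append((name, x, codon))
    return bad


if __name__ == "__main__":
    print(mismatches())
```

The 21 attractors also partition {0, ..., 63}, so every codon belongs to exactly one class.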
Variation of genetic codes
1. The Standard Code
2. The Vertebrate mtCode
3. The Yeast mtCode
4. The Mold, Protozoan, Coelenterate mtCode
5. Mycoplasma, Spiroplasma Code
6. The Invertebrate mtCode
7. The Ciliate, Dasycladacean and Hexamita Nuclear Code
8. The Echinoderm and Flatworm mtCode
9. The Euplotid Nuclear Code
10. The Bacterial, Archaeal and Plant Plastid Code
11. The Alternative Yeast Nuclear Code
12. The Ascidian mtCode
13. The Alternative Flatworm mtCode
14. Chlorophycean mtCode
15. Trematode mtCode
16. Scenedesmus obliquus mtCode
17. Thraustochytrium mtCode
18. Pterobranchia mtCode
19. Candidate Division SR1 and Gracilibacteria Code
20. Blepharisma Nuclear Code
Example of representations
- We represented 20 known genetic codes (National Center for
  Biotechnology Information) by the attractors of 2-adic functions,
  using the van der Put and coordinate forms.
- The function that defines the Vertebrate mitochondrial code has the
  following van der Put representation: $F_m(x) = \sum_{k=0}^{63} M_k \chi_k(x)$.
- The function $F_m$ can be represented in explicit form, depending
  on the values of the binary digits in the canonical representation of the
  2-adic numbers, in the following way:
  $F_m(x_0 + 2x_1 + 2^2 x_2 + 2^3 x_3 + 2^4 x_4 + 2^5 x_5) = \Omega_0 - \Omega_1 - \Omega_2$,
  where
  $\Omega_0 = x_0 + 2x_1 + 4x_2 + 8x_3 + 16x_4 + 32\bar{x}_5$,
  $\Omega_1 = (x_3 + x_1 x_2 \bar{x}_3)(32x_4 - 16)x_5$,
  $\Omega_2 = x_0 \bar{x}_1 \bar{x}_2 x_3 (16 - 32x_4) x_5
            + \bar{x}_0 \bar{x}_1 \bar{x}_2 x_3 (23 - 44x_4) x_5
            + x_0 \bar{x}_1 x_2 (23x_3 - 18) \bar{x}_4 x_5
            + x_0 (-7\bar{x}_1 \bar{x}_2 + 18 x_1 x_2) x_3 \bar{x}_4 x_5$.
"Universal" function
All considered variations of the genetic code can be obtained using
"operations" on the cycles of some "Universal" function (6 variants).
For example, the "Universal" function F can be defined by the following
cycles (attractors):
{0, 32}
{8, 40}
{16, 48}
{2, 34}
{10, 42, 26, 58}
{18, 50}
{1, 33}
{9, 41}
{17, 49}
{3, 35}
{11, 43, 27, 59}
{19, 51}
{5, 37}
{13, 45, 29, 61}
{21, 53}
{4, 36}
{6, 38, 22, 54}
{7, 39, 23, 55}
{12, 44, 28, 60}
{14, 46, 30, 62}
{15, 47, 31, 63}
{20, 52}
{24, 56}
{25, 57}
"Universal" function
- Analytically, the considered function F has the following form:
  $F(x) = F(x_0 + 2x_1 + 2^2 x_2 + 2^3 x_3 + 2^4 x_4 + 2^5 x_5)
        = x + 32(-1)^{x_5} + 16 x_5 (-1)^{x_4} I(x_1 + x_2 + x_3 \ge 2)$,   (0.1)
  where $I(x_1 + x_2 + x_3 \ge 2) = 1$ when $x_1 + x_2 + x_3 \ge 2$ is
  satisfied, otherwise $I = 0$.
- In other words, I is the characteristic function of the event
  $x_1 + x_2 + x_3 \ge 2$.
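Formula (0.1) is easy to evaluate directly on the 64 integers 0, ..., 63. The following Python sketch (function names are illustrative) implements F as written and enumerates its cycles by iterating from each starting point; it relies on the assumption, checked at run time by a step limit, that F permutes {0, ..., 63}.

```python
def universal_F(x: int) -> int:
    """Formula (0.1) evaluated on an integer 0 <= x <= 63 with binary digits x0..x5."""
    d = [(x >> k) & 1 for k in range(6)]
    indicator = 1 if d[1] + d[2] + d[3] >= 2 else 0
    return x + 32 * (-1) ** d[5] + 16 * d[5] * (-1) ** d[4] * indicator


def cycles(f, size: int = 64) -> list:
    """Orbits of f on range(size), each listed from its smallest element."""
    seen = set()
    result = []
    for start in range(size):
        if start in seen:
            continue
        orbit = [start]
        y = f(start)
        while y != start:
            if len(orbit) > size:
                raise ValueError("f does not return to the starting point")
            orbit.append(y)
            y = f(y)
        seen.update(orbit)
        result.append(orbit)
    return result


if __name__ == "__main__":
    for orbit in cycles(universal_F):
        print(orbit)
```

Iterating from 7 gives 7, 39, 23, 55 and back to 7, which is the cycle {7, 39, 23, 55} from the list of cycles of the "Universal" function.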
"Universal" function
- The "Universal" function F consists of 8 cycles of length 4 and
  16 cycles of length 2.
- "Universal" function ≠ Genetic code!
"Universal" function, "Operations"
1. Let a(b) denote the cycle of the "Universal" function F that has
length a and contains the element b.
2. For example, {7, 39, 23, 55} is written as 4(7).
3. We need 3 types of "operations" on such cycles and 1
"iteration" (for the Alternative Yeast nuclear code, Chlorophycean,
Scenedesmus obliquus, Thraustochytrium, Pterobranchia) in order to
define any of the 20 genetic codes.
"Operations"
- "Addition": let a1(b1) and a2(b2) be cycles of the function F.
  We consider the new cycle a1(b1) ⊕ a2(b2) = (a1 + a2)(b1). For
  example, for
  4(7) = {7, 39, 23, 55}
  and
  4(12) = {12, 44, 28, 60}
  we get 8(7) = {7, 39, 23, 55, 12, 44, 28, 60}, which corresponds
  to the amino acid threonine (Thr) in the Yeast mt code.
- "Division": let 2(b1) = {b1, b2} and 2(c1) = {c1, c2}. Then
  2(b1) ∨ 2(c1) = {b1, c1, c2} ∪ {b2}.
- "Cleavage": for some codes we need to split a cycle of length
  2 into 2 cycles of length 1 each. For example,
  ∆2(9) = ∆{9, 41} = {9} ∪ {41}.
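The three "operations" act on cycles viewed as finite sets, so they can be sketched with plain set operations. In the Python sketch below the function names are illustrative. One assumption is made explicit: the examples on the later slides split off an element of either of the two cycles (for instance 2(1) ∨ 2(9) = {1, 33, 9} + {41}), so `division` takes the element to be split off as a parameter rather than always removing b2.

```python
def addition(cycle_a, cycle_b) -> frozenset:
    """"Addition": merge two cycles into one class."""
    return frozenset(cycle_a) | frozenset(cycle_b)


def division(cycle_a, cycle_b, split_off: int) -> tuple:
    """"Division": merge two cycles, then split one element off as a singleton.

    ASSUMPTION: the split-off element is chosen by the caller (see lead-in).
    """
    merged = frozenset(cycle_a) | frozenset(cycle_b)
    if split_off not in merged:
        raise ValueError("split_off must belong to one of the two cycles")
    return merged - {split_off}, frozenset({split_off})


def cleavage(cycle) -> tuple:
    """"Cleavage": split a cycle of length 2 into two cycles of length 1."""
    first, second = sorted(cycle)  # raises ValueError unless the cycle has 2 elements
    return frozenset({first}), frozenset({second})


if __name__ == "__main__":
    print(sorted(addition({7, 39, 23, 55}, {12, 44, 28, 60})))
    print(division({4, 36}, {20, 52}, split_off=36))
    print(cleavage({9, 41}))
```

Using frozensets keeps the resulting classes hashable, so a whole genetic code can be stored as a set of classes.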
Proposed model
NUCLEAR CODE DNA (PROKARYOTA, EUKARYOTA)

Bacterial, Archaeal, Plant Plastid:
    2(5)+4(7)
    2(8)+4(11)
    2(24)+4(13)
    2(4) ∨ 2(20) = {4, 20, 52} + {36}
    2(1) ∨ 2(9) = {1, 33, 9} + {41}

Standard nuclear code:
    2(5)+4(7)
    2(8)+4(11)
    2(24)+4(13)
    2(4) ∨ 2(20) = {4, 20, 52} + {36}
    2(1) ∨ 2(9) = {1, 33, 9} + {41}

Ciliate, Dasycladacean, Hexamita:
    2(5)+4(7)
    2(8)+4(11)
    2(24)+4(13)
    2(4) ∨ 2(20) = {4, 20, 52} + {36}
    2(1)+2(3)
    ∆2(9) = {9} + {41}

Candidate Division SR1, Gracilibacteria:
    2(5)+4(7)
    2(8)+4(11)
    2(24)+4(13)
    2(4) ∨ 2(20) = {4, 20, 52} + {36}
    2(9) ∨ 4(10) = {10, 42, 58, 26, 9} + {41}

Euplotid:
    2(5)+4(7)
    2(8)+4(11)
    2(24)+4(13)
    2(4) ∨ 2(20) = {4, 20, 52} + {36}
    2(9) ∨ 2(25) = {9, 25, 57} + {41}

Blepharisma:
    2(5)+4(7)
    2(8)+4(11)
    2(24)+4(13)
    2(4) ∨ 2(20) = {4, 20, 52} + {36}
    ∆2(9) = {9} + {41}
    2(1) ∨ 2(3) = {3, 35, 33} + {1}
    1(9) + 1(1) = {1, 9}

Alternative Yeast nuclear code:
    2(5) ∨ 4(7) = {5, 37, 7, 55, 23} + {39}
    2(8)+4(11)
    2(24)+4(13)
    2(1) ∨ 2(9) = {1, 33, 9} + {41}
    1(39) + [2(24) + 4(13)]

Mycoplasma, Spiroplasma:
    2(5)+4(7)
    2(8)+4(11)
    2(24)+4(13)

Alternative Yeast nuclear code; Mycoplasma, Spiroplasma:
    2(4) ∨ 2(20) = {4, 20, 52} + {36}
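The recipe listed on this slide for the Standard nuclear code can be replayed on the cycles of the "Universal" function (0.1). The following Python sketch is an illustration with hypothetical helper names (`cycle_of`, `standard_nuclear_classes`); it applies the three "additions" and two "divisions" and returns the resulting classes, which can then be compared with the attractor table of the Standard Nuclear Genetic Code.

```python
def universal_F(x: int) -> int:
    """Formula (0.1) on an integer 0 <= x <= 63 with binary digits x0..x5."""
    d = [(x >> k) & 1 for k in range(6)]
    indicator = 1 if d[1] + d[2] + d[3] >= 2 else 0
    return x + 32 * (-1) ** d[5] + 16 * d[5] * (-1) ** d[4] * indicator


def cycle_of(b: int) -> frozenset:
    """The cycle a(b) of the "Universal" function that contains b."""
    orbit = [b]
    y = universal_F(b)
    while y != b:
        orbit.append(y)
        y = universal_F(y)
    return frozenset(orbit)


def standard_nuclear_classes() -> set:
    """Apply the slide's recipe for the Standard nuclear code to the 24 cycles."""
    classes = {cycle_of(b) for b in range(64)}

    def add(b: int, c: int) -> None:
        parts = {cycle_of(b), cycle_of(c)}
        classes.difference_update(parts)
        classes.add(frozenset().union(*parts))

    def divide(b: int, c: int, split_off: int) -> None:
        parts = {cycle_of(b), cycle_of(c)}
        classes.difference_update(parts)
        merged = frozenset().union(*parts)
        classes.add(merged - {split_off})
        classes.add(frozenset({split_off}))

    add(5, 7)           # 2(5)+4(7)
    add(8, 11)          # 2(8)+4(11)
    add(24, 13)         # 2(24)+4(13)
    divide(4, 20, 36)   # 2(4) v 2(20) = {4, 20, 52} + {36}
    divide(1, 9, 41)    # 2(1) v 2(9)  = {1, 33, 9} + {41}
    return classes


if __name__ == "__main__":
    for cls in sorted(standard_nuclear_classes(), key=min):
        print(sorted(cls))
```

Three additions reduce the 24 cycles to 21 classes, and each division keeps the count unchanged, which matches the 20 amino acids plus the stop signal.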
Proposed model
mt CODE DNA

Chlorophycean:
    2(5)+4(7)
    2(8)+4(11)
    2(24)+4(13)
    2(4) ∨ 2(20) = {4, 20, 52} + {36}
    ∆2(9) = {9} + {41}
    2(1) ∨ [2(5) + 4(7)] = {1} + {5, 37, 33, 7, 55, 39, 23}
    1(1) + 1(9)

Scenedesmus obliquus:
    2(5)+4(7)
    2(8)+4(11)
    2(24) ∨ 4(13) = {24, 56, 61, 45, 29} + {13}
    2(4) ∨ 2(20) = {4, 20, 52} + {36}
    ∆2(9) = {9} + {41}
    2(1) ∨ [2(5) + 4(7)] = {5, 37, 33, 7, 55, 39, 23} + {1}
    1(13) + 1(1) + 1(9)

Thraustochytrium:
    2(5) ∨ 4(7) = {37, 7, 39, 55, 23} + {5}
    2(8)+4(11)
    2(24)+4(13)
    2(4) ∨ 2(20) = {4, 20, 52} + {36}
    2(1) ∨ 2(9) = {1, 33, 9} + {41}
    {1, 33, 9} + {5} = {1, 33, 9, 5}

Mold, Protozoan, Coelenterate:
    2(5)+4(7)
    2(8)+4(11)
    2(24)+4(13)
    2(4) ∨ 2(20) = {4, 20, 52} + {36}

Echinoderm, Flatworm:
    2(5)+4(7)
    2(8)+2(24)+4(13)
    2(4) ∨ 2(20) = {4, 20, 52} + {36}
    2(0) ∨ 2(16) = {0, 48, 16} + {32}

Alternative Flatworm:
    2(5)+4(7)
    2(8)+2(24)+4(13)
    2(4) ∨ 2(20) = {4, 20, 52} + {36}
    2(0) ∨ 2(16) = {0, 48, 16} + {32}
    2(1) ∨ 2(17) = {1, 49, 17} + {33}

Trematode:
    2(5)+4(7)
    2(8)+2(24)+4(13)
    2(0) ∨ 2(16) = {0, 48, 16} + {32}

Invertebrate:
    2(5)+4(7)
    2(8)+2(24)+4(13)

Yeast:
    4(7)+4(12)
    2(8)+4(11)
    2(24)+4(13)

Pterobranchia:
    2(5)+4(7)
    2(24)+4(13)
    2(4) ∨ 2(20) = {4, 20, 52} + {36}
    2(0) ∨ 2(8) = {0, 32, 40} + {8}
    1(8) + [2(24) + 4(13)]

Ascidian:
    2(5)+4(7)
    2(8)+4(10)
    2(24)+4(13)

Vertebrate:
    2(5)+4(7)
    2(1)+2(8)
    2(24)+4(13)
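To see what one of these recipes means in terms of codons, the Vertebrate entry, 2(5)+4(7), 2(1)+2(8), 2(24)+4(13), can be applied to the cycles of the "Universal" function (0.1) and decoded with the nucleotide assignment T ↔ (1, 0), C ↔ (1, 1), A ↔ (0, 0), G ↔ (0, 1). This Python sketch uses illustrative helper names and assumes, as the slide implies, that all other cycles stay unchanged.

```python
BITS_TO_NUCLEOTIDE = {(1, 0): "T", (1, 1): "C", (0, 0): "A", (0, 1): "G"}


def universal_F(x: int) -> int:
    """Formula (0.1) on an integer 0 <= x <= 63 with binary digits x0..x5."""
    d = [(x >> k) & 1 for k in range(6)]
    indicator = 1 if d[1] + d[2] + d[3] >= 2 else 0
    return x + 32 * (-1) ** d[5] + 16 * d[5] * (-1) ** d[4] * indicator


def cycle_of(b: int) -> frozenset:
    """The cycle a(b) of the "Universal" function that contains b."""
    orbit = [b]
    y = universal_F(b)
    while y != b:
        orbit.append(y)
        y = universal_F(y)
    return frozenset(orbit)


def number_to_codon(x: int) -> str:
    """Decode 0 <= x <= 63 into a DNA codon."""
    digits = [(x >> k) & 1 for k in range(6)]
    return "".join(BITS_TO_NUCLEOTIDE[(digits[i], digits[i + 1])] for i in (0, 2, 4))


def vertebrate_mt_classes() -> set:
    """Cycles of F after the three "additions" listed for the Vertebrate mt code."""
    classes = {cycle_of(b) for b in range(64)}
    for b, c in [(5, 7), (1, 8), (24, 13)]:
        parts = {cycle_of(b), cycle_of(c)}
        classes.difference_update(parts)
        classes.add(frozenset().union(*parts))
    return classes


def as_codons(cls) -> frozenset:
    """A class of integers rewritten as a set of codons."""
    return frozenset(number_to_codon(x) for x in cls)


if __name__ == "__main__":
    for cls in sorted(vertebrate_mt_classes(), key=min):
        print(sorted(as_codons(cls)))
```

The class 2(1)+2(8) = {1, 33, 8, 40} decodes to TAA, TAG, AGA, AGG, the four stop codons of the vertebrate mitochondrial code, while the untouched cycles {4, 36} and {9, 41} decode to ATA, ATG and TGA, TGG.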
Proposed model, Observations
- The presented approach can be seen as a contribution to the discussion
  of evolutionary systematics and the evolutionary origins of the
  genetic code.
- Classification (relationships) of the organisms based on the structure
  and the method of producing their genetic code from the "universal"
  function?
- Difference of the genetic codes between (groups of) species that are
  located on the same branch of the phylogenetic (evolutionary) tree?
- The operation of "Cleavage" ∆ appears in the genetic codes of
  organisms that perform photosynthesis.
- Flatworm mtCode vs. Alternative Flatworm mtCode, "shift":
  2(5) + 4(7), 2(8) + 2(24) + 4(13),
  2(4) ∨ 2(20) = {4, 20, 52} + {36}, 2(0) ∨ 2(16) = {0, 48, 16} + {32},
  2(1) ∨ 2(17) = {1, 49, 17} + {33}.
Paper
E. Yurova Axelsson, On the representation of the genetic code by
the attractors of 2-adic function, Physica Scripta, IOP Publishing,
September 2015