ARTICLE IN PRESS
JID: MBS
[m5G;March 8, 2016;22:1]
Mathematical Biosciences xxx (2016) xxx–xxx
Contents lists available at ScienceDirect
Mathematical Biosciences
journal homepage: www.elsevier.com/locate/mbs
A formulation of the foundations of genetics and evolution
Brian Edward Bahr∗
16299 Dakota Shores Dr., Park Rapids, MN 56470, United States
Article info
Article history:
Received 6 December 2014
Revised 13 February 2016
Accepted 17 February 2016
Available online xxx
Keywords:
Mathematical formulation
Mathematical simulation
Evolution
Genetics
Abstract
This paper proposes a formulation of theories of the foundations of genetics and evolution that can be
used to mathematically simulate phenotype expression, reproduction, mutation, and natural selection. It
will be shown that Mendelian inheritance can be mathematically simulated with expressions involving
matrices and that these expressions can also simulate phenomena that are modifications to Mendel’s
basic principles, like alleles that give rise to quantitative effects and traits that are the expression of
multiple alleles and/or multiple genetic loci.
© 2016 Elsevier Inc. All rights reserved.
1. Introduction
Similar to the way that Newton’s formulation of the laws of
motion can be used to mathematically simulate the trajectory of
objects under the influence of forces, this paper proposes a formulation of the foundations of genetics and evolution that can be
used to mathematically simulate phenotype expression, reproduction, mutation, and natural selection. This is not a new model of
these phenomena but a mathematical representation of an organism with matrices that are acted on by functions designed to have
the same effect on the representation as the biological processes
listed above have on true organisms. Accordingly, each organism is
represented by its own matrices and each matrix is operated on
separately, which means that simulating a population of any significant size demands a computer.
Simulating these biological processes on paper is not as simple
as using an equation that models them; however, a well-written
computer program can make a simulation that is nearly as simple
to operate. The main advantage of this formulation, though, is that
we can observe the effects that each biological process has on the
genotype and/or phenotype as a whole, which as we will see has
several benefits when simulating natural selection. We will also
see that, with this formulation, we are not constrained to modeling non-overlapping generations, nor are we constrained to using
fitness values that are constant over time when we are simulating
natural selection.
The majority of this paper will involve exploring the effects of
each function and what each function can simulate. Each section
in which a function is introduced will be followed by an example
of the effects of that function on an organism’s representation, culminating in an example simulation of a small population involving
every function that has been presented. Before the particular functions are presented, though, some notation needs to be introduced.
∗ Tel.: +1 507 272 3874.
E-mail address: [email protected]
1.1. Notation
This formulation uses square diagonal matrices with entries
from $\mathbb{Z}_n$ (the ring of integers modulo $n$), where $n$ depends on
the complexity of an organism’s phenotype expression (as we will
see later). The functions involved operate on these matrices with matrix addition and multiplication (there will also be one
action involving a calculation of the trace of a matrix).
All variables in italics used in this paper represent integers, so it
will be automatically assumed and not specified that they are integers whenever a new variable is introduced; likewise, all matrices
will be represented by boldface variables and will not necessarily
be specified as matrices when they are introduced.
We will begin by distinguishing between two different matrix
types.
The first type of matrix, the genotype matrix, will be used to
represent the genotype of an organism; in particular, each position on the diagonal of a genotype matrix will represent one allele
from that organism’s genotype (and genotype matrices can either
be used to represent an organism’s total genotype or a section of
it).
The second type of matrix we will call a phenotype matrix. The
phenotype matrix will initially be constructed from operations on
an organism’s genotype matrices but, as we will see, its entries can
also be altered by other functions. Accordingly, each entry along
the diagonal of a phenotype matrix will represent one phenotypic
trait that is either the expression of the organism’s genotype, the
result of environmental influences, or some combination of the
two.
http://dx.doi.org/10.1016/j.mbs.2016.02.005
0025-5564/© 2016 Elsevier Inc. All rights reserved.
Please cite this article as: B.E. Bahr, A formulation of the foundations of genetics and evolution, Mathematical Biosciences (2016),
http://dx.doi.org/10.1016/j.mbs.2016.02.005
Fig. 1. Expression gate.
Fig. 2. Reproduction gate.
The other matrices (which will be presented in later sections)
exist in the environment space, which is a system of paths and gates.
A path is simply the trajectory an organism must follow between
gates (like a wire in an electric circuit), and gates contain functions that act on an organism’s matrices and also determine which
matrices leave on which paths.
Now, when a matrix is added to or multiplied with an organism’s matrix, this invariably produces a new matrix, so according
to the definition of gates there are three things that can happen
to this new matrix: either it leaves the gate it was created in on
the same path as the matrices that entered the gate; it follows a
different path than the matrices that entered the gate; or it does
not leave the gate. Likewise, the matrices that entered the gate can
either leave on the same path, follow different paths, or be prevented from leaving the gate.
Thus we will distinguish three different types of gates: expression gates, reproduction gates, and alteration gates. In an expression
gate, an organism’s genotype matrices are operated on to create
a phenotype matrix that leaves the gate it was created in on the
same path as the organism’s genotype matrices (Fig. 1). In a reproduction gate, matrices are generated from operations on the organism’s genotype matrices which then follow a different path than
the organism’s matrices (Fig. 2). And in alteration gates, one of the
organism’s matrices will be operated on to produce a matrix that
leaves the gate on the same path as the organism’s other matrices;
however, the particular matrix that was operated on will not leave
the gate (Fig. 3).
Fig. 3. Alteration gate.
Lastly, we will make the definition that an organism is any set
of matrices that simultaneously enter or leave the same gate so
that a matrix produced in a gate either becomes included in the
set of the organism’s matrices, or it becomes included in the set of
a new organism’s matrices. This means that expression gates create a phenotype matrix which becomes a part of the organism;
reproduction gates leave the original organism’s genotype matrices
unchanged but create matrices for a new organism; and alteration
gates replace one of the organism’s matrices with a new matrix.
In this manner, the action of having an organism enter an expression gate will be used in this formulation to simulate the biological phenomenon of phenotype expression; the action of having
an organism enter a reproduction gate will be used to simulate the
biological phenomenon of reproduction; and the action of having
an organism enter an alteration gate will be used to simulate the
biological phenomenon of mutation.
One final type of gate will be included to represent natural selection. A selection gate will contain a function that assesses the
value of a certain entry in the organism’s phenotype matrix, and
then uses that value to determine whether the organism leaves the
gate; and the path leaving a natural selection gate will always lead
to a reproduction gate or another natural selection gate. So the action of having an organism enter a selection gate will be used to
simulate natural selection.
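The four gate types described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the paper's implementation: the names `Organism`, `expression_gate`, `reproduction_gate`, `alteration_gate` and `selection_gate` are hypothetical, the functions contained in each gate are passed in as callbacks because they are defined in later sections, a diagonal matrix is stored as the tuple of its diagonal entries, and the alteration gate is restricted here to genotype matrices for brevity.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

Diag = Tuple[int, ...]  # a diagonal matrix stored as its diagonal entries


@dataclass(frozen=True)
class Organism:
    """Hypothetical container: the set of matrices that travel together."""
    genotype: Tuple[Diag, ...]        # genotype matrices
    phenotype: Optional[Diag] = None  # phenotype matrix, once expressed


def expression_gate(org, express):
    """The new phenotype matrix leaves on the same path as the genotype matrices."""
    return replace(org, phenotype=express(org.genotype))


def reproduction_gate(org, reproduce):
    """The new matrices leave on a different path: returns (parent, offspring)."""
    return org, Organism(genotype=reproduce(org.genotype))


def alteration_gate(org, index, alter):
    """The altered matrix replaces the matrix that was operated on."""
    matrices = list(org.genotype)
    matrices[index] = alter(matrices[index])
    return replace(org, genotype=tuple(matrices))


def selection_gate(org, position, survives):
    """The organism leaves the gate only if the assessed phenotype entry passes."""
    return org if survives(org.phenotype[position]) else None
```

Returning `None` from `selection_gate` stands for an organism that does not leave the gate; a simulation of a population would simply drop such organisms.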
We can see from the above definitions that an organism might,
for example, be two genotype matrices and a phenotype matrix
that simultaneously follow a path to a gate and then leave that
gate together on another path to simultaneously enter another
gate, etc.
Now, a true organism really has a genotype for each cell in its
body and a set of genotype and phenotype matrices could conceivably be made for each cell in an organism, but for most cases, we
probably only need to distinguish between an organism’s germ-line
matrices and somatic matrices (matrices representing the genotype
and phenotype of cells that contribute and do not contribute to
gametes respectively). This distinction will come into play in the
reproduction and alteration actions.
From the definition of an organism it is also clear that a population of organisms must contain a collection of paths and gates for
each individual organism since two organisms cannot simultaneously enter the same gate. Thus a population can be represented
by a collection of paths through the environment space that sets
of matrices follow (which is why a population of any significant
size demands a computer). An example involving a small population will be simulated in Section 6.1.
1.2. Constructing genotype matrices
One reason for choosing to use diagonal matrices in this formulation is so that we can use a mathematical operation that will
act on the entire set of alleles in an organism’s genotype, but
will act on any two alleles if and only if they are interactive alleles (alleles—from different genetic loci or from the same genetic
locus on a different homologous chromosome, etc.—that interact to
express a single phenotypic trait). With diagonal matrices, we can
accomplish this by making a requirement on the arrangement of
alleles within genotype matrices: if two alleles are interactive alleles, then those alleles must be represented by entries in separate
genotype matrices and they must be located in the same position
in their respective matrices.
Let us translate this rule into mathematics. Since diagonal matrices contain all zeroes except on the main diagonal, we can denote them as $\mathbf{a} = \langle a_1, \dots, a_k \rangle$ without loss of information; thus $a_i$
and $a_j$ represent non-interactive alleles for all $i \neq j$. And, for two
entries from different matrices, say $a_x$ and $b_y$, they represent interactive alleles if and only if $x = y$.
Now, when we recall the nature of matrix addition and multiplication on diagonal matrices, we can see that the requirement
that $a_x$ and $b_y$ represent interactive alleles if and only if $x = y$ results in entries being added or multiplied together if and only
if they represent interactive alleles when genotype matrices are
added or multiplied together. This is because all off-diagonal entries
in a diagonal matrix are 0, so these operations act as follows:
$$\mathbf{a} + \mathbf{b} = \langle a_1 + b_1, a_2 + b_2, \dots, a_k + b_k \rangle$$
$$\mathbf{a}\mathbf{b} = \langle a_1 b_1, a_2 b_2, \dots, a_k b_k \rangle$$
So, with genotype matrices under the interactive allele requirement, we have two mathematical functions that can operate on the
entire set of alleles in an organism’s genotype and will act on two
alleles if and only if they are interactive alleles.
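The entrywise behaviour of these two operations can be sketched in Python. This is a minimal illustration rather than the paper's own code: the helper names `diag_add` and `diag_mul`, the modulus 30 and the sample entries are assumptions chosen for the example, and a diagonal matrix is stored as the tuple of its diagonal entries.

```python
def diag_add(a, b, n):
    """Add two diagonal matrices over Z_n, given as tuples of diagonal entries."""
    if len(a) != len(b):
        raise ValueError("diagonal matrices must be the same size")
    return tuple((x + y) % n for x, y in zip(a, b))


def diag_mul(a, b, n):
    """Multiply two diagonal matrices over Z_n, given as tuples of diagonal entries."""
    if len(a) != len(b):
        raise ValueError("diagonal matrices must be the same size")
    return tuple((x * y) % n for x, y in zip(a, b))


# Hypothetical example values: two 3-entry genotype matrices over Z_30.
a = (15, 25, 1)
b = (25, 10, 6)
print(diag_add(a, b, 30))  # (10, 5, 7): only entries at the same position are added
print(diag_mul(a, b, 30))  # (15, 10, 6): only entries at the same position are multiplied
```

Because no entry is ever combined with an entry at a different position, placing interactive alleles at the same position in separate matrices is enough to guarantee that only interactive alleles act on each other.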
We should note, though, that this requirement means that even
a haplotypic genotype might require more than one genotype matrix to represent it if at least one of its genes interacts with a gene
at another locus to produce a single phenotypic characteristic. This
also requires that, if a certain gene is involved in the expression of
multiple phenotypic characteristics, then that gene’s allele would
need to be represented by multiple entries in genotype matrices.
Yet we can see that any alleles that are not interactive alleles
can be represented in the same genotype matrix. For example, if
in a given pair of homologous chromosomes, every allele on one
chromosome interacts with the allele at the same genetic locus
on the other chromosome (and only that allele), then we would
only need two genotype matrices to represent these alleles and
could represent each allele from one chromosome in one matrix
and each allele from the other chromosome in the other matrix.
And if there were multiple chromosome pairs that fit this pattern,
their alleles could all be represented together in two matrices.
Consequently, the minimum number of genotype matrices required to represent an organism’s genotype is determined by the
maximum number of alleles that interact to express a single phenotypic trait. However, since diagonal matrices must be the same
size to operate on each other, and since some alleles might interact with fewer alleles than other alleles do, there might be “empty”
spaces in the “extra” genotype matrices that must be filled with an
identity element (an entry that does not represent an allele
and, as we will see in the next section, is either 1 or 0).
To simplify this and later discussions, we will refer to interactive alleles that are at the same genetic locus as interactive pairs,
and to alleles for which there is no other allele at the
same genetic locus (like alleles on the Y chromosome) as single alleles. We will also denote genotype matrices as $\mathbf{g} = \langle g_1, \dots, g_k \rangle$ and
index them as $\mathbf{g}^1, \dots, \mathbf{g}^h$.
Let us first consider the case of a diploid organism whose genotype contains only interactive pairs (no singles) that do not interact
with any other alleles (in other words, every trait is a Mendelian
trait). According to the interactive allele requirement, $g_x^i$ and $g_y^j$
can represent interactive alleles if and only if $x = y$, so we need at
least two genotype matrices. But since each interactive pair does
not interact with any other alleles, we can represent all interactive
pairs in the same two genotype matrices.
$$\mathbf{g}^1 = \langle g_1^1, \dots, g_a^1 \rangle$$
$$\mathbf{g}^2 = \langle g_1^2, \dots, g_b^2 \rangle$$
Let us suppose now that we have an organism whose genotype
still contains only interactive pairs (no singles), but that some of
them interact with other interactive pairs. This could be a diploid
organism that contains traits whose expression involves the interaction of alleles at multiple genetic loci or perhaps a 2m-ploid organism of any value of m.
In this case, we will need multiple pairs of genotype matrices,
by the same reasoning as above. However, if some interactive pairs
interact with more pairs than others, then since diagonal matrices must be the same size to operate on each other, there will be
“empty” spaces in some of the genotype matrices, which must be
filled with an identity element.
For example, if just one of the interactive pairs interacted with
one other interactive pair and we placed these pairs at position 2 ($g_2$) in each
genotype matrix, then we could put every other interactive pair
in the first two genotype matrices and fill the other two matrices
with the identity element appropriate for that position (for now
we will denote the identity element at position $e$ as $\mu_e$ and we
will see the reason for this choice in the next section).
$$\mathbf{g}^1 = \langle g_1^1, g_2^1, g_3^1, \dots, g_a^1 \rangle$$
$$\mathbf{g}^2 = \langle g_1^2, g_2^2, g_3^2, \dots, g_a^2 \rangle$$
$$\mathbf{g}^3 = \langle \mu_1, g_2^3, \mu_3, \dots, \mu_a \rangle$$
$$\mathbf{g}^4 = \langle \mu_1, g_2^4, \mu_3, \dots, \mu_a \rangle$$
We can see that this does not violate the interactive allele requirement, since these identity elements do not represent alleles,
and that it also ensures that every genotype matrix is the same size.
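As a rough illustration of this padding, the sketch below multiplies four genotype matrices in which only position 2 holds a second interactive pair. It assumes the multiplicative case, so the filler entry is taken to be 1; the modulus 30, the entries and the helper name `diag_mul` are hypothetical choices made for the example.

```python
from functools import reduce

N = 30   # assumed modulus for this example
MU = 1   # assumed filler: the multiplicative identity


def diag_mul(a, b, n=N):
    """Entrywise product of two diagonal matrices stored as tuples."""
    return tuple((x * y) % n for x, y in zip(a, b))


# Position 2 (index 1) holds two interacting pairs; the other positions hold one pair.
g1 = (15, 25, 10)
g2 = (25, 25, 10)
g3 = (MU, 15, MU)
g4 = (MU, 25, MU)

first_two = diag_mul(g1, g2)
all_four = reduce(diag_mul, (g1, g2, g3, g4))
print(first_two)  # (15, 25, 10)
print(all_four)   # (15, 15, 10): only position 2 is affected by g3 and g4
```

The filler entries leave positions 1 and 3 exactly as the first two matrices determine them, while all four matrices keep the same size.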
(Alternatively, we could separate alleles into more genotype
matrices so that all interactive pairs that did not interact with any
other interactive pairs are in two matrices; all interactive pairs that
interact with one other interactive pair are in four matrices; etc. It
is merely a matter of aesthetic preference, and how many matrices
one wants to keep track of.)
Finally, let us consider an organism whose genotype contains
interactive pairs and single alleles, some of which interact. This
could be a diploid organism that contains traits whose expression
involves the interaction of alleles at multiple genetic loci, some of
which might be on a sex chromosome; or perhaps an m-ploid organism for any value of m. This will be the same as the last case,
except for the single alleles; however, we can make single alleles
“interact” with an identity element so that this case will be exactly
the same as the last one and the interactive allele requirement will
still be satisfied if we do this (it will just mean that we need to pay
a little more attention when using the reproduction action as we
will see later).
So, if we build on the last example, we might have something
like the following (where certain entries have been suggestively labeled as x and y in order to show an example arrangement of alleles from X and Y chromosomes).
$$\mathbf{g}^1 = \langle g_1^1, g_2^1, g_3^1, \dots, \mu_a, \dots, \mu_c, y_1^1, \dots, y_e^1 \rangle$$
$$\mathbf{g}^2 = \langle g_1^2, g_2^2, g_3^2, \dots, x_1^2, \dots, x_b^2, \mu_d, \dots, \mu_e \rangle$$
$$\mathbf{g}^3 = \langle g_1^3, g_2^3, g_3^3, \dots, \mu_a, \dots, \mu_c, \mu_d, \dots, \mu_e \rangle$$
$$\mathbf{g}^4 = \langle g_1^4, g_2^4, \mu_3, \dots, \mu_a, \dots, \mu_c, \mu_d, \dots, \mu_e \rangle$$
$$\mathbf{g}^5 = \langle g_1^5, \mu_2, \mu_3, \dots, \mu_a, \dots, \mu_c, \mu_d, \dots, \mu_e \rangle$$
$$\mathbf{g}^6 = \langle \mu_1, \mu_2, \mu_3, \dots, \mu_a, \dots, \mu_c, \mu_d, \dots, \mu_e \rangle$$
And we can see from this that haploids can also be represented
by pairs of genotype matrices. For example, a haploid organism
whose genotype contains only single alleles that interact with no
other alleles would simply have one matrix containing its alleles
and one matrix containing all identity elements. Or, if some of its
single alleles interacted with one other allele, it might look like the
following:
$$\mathbf{g}^1 = \langle g_1^1, g_2^1, g_3^1, g_4^1, g_5^1, g_6^1 \rangle$$
$$\mathbf{g}^2 = \langle \mu_1, g_2^2, g_3^2, \mu_4, g_5^2, \mu_6 \rangle$$
Consequently, we only need to consider cases involving pairs of
genotype matrices because we can always pair a single allele with
an identity element. Additionally, we might want to include “extra” pairs of genotype matrices containing only identity elements
for each organism if we want to represent certain forms of chromosome mutation like duplication in the same way as the other
forms of alteration. (We will see why in Section 4.)
Furthermore, we might want to include “extra” identity elements in each genotype matrix so that we can add entries without having to change the size of any matrices. There are of course
ways to change the size of each matrix, but this runs the risk of
shifting entries in undesired ways.
2. The formulation
2.1. Phenotype expression
Phenotype expression involves having an organism that consists only of genotype matrices enter a phenotype expression gate
which contains a function that acts on those genotype matrices
to produce phenotype matrices; then these phenotype matrices
leave the gate simultaneously on the same path as the genotype
matrices.
To simplify the following discussion, we will first divide phenotype expression into two different types and then combine them
into a general phenotype expression function.
2.2. Multiplicative phenotype expression
Let us first investigate how to use this formulation to represent Mendel’s Law of Dominance (which, expressed rigorously, is
the relationship between two alleles, $A$ and $a$, in which the interaction of $A$ and $A$ expresses the same trait as the interaction of $A$
and $a$) along with the first half of the Principle of Segregation:
that each phenotypic characteristic is the expression of two (interactive) alleles.
Clearly the first half of the Principle of Segregation limits us to
just two genotype matrices and it is easy enough to see that these
rules can be encapsulated mathematically as:
$$\mathbf{g}^1 \mathbf{g}^2 = \mathbf{p}$$
where $\mathbf{g}^1$, $\mathbf{g}^2$ contain entries that are elements of $\mathbb{Z}_2$.
This is due to the fact that the only elements of $\mathbb{Z}_2$ are 0 and
1, and $0 \times 0 \equiv 0 \pmod 2$ and $0 \times 1 \equiv 0 \pmod 2$ (while $1 \times 1 \equiv 1$), so 0 can represent
the allele denoted as $A$ and 1 can represent the allele denoted as
$a$ above.
Yet we can extend this idea further when we focus on a special
subset of $\mathbb{Z}_n$ which we will denote $Ж_n$. The set $Ж_n$ will be defined
as $\{u \in \mathbb{Z}_n \mid uu = u\}$; in other words, $Ж_n$ is the set of all multiplicatively idempotent elements in $\mathbb{Z}_n$. For example, the set $Ж_2 = \{0, 1\}$,
as we just saw; the set $Ж_6 = \{0, 1, 3, 4\}$ because $3 \times 3 \equiv 3 \pmod 6$,
$4 \times 4 \equiv 4 \pmod 6$, etc.; and the set $Ж_{30} = \{0, 1, 6, 10, 15, 16, 21, 25\}$.
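The sets listed above can be reproduced by brute force. The short Python sketch below simply tests every residue for idempotence; the function name `idempotents` is an assumption made for the example.

```python
def idempotents(n):
    """Return the multiplicatively idempotent elements of Z_n in increasing order."""
    return [u for u in range(n) if (u * u) % n == u]


print(idempotents(2))   # [0, 1]
print(idempotents(6))   # [0, 1, 3, 4]
print(idempotents(30))  # [0, 1, 6, 10, 15, 16, 21, 25]
```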
An interesting property of the elements of $Ж_n$ is that for certain
$u, v \in Ж_n$, $uv = u$. For example, the numbers 15 and 25 from $Ж_{30}$
exhibit this property since $15 \times 25 \equiv 15 \pmod{30}$ (and trivially any
$u$ multiplied by 1 is congruent to $u$). The Theorem of Dominance
(proved in the Appendix) tells us that this is the case whenever
the greatest common divisor of $v$ and $n$ also divides $u$. And it is
also proved in the Appendix that there is an element of $Ж_n$ for
each unique combination of the distinct prime divisors of $n$, so the
number of pairs of elements of $Ж_n$ in which $uv = u$ depends on the
number of prime divisors of $n$.
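The statement of the Theorem of Dominance can be checked numerically for a given modulus. The following Python sketch does this for $n = 30$; it is a brute-force check under the definitions given in the text, not a substitute for the proof in the Appendix, and the function names are assumptions made for the example.

```python
from math import gcd


def idempotents(n):
    """Multiplicatively idempotent elements of Z_n."""
    return [u for u in range(n) if (u * u) % n == u]


def dominates(u, v, n):
    """True when u * v is congruent to u modulo n."""
    return (u * v) % n == u % n


def dominance_pairs(n):
    """All ordered pairs (u, v) of distinct idempotents with u * v congruent to u."""
    elements = idempotents(n)
    return [(u, v) for u in elements for v in elements
            if u != v and dominates(u, v, n)]


n = 30
# Whenever gcd(v, n) divides u, the product u * v should be congruent to u.
check = all(dominates(u, v, n)
            for u in idempotents(n) for v in idempotents(n)
            if u % gcd(v, n) == 0)
print(check)                            # True
print((15, 25) in dominance_pairs(n))   # True
```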
Now, since by definition $uu = u$ for all $u \in Ж_n$, then for any $u, v \in Ж_n$
in which $uv = u$, the phenotype expression action of genotype
entries containing $u$ and $v$ can replicate the relationship between
any two true alleles $A$ and $a$, in which the combination of $A$ and
$A$ expresses the same trait as the combination of $A$ and $a$. Consequently, we will make the following shorthand definition in this
formulation: given two entries $u$ and $v$, if the result of multiplying
$uv$ equals $u$, then it will be said that $u$ dominates $v$.
And we can see that, if $n$ contains enough prime divisors, then
we can construct a dominance hierarchy where $w$ dominates $z$, $v$
dominates both $w$ and $z$, $u$ dominates $v$, $w$ and $z$, etc., which
we will denote as $u \succ v \succ w \succ z$. This can be achieved if $w$ contains
the greatest common divisor of $z$ and $n$ plus at least one other divisor of $n$, $v$ contains the greatest common divisor of $w$ and $n$ plus at
least one other divisor of $n$, $u$ contains the greatest common divisor of $v$ and $n$ plus at least one other divisor of $n$, etc. For example,
from the set $Ж_{30}$, there is the dominance hierarchy $0 \succ 15 \succ 25 \succ 1$; and in the set $Ж_{210}$, there is $0 \succ 105 \succ 175 \succ 85 \succ 1$.
To provide a concrete example, we can represent the alleles
that determine mallard duck feather pattern with the hierarchy
$105 \succ 175 \succ 85$ from $Ж_{210}$. The gene that determines feather pattern contains three alleles usually denoted $M^R$, $M$, and $m^d$, and
the interaction of $M^R$ with $M^R$ produces the same trait (the restricted feather pattern) as the interaction of $M^R$ with $M$ or $m^d$;
furthermore, the interaction of $M$ with $M$ produces the same trait
(the mallard feather pattern) as the interaction of $M$ with $m^d$;
and there is a third trait (the dusky feather pattern) that is only
produced by the interaction of $m^d$ with $m^d$ [1]. Accordingly, if
105, 175, and 85 represent the alleles $M^R$, $M$, and $m^d$ respectively, then we can see that, because $105 \times 175 \equiv 105 \pmod{210}$
and $105 \times 85 \equiv 105$, the restricted trait can be represented by 105;
and because $175 \times 85 \equiv 175$, the mallard trait can be represented
by 175; while the dusky trait can be represented by 85.
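The mallard example can be tabulated directly. In the Python sketch below the entries 105, 175 and 85 and the trait names come from the text, while the dictionary layout and the function name `feather_pattern` are assumptions made for illustration.

```python
N = 210
ALLELES = {"M^R": 105, "M": 175, "m^d": 85}                 # entries from the text
TRAITS = {105: "restricted", 175: "mallard", 85: "dusky"}   # trait for each product


def feather_pattern(allele_1, allele_2):
    """Trait expressed by two interacting alleles: the product of their entries mod N."""
    return TRAITS[(ALLELES[allele_1] * ALLELES[allele_2]) % N]


for first in ALLELES:
    for second in ALLELES:
        print(first, second, feather_pattern(first, second))
```

Every one of the nine ordered combinations lands on 105, 175 or 85, so the lookup in `TRAITS` never fails.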
Before we proceed further, let us define two elements, $a$ and $b$,
of $\mathbb{Z}_n$ as extraneously prime if $a$ contains a prime factor of $n$ that
is not in $b$ and vice versa. So, for example, if $n = 30$, then 6,
15 and 10 are all extraneously prime to each other, and 16 and 21
are also extraneously prime to each other (but 15 and 25 are not).
Clearly from this definition, for any $u, v \in Ж_n$ that are extraneously prime, the greatest common divisor of $v$ and $n$ does not divide $u$ and the greatest common divisor of $u$ and $n$ does not divide
$v$; therefore, by the Theorem of Dominance, $u$ does not dominate $v$
and $v$ does not dominate $u$, so $uv \equiv w \pmod n$ (where $u \not\equiv v \not\equiv w$).
In this case, the phenotype expression action of genotype entries
containing $u$ and $v$ can replicate the relationship between any true
alleles $A$ and $B$, in which the combination of $A$ and $B$ expresses a
trait that is different from the trait expressed by $A$ and $A$ and the
trait expressed by $B$ and $B$; thus, we can use extraneously prime
entries to represent traits that are the expression of alleles with a
co-dominant relationship, which, expressed rigorously, is the relationship between two alleles in which the interaction of each combination of alleles expresses a different trait.
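The definition of extraneously prime elements can be written as a small predicate. The Python sketch below reads "contains a prime factor of $n$" as "is divisible by that prime", which is an interpretive assumption; the function names are likewise hypothetical.

```python
def prime_divisors(n):
    """Distinct prime divisors of n, by trial division."""
    primes, p = [], 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes


def extraneously_prime(a, b, n):
    """True if each of a and b is divisible by a prime factor of n that the other lacks."""
    in_a = {p for p in prime_divisors(n) if a % p == 0}
    in_b = {p for p in prime_divisors(n) if b % p == 0}
    return bool(in_a - in_b) and bool(in_b - in_a)


print(extraneously_prime(6, 15, 30))   # True
print(extraneously_prime(16, 21, 30))  # True
print(extraneously_prime(15, 25, 30))  # False
print((16 * 21) % 30)                  # 6, which is neither 16 nor 21
```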
Additionally, if $n$ contains enough prime divisors, we can also
construct multiple hierarchies in which the members of each hierarchy are extraneously prime to the members of every other hierarchy. For example, if a population contains the set of entries $\{106,
36, 175, 85\}$ from $Ж_{210}$ at a given position, then we have the dominance hierarchies $36 \succ 106$ and $175 \succ 85$, and the members of
each hierarchy are extraneously prime to the members of the other
hierarchy.
As a concrete example of this, we can represent the alleles that
determine ABO blood type with the hierarchies $15 \succ 25$ and $10 \succ 25$ from $Ж_{30}$. The gene that determines ABO blood type contains
three alleles often denoted $I^A$, $I^B$, and $i$, and the interaction of $I^A$
with $I^A$ produces the same trait (the A antigen) as the interaction
of $I^A$ with $i$; likewise, the interaction of $I^B$ with $I^B$ produces the
same trait (the B antigen) as the interaction of $I^B$ with $i$; however,
the interaction of $I^A$ with $I^B$ produces both the A and the B antigens on the blood cell; and when $i$ interacts with $i$, no antigens
are produced [1]. Accordingly, if 15, 10, 25 represent the alleles $I^A$,
$I^B$, and $i$ respectively, then we can see that, because $15 \times 25 \equiv 15 \pmod{30}$,
the lone A antigen trait can be represented by 15; because $10 \times 25 \equiv 10$, the lone B antigen trait can be represented by
10; and because $15 \times 10 \equiv 0$, the A and B antigen trait can be represented by 0; while the absence of an antigen can be represented
by 25.
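The ABO example can be tabulated in the same way. In the Python sketch below the entries 15, 10 and 25 come from the text; labelling the four products with the conventional blood type names A, B, AB and O is an assumption added for readability, as are the names `ALLELES`, `BLOOD_TYPE` and `blood_type`.

```python
N = 30
ALLELES = {"I^A": 15, "I^B": 10, "i": 25}           # entries from the text
BLOOD_TYPE = {15: "A", 10: "B", 0: "AB", 25: "O"}   # assumed conventional labels


def blood_type(allele_1, allele_2):
    """Blood type expressed by two interacting alleles: product of their entries mod N."""
    return BLOOD_TYPE[(ALLELES[allele_1] * ALLELES[allele_2]) % N]


for first in ALLELES:
    for second in ALLELES:
        print(first, second, blood_type(first, second))
```

The product 0 appears only for the combination of $I^A$ with $I^B$, which is what lets a single entry stand for the co-dominant trait.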
The fact that elements of $Ж_n$ can represent these dominance
and co-dominance relationships is the main reason for choosing
to use elements of $Ж_n$, but another reason is so that we can represent complete dominance, where the expression of an allele by
itself produces the same trait as the expression of that allele interacting with another copy of itself. An example is an allele on the X
chromosome that produces the same trait when it interacts with
the same allele on the other X chromosome in females as it does
when it does not interact with any other alleles in males (who lack
a second X chromosome). Clearly, any element of $Ж_n$ can represent
such alleles since they are all idempotent elements.
On the other hand, the elements of $\mathbb{Z}_n \setminus Ж_n$ can be used to represent alleles in which the expression of that allele by itself produces a different trait than the expression of that allele interacting
with another copy of itself, since these elements are by definition
not multiplicatively idempotent elements.
Consequently, there are elements of $\mathbb{Z}_n \setminus Ж_n$ that can interact
with the elements of $Ж_n$ to represent a haploinsufficient relationship between alleles, where the expression of $A$ alone produces the
same trait as the expression of $A$ with $a$ but is different from the
expression of $A$ with $A$ and the expression of $a$ with $a$ (which are
both different).
The Divisor Lemma (proved in the Appendix) shows that for
all $u \in Ж_n$, if $\gcd(u, n) = \delta$, then $\delta u \equiv \delta \pmod n$. So for $u \neq \delta\delta$, we
can represent this relationship between alleles using $u$ and $\delta$. For
example, in $\mathbb{Z}_{30}$, $2 \times 16 \equiv 2 \pmod{30}$, but $2 \times 2 \equiv 4$ and $16 \times 16 \equiv 16$.
In Section 6.3, we will look at some possible relationships between
alleles that the elements of $\mathbb{Z}_n \setminus Ж_n$ can also be used to represent.
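The congruence stated by the Divisor Lemma, and the worked example in $\mathbb{Z}_{30}$, can be checked with a few lines of Python. This is a brute-force check for particular moduli only; the function names are assumptions made for the example.

```python
from math import gcd


def idempotents(n):
    """Multiplicatively idempotent elements of Z_n."""
    return [u for u in range(n) if (u * u) % n == u]


def divisor_lemma_holds(n):
    """Check that gcd(u, n) * u is congruent to gcd(u, n) mod n for every idempotent u."""
    return all((gcd(u, n) * u) % n == gcd(u, n) % n for u in idempotents(n))


n, u = 30, 16
delta = gcd(u, n)
print(delta)                   # 2
print((delta * u) % n)         # 2: delta with u behaves like delta alone
print((delta * delta) % n)     # 4: delta with delta gives a different value
print((u * u) % n)             # 16: u with u gives u
print(divisor_lemma_holds(n))  # True
```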
Let us now extend the expression function beyond the Principle of Segregation to include organisms whose genotype must be
represented by multiple pairs of genotype matrices. For this case,
let us investigate the following multiplicative phenotype expression
function:
$$E(\mathbf{g}^1, \dots, \mathbf{g}^h) = \prod \mathbf{g}^h = \mathbf{p}$$
Suppose we have a population in which organisms contain the
entries $\{a, d\}$ at $g_e^1$ and $g_e^2$ and the entries $\{b, c\}$ at $g_e^3$ and $g_e^4$,
where $a \succ b \succ c \succ d$ (and all other genotype matrices, if any, contain a 1 at this position). Since $a$ dominates every other entry, $p_e$
will equal $a$ for any combination containing $a$; since $b$ dominates
$c$ and $d$, $p_e$ will equal $b$ for any combination containing $b$ but not
$a$; and since $c$ dominates $d$, $p_e$ will equal $c$ for any combination
that contains neither $a$ nor $b$. So when every distinct combination
of these entries is calculated, this function will produce the entries
$p_e = a$, $p_e = b$, and $p_e = c$ in the ratio 12:3:1.
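The 12:3:1 ratio can be confirmed by enumerating all sixteen combinations. The Python sketch below assumes concrete values for $a$, $b$, $c$ and $d$ taken from the hierarchy $105 \succ 175 \succ 85 \succ 1$ in $Ж_{210}$ given earlier; any other hierarchy of four entries would serve equally well.

```python
from collections import Counter
from itertools import product
from math import prod

N = 210
a, b, c, d = 105, 175, 85, 1   # assumed entries with a > b > c > d in the dominance order

# Positions g_e^1 and g_e^2 draw from {a, d}; positions g_e^3 and g_e^4 draw from {b, c}.
counts = Counter(
    prod(combo) % N
    for combo in product((a, d), (a, d), (b, c), (b, c))
)
print(counts[a], counts[b], counts[c])  # 12 3 1
```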
This can replicate the relationship found in dominant epistasis,
where for four true alleles $A$, $B$, $a$, and $b$, the combination of $A$
and $A$ expresses the same trait as the combination of $A$ with any
other allele; the combination of $a$ and $a$ with $B$ and $B$ expresses
a different trait than the first but one that is the same as the trait
expressed by the combination of $a$ and $a$ with $B$ and $b$; and the
combination of $a$ and $a$ with $b$ and $b$ produces a trait different
from the other two traits.
And it’s not too difficult to see that if organisms contain the entries {a, d} at both $g_e^1$, $g_e^2$ and $g_e^3$, $g_e^4$, this function will produce
the entries $p_e = a$, $p_e = d$ in the ratio 15:1 (that of complementary
epistasis) since $p_e = d$ if and only if all four entries are d. (We will
investigate representing other types of epistasis in Section 6.)
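Both ratios can be reproduced by enumerating every combination of entries. The sketch below is illustrative only: the modulus 30 and the concrete values 0, 6, 16, 1 for a, b, c, d are assumptions chosen so that "x dominates y" holds in the sense x·y ≡ x (mod 30); the helper names are hypothetical.

```python
from collections import Counter
from itertools import product

N = 30                     # assumed modulus
a, b, c, d = 0, 6, 16, 1   # assumed idempotents: a·b ≡ a, b·c ≡ b, c·d ≡ c (mod 30)

def express(entries, n=N):
    """Multiplicative phenotype expression at one position: product of the entries mod n."""
    p = 1
    for g in entries:
        p = (p * g) % n
    return p

def phenotype_counts(first_pair, second_pair):
    """Count phenotype entries over all combinations at (g1, g2) and (g3, g4)."""
    counts = Counter()
    for g1, g2 in product(first_pair, repeat=2):
        for g3, g4 in product(second_pair, repeat=2):
            counts[express((g1, g2, g3, g4))] += 1
    return counts

dominant = phenotype_counts((a, d), (b, c))   # entries {a, d} and {b, c}
both_ad = phenotype_counts((a, d), (a, d))    # entries {a, d} at both pairs

print(dominant[a], dominant[b], dominant[c])
print(both_ad[a], both_ad[d])
```

Any other chain of idempotents with the same dominance relations would give the same counts; the specific numbers only make the enumeration concrete.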
So we can see that representing alleles with elements of Жn
extends Mendel’s Law of Dominance to include co-dominance and
also allows for the representation of genes with more than two
variations that interact in various combinations of dominance and
co-dominance. Additionally, certain elements of Zn – Жn paired
with elements of Жn can be used to represent haploinsufficiency.
2.3. Additive phenotype expression
Additive phenotype expression will be used to represent the expression of alleles that give rise to traits that differ in some measurable way (like height, litter size, etc.). Specifically, it will be
used to represent genes whose variations differ only in the quantity of some substance that the gene contributes to the characteristic (like pigment, growth hormone, etc.) such that a difference in
the quantity contributed by a gene will cause there to be a measurable difference in the characteristic.
We will use the following function to represent this type of
phenotype expression:

$$E(g^1, \ldots, g^h) = \sum_h g^h = p$$

So the difference between two traits $p_e$ and $p_e'$ arising from different genotypes is:

$$p_e - p_e' = \sum_h g_e^h - \sum_h g_e^{h\prime}$$

Now if all $g_e^1, \ldots, g_e^h$ and $g_e^{1\prime}, \ldots, g_e^{h\prime}$ are equal except $g_e^i$ and $g_e^{i\prime}$, where $g_e^i - g_e^{i\prime} = \Delta$, then $\sum_h g_e^h - \sum_h g_e^{h\prime} = \sum_h g_e^h - \sum_h g_e^h + \Delta$. Consequently, $p_e - p_e' = \Delta$ and a difference in the value of two genotype entries will cause an equal difference in the phenotype entries they express.

Since this difference is equal, genotype entries must be chosen according to the effect the alleles they represent have on the trait they express. For instance, if in a certain population two interactive pairs each contribute 0, 1, or 2 doses of some substance that influences a plant’s height, but each dose from one pair causes the plant to grow twice as tall as the doses from the other pair, then the set of entries representing each pair cannot be equal even though the number of doses is equal. However, we can see that there is some latitude in the entries we choose, since their differences are invariant when the same number is added to every entry at that position. So we could choose the two sets to be {0,1,2} and {0,2,4}, but if the phenotype entry they express represents a plant’s height that varies between 10 and 16 cm it might be more favorable to use {5,6,7} and {5,7,9} instead. In this way, then, this action can represent the expression of alleles that contribute to traits in a measurable way.

Now, certain traits (like height) are usually described as varying “continuously” due to environmental influences (a better definition, that avoids any pitfalls of limitlessness, would be to say that these are traits in which the more precisely one measures, the more varieties there are to be found). In other words, traits that differ in measurable ways can sometimes be found to have values different from the values produced strictly by the expression of an organism’s genotype when these traits are influenced by the environment.

Since phenotype matrices can only contain integers, we need a way for the alteration action (to be introduced in Section 4) to change an entry to a value different from the values produced by the phenotype expression function. This can be accomplished by multiplying each genotype entry at a certain position (for every organism in the population) by a constant of measure, κ. So, instead of containing {a, b, … , z}, the set of genotype entries in the population would be {κa, κb, … , κz}. This also means that the values that the phenotype entries represent must be scaled accordingly in different units since differences are not invariant under multiplication.

Please cite this article as: B.E. Bahr, A formulation of the foundations of genetics and evolution, Mathematical Biosciences (2016),
http://dx.doi.org/10.1016/j.mbs.2016.02.005

For instance, if in the above example we use κ = 10, then each
entry in the phenotype matrix should represent the plant’s height
in millimeters instead of centimeters because the entries in the
phenotype matrix would vary between 100 and 160. And phenotype expression will still only produce entries of 100, 110, 120, 130,
140, 150, and 160 mm; but, with the alteration action, these entries
can also be changed to assume values like 101, 115, 137, etc. Thus
the constant of measure should be selected with a large enough
value to encompass the number of varieties that can be found with
the given precision of measurement.
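The plant-height example can be worked through directly. This is a minimal sketch, assuming that each listed set gives the possible contributions of one interactive pair and that the modulus n is large enough that no sum wraps around (so the reduction mod n is omitted); `additive_expression` is a hypothetical helper name.

```python
from itertools import product

def additive_expression(entries, kappa=1):
    """Phenotype entry p_e as the sum of the genotype entries, each scaled by kappa."""
    return sum(kappa * g for g in entries)

# Entry sets from the plant-height example, one set per interactive pair.
pair_one = (5, 6, 7)
pair_two = (5, 7, 9)
combos = list(product(pair_one, pair_two))

heights_cm = sorted({additive_expression(c) for c in combos})
heights_mm = sorted({additive_expression(c, kappa=10) for c in combos})

# Subtracting 5 from every entry gives the sets {0,1,2} and {0,2,4}:
# the phenotype entries shift, but their differences are unchanged.
shifted = sorted({additive_expression((x - 5, y - 5)) for x, y in combos})

print(heights_cm)
print(heights_mm)
print(shifted)
```

With κ = 10 the expressed entries are spaced 10 apart, which leaves room for the alteration action to place an entry at any integer in between.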
2.4. The general phenotype expression function
We can either require that all multiplicative genotype entries
be in separate matrices from additive genotype entries, or we can
combine them in the same matrices and combine the two types of
phenotype expression into the following function:

$$E(g^1, \ldots, g^h) = \left(\prod_h g^h\right)\mu + \left(\sum_h g^h\right)\alpha = p$$

with the constraint that μ + α = I (where I is the identity matrix and μ and α are binary (containing only 1s and 0s) diagonal matrices). Since this constraint means that everywhere μ has
a 1, α has a 0 and contrariwise (on the diagonal), all entries in
$g^i$ are acted on by one (and only one) of the two types of phenotype expression, because if $\alpha_e = 0$, then the equation will be
$\prod_h g_e^h + 0 = p_e$ and, similarly, if $\mu_e = 0$, then the equation will be
$0 + \sum_h g_e^h = p_e$.
Now, it has been mentioned a few times that certain positions
must be filled with identity elements if we want to limit the number of matrices required to represent an organism’s genotype. It
is clear now that, for positions in which the entries interact according to
$\prod_h g^h$, the appropriate identity element to use is 1, the
multiplicative identity element of Zn; and, for positions in which
the entries interact according to $\sum_h g^h$, the appropriate identity element to use is 0, the additive identity element of Zn. It is also
easy enough to see that the appropriate identity element for any
$g_e^i$ is equal to $\mu_e$.
To demonstrate this combined action, let us suppose that an
organism enters a phenotype expression gate containing the above
function and the following two matrices:
μ = 1, 1, 0, 0, 0, 1, 0
α = 0, 0, 1, 1, 1, 0, 1

And the organism will be a set of the following genotype matrices:

g1 = 85, 106, 10, 10, 10, 15, 6
g2 = 85, 175, 10, 10, 30, 21, 12
g3 = 105, 1, 60, 20, 20, 141, 0
g4 = 175, 1, 30, 10, 40, 1, 0

Then, the following phenotype matrix will be produced in this
gate:

p = 0, 70, 110, 50, 100, 105, 18

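The combined function can be sketched on the diagonals of these matrices. The modulus n = 210 is an assumption: it is not stated in this section and is inferred here from the listed entries. Under that assumption the sketch agrees with positions 2 through 7 of the printed p; for the first position the listed entries 85, 85, 105, 175 multiply to 105 (mod 210). The function name `express` is hypothetical.

```python
from math import prod

N = 210  # assumed modulus, inferred from the listed values

def express(genotype, mu, alpha, n=N):
    """General expression on diagonals: (product of entries)·mu + (sum of entries)·alpha, mod n."""
    phenotype = []
    for e, (m, a) in enumerate(zip(mu, alpha)):
        column = [g[e] for g in genotype]      # the entries g_e^1 ... g_e^h
        multiplicative = prod(column) % n
        additive = sum(column) % n
        phenotype.append((multiplicative * m + additive * a) % n)
    return phenotype

mu = [1, 1, 0, 0, 0, 1, 0]
alpha = [0, 0, 1, 1, 1, 0, 1]
genotype = [
    [85, 106, 10, 10, 10, 15, 6],
    [85, 175, 10, 10, 30, 21, 12],
    [105, 1, 60, 20, 20, 141, 0],
    [175, 1, 30, 10, 40, 1, 0],
]

p = express(genotype, mu, alpha)
print(p)
```

Because μ and α are complementary binary vectors, exactly one of the two terms survives at each position, so the loop could equally select the product or the sum instead of computing both.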
3. Reproduction
As stated in the introduction, the action of having an organism
enter a reproduction gate will be used to simulate reproduction.
Now, since sexual reproduction in diploid organisms is the creation
of a new organism through the fusion of gametes which contain
a copy of one allele from each genetic locus in their progenitors’
genotypes, this requires an action that creates genotype matrices
for a new (temporary) organism that contain a copy of one entry
from each position for each pair of a progenitor organism’s germline genotype matrices and then these genotype matrices must be
combined with those of another (temporary) organism that was
created in the same way.

Fig. 4. Sexual reproduction.

And since asexual reproduction is the creation of a new organism which contains copies of the alleles from one progenitor, we
need an action that will copy entries from a progenitor organism’s
genotype matrices in a similar way, but will not combine with
those of another organism.
This action will involve reproduction gates which contain a
reproduction function and recombination matrices that act on an
organism’s genotype matrices to produce new matrices. Recombination matrices will be used to determine what entries are
copied from the progenitor organism’s genotype matrices (and also
whether there is genetic recombination) to the new matrices. And
these newly produced matrices will leave the gate on a different
path from the matrices of the organism that entered the gate and
proceed to a phenotype expression gate (although they can also
encounter alteration gates in between) where they become the
genotype matrices for a new organism.
Furthermore, for representing reproduction involving two progenitors, each organism is acted on in their own reproduction gate
to create matrices that then proceed to the same phenotype expression gate (Fig. 4). In this case, even though the new matrices
came from different gates, they will by definition become one organism, since they will simultaneously enter the same gate. And
they will therefore all be used to express the phenotype of this
new organism.
So, in this formulation, when genotype matrices created from
multiple organisms enter the same gate (and thus combine into
one organism) we will use this action to represent the fusion of
gametes and when matrices from only one organism proceed to a
gate that expresses its phenotype we will use this action to represent asexual reproduction.
Let us first consider the creation of a gamete by a diploid organism whose genotype contains only interactive pairs; in other
words, a genotype consisting solely of Mendelian traits. Such a
genotype can consequently be represented by two genotype matrices. Now, according to Mendel’s Law of Segregation, an organism
will contribute one of the alleles from each locus to the gamete,
so an operation on the organism’s genotype matrices representing this should result in one matrix that contains one of the two
entries from the organism’s genotype matrices for each position
along the diagonal.
If, for the two genotype matrices, we construct two binary diagonal recombination matrices r1 and r2 and define the relation
between the two recombination matrices as r1 + r2 = I, we can use
the following function to accomplish this:

$$R(g^1, g^2) = g^1 r^1 + g^2 r^2 = g^*$$

The requirement that $r^1 + r^2 = I$, which we will denote recombination entry symmetry, means that everywhere $r^1$ has a 1, $r^2$ has a
0 and contrariwise (on the diagonal). It follows then that the sum
$g^1 r^1 + g^2 r^2$ will equal a matrix in which the entry at the jth locus is either $g_j^1$ or $g_j^2$ because, in the expression $g_j^1 r_j^1 + g_j^2 r_j^2$,
either $g_j^1 r_j^1 = g_j^1$ and $g_j^2 r_j^2 = 0$ or $g_j^1 r_j^1 = 0$ and $g_j^2 r_j^2 = g_j^2$. Therefore, this operation will copy the entry from one or the other
of the genotype matrices into a single matrix for each position
along the diagonal for this case. Thus, the reproduction action with
recombination entry symmetry acting on two genotype matrices
can mathematically express Mendel’s Law of Segregation acting on
Mendelian traits.
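A short sketch can make the copying behaviour concrete. It works on the diagonals only, derives $r^2$ from $r^1$ so that recombination entry symmetry holds by construction, and enumerates every possible $r^1$. The genotype entries and the name `segregate` are hypothetical, chosen only for illustration.

```python
from itertools import product

def segregate(g1, g2, r1):
    """Gamete diagonal g* = g1·r1 + g2·r2, with r2 = I - r1 (recombination entry symmetry)."""
    r2 = [1 - r for r in r1]
    return [a * ra + b * rb for a, b, ra, rb in zip(g1, g2, r1, r2)]

g1 = [85, 106, 10]   # hypothetical diagonal entries of the first genotype matrix
g2 = [85, 175, 30]   # hypothetical diagonal entries of the second genotype matrix

# Every binary r1 yields a gamete whose entry at each position comes from g1 or g2.
gametes = {tuple(segregate(g1, g2, r1)) for r1 in product((0, 1), repeat=len(g1))}
print(sorted(gametes))
```

Here a 1 in $r^1$ selects the entry from $g^1$ and a 0 selects the entry from $g^2$; positions where both matrices hold the same entry produce the same gamete entry either way.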
Let us now deal with the more general case of an organism
whose genotype must be represented by multiple pairs of genotype matrices. Now, since, for interactive pairs, each matrix pair
represents the alleles from a single genetic locus and we want a
copy of one allele from each genetic locus, the operation on each
pair of genotype matrices should still result in one matrix that
contains one of the two entries from the pair for each position
along the diagonal; however, since single alleles can be used to
represent alleles on sex chromosomes, we may need extra constraints placed on the entries in recombination matrices since sex
chromosomes often exhibit genetic linkage in one gender.
In general, then, we can use recombination matrices that exhibit recombination entry symmetry in pairs, rη + rη+ 1 = I (where
η is all odd values of the sequence 1, 2, … , h), and the general
reproduction function:

$$R_\eta(g^\eta, g^{\eta+1}) = g^\eta r^\eta + g^{\eta+1} r^{\eta+1} = g^{\eta *}$$

By the same logic as before, it follows that the sum
$g^\eta r^\eta + g^{\eta+1} r^{\eta+1}$ will equal a matrix in which the entry at the jth
locus is either $g_j^\eta$ or $g_j^{\eta+1}$. Therefore, for each pair of genotype
matrices, this operation will copy the entry from one of the matrices for each position along the diagonal and will result in a single genotype matrix for each pair of genotype matrices for this
case. (And clearly, the first case was a special case of this one with
η = 1.) However, we will need to put extra constraints on the recombination matrices in order for them to represent genetic linkage.
For example, suppose we had the following genotype matrices
(where entries denoted xi are entries from an X chromosome and
entries denoted yj are entries from a Y chromosome):

$$g^1 = g_1^1, g_2^1, g_3^1, \ldots, \mu_a, \ldots, \mu_b, y_1^1, \ldots, y_c^1$$

$$g^2 = g_1^2, g_2^2, g_3^2, \ldots, x_1^2, \ldots, x_b^2, \mu_{b+1}, \ldots, \mu_c$$

where $x_1$ begins at $g_a$ and ends at $g_b$ and $y_1$ begins at $g_{b+1}$ and
ends at $g_c$. If we constrain the entries at these positions in the recombination matrices to be equal, $r_a^i = r_{a+1}^i = r_{a+2}^i = \ldots = r_c^i$, it is
evident that the reproduction function will create a $g^*$ matrix containing either:

$$\ldots, \mu_a, \ldots, \mu_b, \ldots, y_1^1, \ldots, y_c^1 \quad \text{or} \quad \ldots, x_1^2, \ldots, x_b^2, \ldots, \mu_{b+1}, \ldots, \mu_c$$

And if an allele from one of the sex chromosomes interacts with
an interactive pair or a single allele (and is therefore in a different
matrix than the other alleles from its chromosome), we need to be
sure to constrain the appropriate entry in the appropriate recombination matrix. In this way, then, we can represent the transference
of all alleles that are transferred with no genetic recombination
from one or the other of the two sex chromosomes to a gamete
in organisms that contain different types of sex chromosomes.
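The linkage constraint can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the integer entries, the choice of 1 as the identity element μ, the block of linked positions, and the helper names. The free positions of $r^1$ are drawn at random, the linked block shares a single value, and $r^2 = I - r^1$.

```python
import random

def linked_recombination(length, block, rng):
    """Draw r1 with one shared value across the linked block; r2 = I - r1."""
    r1 = [rng.randint(0, 1) for _ in range(length)]
    shared = rng.randint(0, 1)
    for e in block:
        r1[e] = shared
    return r1, [1 - r for r in r1]

def segregate(g1, g2, r1, r2):
    """Gamete diagonal g* = g1·r1 + g2·r2."""
    return [a * ra + b * rb for a, b, ra, rb in zip(g1, g2, r1, r2)]

MU = 1  # assumed identity element at the sex-chromosome positions
g1 = [11, 12, MU, MU, 31, 32]   # two autosomal entries, identities, then Y entries
g2 = [13, 14, 21, 22, MU, MU]   # two autosomal entries, X entries, then identities
block = range(2, 6)             # positions constrained to move together

rng = random.Random(0)
tails = set()
for _ in range(200):
    r1, r2 = linked_recombination(len(g1), block, rng)
    tails.add(tuple(segregate(g1, g2, r1, r2)[2:]))

print(sorted(tails))
```

Whatever the random draws, the linked positions of each gamete come entirely from one matrix, so the X entries and the Y entries are never mixed.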
From this we can see that whether or not this action represents the occurrence of genetic recombination depends on the relation between each entry in the recombination matrices (and on
whether alleles from the same chromosome are represented by entries in the same genotype matrices). Because, for any $r_a^i$ and $r_b^i$,
$r_a^i + r_b^i$ can equal 0, 1, or 2 (since every $r_e^i$ is either 1 or 0), so if
alleles that came from the same chromosome are represented by
entries in the same genotype matrix, values of 0 and 2 represent
the occurrence of no recombination while a value of 1 represents
the occurrence of genetic recombination.
We can also see from this that the reproduction function under
recombination entry symmetry for each pair of matrices can represent the production of an organism that contains a copy of one or
the other of the alleles from each genetic locus of its progenitors.
Yet, let us consider what happens in the simple case of two
organisms with two genotype matrices each giving an exact copy
of one of their matrices to form another organism. If their alleles were not systematically assigned to specific positions in their
genotype matrices, then the requirement for interactive alleles
might be violated when the new organism enters a gate that expresses its phenotype.
Thus we need a species requirement: if two organisms in this
formulation represent true organisms of the same species, then
each position in their genotype matrices must contain an entry
representing an allele from the same gene. Consequently, since the
reproduction action copies entries from genotype matrices into the
same position in a new matrix, new organisms constructed from
organisms complying with the species requirement will not violate
the interactive allele requirement.
And, to comply with these requirements, when representing the
fusion of gametes, for each value of η, the $g^{\eta *}$ that were contributed from one of the paths should become $g^\eta$ and the $g^{\eta *}$ that
were contributed from the other path should become $g^{\eta+1}$ when
they enter the subsequent phenotype expression gate.
Another aspect of the species requirement should be that if
two organisms in this formulation represent true organisms of the
same species, then each position in their genotype matrices must
also contain an entry from Zn for the same n. (Because adding or
multiplying numbers from Zn and Zm is undefined if n does not
equal m.)
In this way, then, the reproduction action can represent the creation of a new organism through the fusion of gametes which contain a copy of one allele from each genetic locus in their progenitors’ genotypes. Additionally, this formulation allows us to study
the effects of overlapping generations since reproduction gates
only need to occur simultaneously for organisms that are mating
together. Thus, the organisms of each generation can enter reproduction gates at different times from each other.
For asexual reproduction, we need to use a slightly modified
reproduction function:

$$R(g^h) = g^h I = g^{h*}$$

Clearly this function just makes copies of each genotype matrix
and then, if the path the newly created g∗ matrices follow is the
only path leading to a subsequent expression gate, this can represent the creation of a new organism through asexual reproduction
since it will represent the creation of an organism that inherits its
alleles from a single progenitor.
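The sexual and asexual reproduction functions, together with the rule for combining the $g^*$ matrices from two paths, can be sketched as follows. The genotypes, recombination vectors, and function names are hypothetical; matrices are represented by their diagonals, and only $r^\eta$ is supplied for each pair because $r^{\eta+1} = I - r^\eta$.

```python
def gamete(g_odd, g_even, r_odd):
    """g^{η*} = g^η r^η + g^{η+1} r^{η+1}, with r^{η+1} = I - r^η."""
    r_even = [1 - r for r in r_odd]
    return [a * ra + b * rb for a, b, ra, rb in zip(g_odd, g_even, r_odd, r_even)]

def sexual_gametes(genotype, recombination):
    """One g* per pair (g^η, g^{η+1}); `recombination` lists r^η for each pair."""
    pairs = zip(genotype[0::2], genotype[1::2])
    return [gamete(g_odd, g_even, r) for (g_odd, g_even), r in zip(pairs, recombination)]

def fuse(gametes_a, gametes_b):
    """Path A supplies each g^η and path B supplies each g^{η+1} of the new organism."""
    child = []
    for ga, gb in zip(gametes_a, gametes_b):
        child.extend([ga, gb])
    return child

def asexual(genotype):
    """R(g^h) = g^h I: every genotype matrix is copied unchanged."""
    return [list(g) for g in genotype]

parent_a = [[2, 3], [4, 5], [6, 7], [8, 9]]            # hypothetical g^1 ... g^4
parent_b = [[12, 13], [14, 15], [16, 17], [18, 19]]

child = fuse(sexual_gametes(parent_a, [[1, 0], [0, 1]]),
             sexual_gametes(parent_b, [[0, 0], [1, 1]]))
clone = asexual(parent_a)
print(child)
print(clone)
```

The interleaving in `fuse` keeps entries at the same position and in the same pair as in the parents, which is what the species requirement and the interactive allele requirement ask for.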
To demonstrate the effect of recombination matrices, let us
have two identical organisms enter simultaneous reproduction
gates containing different recombination matrices, using the same
organism as before:
g1 = 85, 106, 10, 10, 10, 15, 6
g2 = 85, 175, 10, 10, 30, 21, 12
g3 = 105, 1, 60, 20, 20, 141, 0
g4 = 175, 1, 30, 10, 40, 1, 0

For one organism, we will use the recombination matrices:

r1 = 0, 1, 0, 0, 0, 1, 0
r2 = 1, 0, 1, 1, 1, 0, 1
r3 = 1, 0, 1, 0, 1, 0, 1
r4 = 0, 1, 0, 1, 0, 1, 0

So performing g1 r1 + g2 r2 = g1∗ and g3 r3 + g4 r4 = g3∗ results in:

g1∗ = 85, 106, 10, 10, 30, 15, 12
g3∗ = 105, 1, 60, 10, 20, 1, 0

And for the other organism, we will use:

r1 = 0, 1, 1, 0, 0, 0, 1
r2 = 1, 0, 0, 1, 1, 1, 0
r3 = 0, 0, 0, 1, 0, 1, 1
r4 = 1, 1, 1, 0, 1, 0, 0

So performing g1 r1 + g2 r2 = g2∗ and g3 r3 + g4 r4 = g4∗ results in:

g2∗ = 85, 106, 10, 10, 30, 21, 6
g4∗ = 175, 1, 30, 20, 40, 141, 0

Thus, the new organism will have the following phenotype matrix:

p = 105, 106, 110, 50, 120, 105, 18

Which contains several entries that differ, due to recombination, from the identical phenotype matrices of the parents:

p = 0, 70, 110, 50, 100, 105, 18

Consider now a space which contains a reproduction gate with the recombination matrices $r^1, \ldots, r^h$ followed by another reproduction gate at some time later that contains the recombination matrices $r'^1, \ldots, r'^h$ where each pair of recombination matrices is under recombination entry symmetry. For a given position e, if $r_e^i + r_e'^i = 1$, it follows that when $g_e^{i*}$ contains $g_e^i$, then $g_e'^{i*}$ contains $g_e^{i \pm 1}$ and contrariwise.

We can see that this is the case, because for each pair of recombination matrices under these two symmetries, $r^\eta = r'^{\eta+1}$ and $r^{\eta+1} = r'^\eta$ (the e subscript is suppressed for readability), since $r^\eta = 1 - r^{\eta+1}$ (due to recombination entry symmetry) and $r^\eta = 1 - r'^\eta$ (the assumption from above) so $1 - r^{\eta+1} = 1 - r'^\eta$ and hence $r^{\eta+1} = r'^\eta$. Making the opposite substitution shows that $r^\eta = r'^{\eta+1}$. (Hence if $r_e^\eta + r_e'^\eta = 1$, then $r_e^{\eta+1} + r_e'^{\eta+1} = 1$ and contrariwise.) We will denote $r_e^i + r_e'^i = 1$ as recombination interval symmetry for the position e.

So, in a space that has two reproduction gates containing a pair of recombination matrices under both symmetries for a certain position, any organism that proceeds through these reproduction gates will contribute each of the entries from the corresponding pair of genotype matrices to a g∗ matrix because the entry that is not contributed in the first breeding gate is contributed in the second and contrariwise. Therefore, crossover symmetry in the reproduction action conserves allele frequencies between organisms and the g∗ matrices they produce. When symmetry is broken, genetic drift is possible because the g∗ matrices each organism produces will contain a random sample of their alleles.

We will see an example involving both recombination interval symmetry and genetic drift in Section 6.1. But, because they have different causes, we will specify two different kinds of genetic drift: gametic genetic drift, where a break in recombination interval symmetry causes a change in allele frequency; and reductive genetic drift, where the parents of the next generation are a subset of the children of the previous generation and the new parents’ allele frequencies are not a uniform scaling of that of the children of the previous generation. The example in Section 6 will also demonstrate these two kinds of genetic drift.

4. Environmental alteration

In the alteration action, a new matrix is created that will replace the matrix it was produced from; yet the reason for this replacement is so that we can change or exchange particular entries in the organism’s matrices, so we need a method of copying entries similar to that of the last section to keep all but the entries being altered constant.

To represent the various different kinds of mutation will require two types of alteration matrices: the first we will refer to as alteration matrices and the second as translocation matrices. We will first investigate alteration matrices.

4.1. Changes in variation and deletion

First of all, we want an action that uses alteration matrices that do not have to depend on the entries in the matrix they are altering, because if the entries in alteration matrices depend on the entries in the matrix they are altering, then this action will have no predictive power and we would essentially just be manually replacing the matrices.

Therefore we will define a pair of alteration matrices, a and ã, that exist in alteration gates and act in what may seem like an overly complicated expression at first:

$$A(q) = qa + \tilde{a} = q'$$

(where q is either a phenotype matrix p, a genotype matrix $g^i$, or a $g^{i*}$ matrix).

It follows that the change in any particular entry is $q_a' - q_a = \tilde{a}_a - q_a(1 - a_a)$, so whenever $\tilde{a}_a \neq q_a(1 - a_a)$ there will be a change in the value of $q_a$. Evidently, in order to keep all but the entries being altered constant, a must contain all 1’s and ã must contain all 0’s except at the positions of entries being altered.

The main reason for choosing the above function is because Жn is not always a ring, so if we want the alteration function to only produce values that are elements of Жn (and have entries that do not depend on the entries in the matrix they are altering) then we need a more complicated expression than just q’ = q + a or q’ = qa. The Alteration Proposition (proved in the Appendix) shows that $q_a'$ will be an element of Жn as long as $a_a + \tilde{a}_a \in$ Жn and $\tilde{a}_a \in$ Жn, so the environment space can contain any a and ã that satisfy these conditions when we want the alteration function to only produce values that are elements of Жn. For the case of additive alleles, though, since Zn is closed under modular addition and multiplication, $q_a a_a + \tilde{a}_a$ will be an element of Zn for any $a_a$ and $\tilde{a}_a$.

As stated in the introduction, the matrix represented by q will enter the gate and act in the above function to produce q’ but will not leave the gate; instead q’ will replace q in the organism’s set of matrices. Accordingly, to represent factors that lead to a change of variation in alleles or traits, alteration matrices will be used in this way to change entries in a genotype or phenotype matrix while keeping the other entries constant; specifically, each entry in an alteration matrix that is not the identity will represent an environmental effect that causes a mutation of a particular allele or trait.

To explore the effect of this action, let us first suppose that, in a given population, the set of all entries at $q_a$ is a subset of Zn. If v is not in the set of entries at $q_a$ but u is, then this action on the entry u with $\tilde{a}_a = v - u a_a$ (equivalently, $\tilde{a}_a - u(1 - a_a) = v - u$) will change u to v. Clearly then the alteration action with certain values of $a_a$ and $\tilde{a}_a$ can produce $q_a'$ that were not in the original set of entries.
For example, suppose we have a population of 4 organisms with
the following genotypes:

         Organism 1    Organism 2    Organism 3    Organism 4
g1 =     15, 25, 21    25, 25, 15    15, 25, 15    25, 25, 21
g2 =     25, 21, 15    15, 25, 21    15, 25, 21    25, 25, 15

So the set of all entries for the first position in this population
is {15, 25}. If we choose the alteration matrices a = 5, 1, 1 and
ã = 10, 0, 0, this will simply flip each 15 to 25 and each 25 to
15, which might change the allele frequencies, but will leave the
set of all entries unchanged. However, if any of the organisms are
acted on by a = 15, 1, 1 and ã = 1, 0, 0, a new entry, 16, will
be introduced to the set of all entries for the first position in this
population making it {15, 25, 16}.
Another way this action can be used is to have aa = 0 and
ãa = μa (where μa is the identity element for that position) operate on ga i or ga ∗ . This can represent a mutation that results in the
deletion or malfunction of a gene since that entry will no longer
influence the phenotype entry.
So we can see that the alteration action on a genotype matrix
can represent the mutation of a gene into a different variation or
the deletion of a gene. And if the alteration action operates on a
germ-line genotype matrix or on a g∗ matrix itself, this can represent the introduction of a new genetic variant in a population of
true organisms (we will see an example of this in Section 6.1).
Furthermore, the alteration action on a phenotype matrix can
represent the alteration of a trait to one that is not the expression
of its genotype; and it can also represent the alteration of a trait
to one that is not the expression of the genotype of any organism
in the population for certain values of a and ã.
For example, if we recall the example involving additive alleles
that have been multiplied by a constant of measure, resulting in
entries in the phenotype matrix that vary between 100 and 160,
we can see that using aa = 1 and ãa equal to any number between
0 and 10 will result in an entry that is not the expression of the
genotype of any organism in the population. Clearly, then, the alteration action can cause this phenotype entry to assume any integer value from 100 to 160.
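The alteration examples of this section can be reproduced with a short sketch. The modulus 30 is an assumption inferred from the listed entries (it is the modulus under which the stated results hold), the function name `alter` is hypothetical, and the modulus used for the phenotype entry is only a placeholder larger than 160.

```python
N = 30  # assumed modulus for the genotype example

def alter(q, a, a_tilde, n=N):
    """A(q) = q·a + ã, entry by entry on the diagonal, reduced mod n."""
    return [(qe * ae + te) % n for qe, ae, te in zip(q, a, a_tilde)]

# The four g1 matrices of the example population.
g1_matrices = [[15, 25, 21], [25, 25, 15], [15, 25, 15], [25, 25, 21]]

flipped = [alter(g, [5, 1, 1], [10, 0, 0]) for g in g1_matrices]     # 15 <-> 25
introduced = [alter(g, [15, 1, 1], [1, 0, 0]) for g in g1_matrices]  # new entry 16
deleted = alter(g1_matrices[0], [0, 1, 1], [1, 0, 0])                # a = 0, ã = identity 1

# Phenotype entry scaled by a constant of measure: a = 1 keeps it, ã shifts it.
nudged = alter([120], [1], [7], n=1000)  # placeholder modulus above 160

print([g[0] for g in flipped])
print([g[0] for g in introduced])
print(deleted, nudged)
```

In each call the vectors a and ã hold 1 and 0 respectively at every position that should stay unchanged, so only the first entry is affected.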
846
847
848
849
850
851
852
853
854
855
856
857
0
0
0
0
1
0
0
0
1
a
0
0
0
b
0
0
0
0 + 1
c
0
1
0
0
0
0
1
1
0
0
0
0
0
0
0
1
d
0
0
0
e
0
0
0
0 + 1
f
0
1
0
0
0
0
1
0
0
0
0
1
0
0
0
0
1
0
0
0
0
0
0
0
0
In this case, a genotype matrix will again be acted on and then
replaced by the matrix produced from this action, but we will instead use translocation matrices, t, which can be any permutation
matrix (a square binary matrix where each row and column contain all 0 s except for one entry) that is symmetric.
For example:
g2 = 25, 21, 15 15, 25, 21 15, 25, 21 25, 25, 15
0
0
1
1
0
0
0
0
0
1
or
0
1
0
1
0
0
⎤
0
0
0
1
⎡
0
⎢0
or ⎣
0
1
0
1
0
0
0
0
1
0
⎤
860
861
862
863
0
0⎥
1⎥
⎦
0
864
A(g ) = tgt = g .
Since t is symmetric, tij = tji , so for any tai and tib in which a = b,
tai gii tib = 0 because one of tai and tib equals 0; likewise for any tai
and tja in which i = j, tai gij tja = 0 because one of tai and tja equals
0. (And gij also equals zero whenever i = j.) This means that a
b tia gab tbj = 0 whenever a = b and i = j, so the only non-zero entries are: a a tia gaa tai . It follows then that, for each tia = tai = 1,
the entry at gaa will be transferred to g’ii (and if i = a there is no
movement).
This action can therefore translocate entries in a single matrix.
For example, using the first two matrices from above and
g = a,b,c we can see that g’ will equal b,a,c and c,b,a; and using the second two matrices from above and g = a,b,c,d we can
see that g’ will equal d,b,c,a and b,a,d,c.
Now, to translocate entries from different genotype matrices,
the action is more complicated. We need to use the following expressions to exchange entries between gi and gj :
i
A g
j
i
= g + t g t = g andA g
1 i
2 j
= g + t g t = g
2 j
1 i
where i and ϶i are binary diagonal matrices and
1 + ϶1 = 2 + ϶2 = 1 + t϶2 t = 2 + t϶1 t = I.
It is perhaps more instructive to begin with an example in this
case. In the following, a and e exchange places in the matrices
gi = a,b,c and gj = d,e,f:
a
0
0
0
b
0
0
0
c
0
1
0
1
0
0
0
0
1
0
1
0
1
0
0
0
0
1
=
=
e
0
0
0
b
0
0
0
c
d
0
0
0
a
0
0
0
f
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
j
881
882
883
884
885
It follows that the product $\epsilon_1 g^i$ is a matrix whose diagonal contains either $g^i_e$ or 0 and that the product $\backepsilon_2 g^j$ is
a matrix whose diagonal contains either $g^j_e$ or 0, since $\epsilon_1$, $\backepsilon_2$ are
just identity matrices with a few 1s replaced with 0s; and we
know from above that $t\backepsilon_2 g^j t$ will simply move entries around on
the diagonal, so the products $\epsilon_1 g^i$ and $t\backepsilon_2 g^j t$ both produce diagonal
matrices that contain either $g^i_e$, $g^j_e$, or 0 along their diagonal. Furthermore, since $\epsilon_1 + t\backepsilon_2 t = I$, this means that whenever $[\epsilon_1 g^i]_e = g^i_e$,
$[t\backepsilon_2 g^j t]_e$ will equal 0 and whenever $[\epsilon_1 g^i]_e = 0$, $[t\backepsilon_2 g^j t]_e$ will equal
$g^j_e$, so for each position where $\epsilon_1$ contains a 1, the diagonal will
contain the entry from $g^i$ and for each position where $\epsilon_1$ contains
a 0 (on the diagonal) it will contain the entry from $g^j$. The same
reasoning applies to $\epsilon_2 g^j + t\backepsilon_1 g^i t = g^{j\prime}$.
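The exchange between two genotype matrices can be checked with a short sketch. It is an illustration under stated assumptions, not the paper's implementation: diagonal matrices are stored as plain lists, the symmetric permutation $t$ is stored as a list `perm` that is its own inverse, and the names `eps1`, `beps1`, `eps2`, `beps2` are hypothetical labels for the four binary diagonal matrices. The integers stand in for the entries a to f.

```python
def conjugate(diagonal, perm):
    """Diagonal of t M t for a diagonal M and a self-inverse permutation t."""
    return [diagonal[perm[k]] for k in range(len(perm))]


def exchange(g_i, g_j, eps1, beps1, eps2, beps2, perm):
    """Swap selected entries between two diagonal genotype matrices."""
    n = len(g_i)
    ones = [1] * n
    # The four constraints that make every position receive exactly one entry.
    assert [x + y for x, y in zip(eps1, beps1)] == ones
    assert [x + y for x, y in zip(eps2, beps2)] == ones
    assert [x + y for x, y in zip(eps1, conjugate(beps2, perm))] == ones
    assert [x + y for x, y in zip(eps2, conjugate(beps1, perm))] == ones
    # Entries selected from the other matrix, moved to their new positions.
    from_j = conjugate([m * x for m, x in zip(beps2, g_j)], perm)
    from_i = conjugate([m * x for m, x in zip(beps1, g_i)], perm)
    new_i = [m * x + y for m, x, y in zip(eps1, g_i, from_j)]
    new_j = [m * x + y for m, x, y in zip(eps2, g_j, from_i)]
    return new_i, new_j


a, b, c, d, e, f = 2, 3, 5, 7, 11, 13   # stand-ins for the symbolic entries
new_i, new_j = exchange(
    [a, b, c], [d, e, f],
    eps1=[0, 1, 1], beps1=[1, 0, 0],
    eps2=[1, 0, 1], beps2=[0, 1, 0],
    perm=[1, 0, 2],                      # t swaps the first two positions
)
print(new_i, new_j)                      # entries e, b, c and d, a, f
```

The assertions inside `exchange` mirror the identity constraints on the binary matrices, so an inconsistent choice of masks fails loudly instead of silently dropping or duplicating an entry.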
These functions can therefore translocate entries in or between
an organism’s genotype matrices and, when the genotype matrices
being acted on are replaced by the newly constructed matrices, we
can use this action to represent phenomena in which pairs of alleles in a true organism’s genotype are affected in such a way that
they interact with new alleles and/or phenomena in which pairs
of genes are affected in such a way that they are no longer
located at the same locus as they are in the rest of the organisms
of that species.
And the structure alteration function will be:
4.2. Translocation and duplication
Besides changing particular alleles or traits to other variants,
mutations can also change the structure of chromosomes by duplicating sections, inverting sections, translocating sections, etc. However, since the position of each entry in a genotype matrix is
determined by the interactive allele requirement and the species
requirement and does not necessarily correspond to the alleles’ position in a chromosome, our concern is with how these types of
mutation alter the combinations of interactive alleles in a certain
organism and/or alter the position of alleles in such a way that the
combinations of interactive alleles in an organism’s offspring will
be altered.
If we remember back to Section 1.2, it was suggested that “extra” pairs of genotype matrices containing only identity elements
be included in the organism’s set of matrices and that “extra” identity elements be included in each genotype matrix so that we can
add entries without having to change the size of any matrices.
We can use these “empty” positions to duplicate entries and insert them in other matrices by using the above function to form
only one new matrix. In other words, only the $g^i$ matrix will be
replaced and entries from $g^j$ will be copied into $g^{i\prime}$ in the expression:

$$A(g^i) = \epsilon\, g^i + t\,\backepsilon\, g^j\, t = g^{i\prime}$$
For example, if $g^7$ is one of the “extra” genotype matrices and
$t$ is equal to the identity matrix, then we can copy entries from
$g^4$ into $g^7$ with the above expression and they will interact with
the entries from $g^4$ they were copied from (because there is no
translocation since $t = I$). Or, if $t \neq I$, the entries from $g^4$ can be
copied and moved to a different position where they will interact
with entries other than the ones they were copied from (or only
identity elements).
Thus, this action can be used to represent mutations in which
genes are duplicated.
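Duplication into an "extra" matrix can be sketched in the same style. This is a hedged illustration: it assumes the duplication expression has the same shape as the exchange expression, with one binary mask keeping entries of the receiving matrix and another selecting the entries to copy, and that the kept and copied positions together cover every position exactly once. The function name `duplicate`, the mask names, and the sample entries are hypothetical.

```python
def duplicate(g_i, g_j, keep, copy, perm):
    """Copy masked entries of g_j into g_i; g_j itself is left unchanged.

    keep : binary mask of positions of g_i that are retained
    copy : binary mask of positions of g_j that are copied
    perm : self-inverse permutation standing in for the matrix t
    """
    n = len(g_i)
    # Assumed constraint: every position is either kept or overwritten.
    assert all(keep[k] + copy[perm[k]] == 1 for k in range(n))
    moved = [copy[perm[k]] * g_j[perm[k]] for k in range(n)]
    return [keep[k] * g_i[k] + moved[k] for k in range(n)]


g4 = [7, 11, 13]     # an occupied genotype matrix (illustrative entries)
g7 = [1, 1, 1]       # an "extra" matrix holding only identity elements

# t = I: the copy lands in the same position it came from.
same_position = duplicate(g7, g4, keep=[1, 0, 1], copy=[0, 1, 0], perm=[0, 1, 2])

# t != I: the copy is moved to a different position.
moved_position = duplicate(g7, g4, keep=[0, 1, 1], copy=[0, 1, 0], perm=[1, 0, 2])

print(same_position, moved_position)
```

Only the receiving matrix is rebuilt, which is what distinguishes duplication from the two-sided exchange.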
5. Natural selection
To represent natural selection, we need a gate that will dictate
whether an organism proceeds to a reproduction gate based on the
value of a certain entry in its phenotype matrix. So each natural
selection gate will be followed by a reproduction gate (or another
natural selection gate if more than one trait is under selective pressure) and will contain the following natural selection function:
$$S(p) = \sum_k p_k s_k$$
where the selecting matrix $s$ is a binary matrix containing a single non-zero entry (located on the main diagonal). We will denote
the position of that single non-zero entry as $s_s$, so in other words,
although we are calculating the trace of the product $ps$, since $p_s s_s$
will be the only non-zero entry in the product, the output of this
function is simply $\sum_k p_k s_k = p_s$.
Each gate also needs to contain a set of selection constants, $\{\sigma_1, \ldots, \sigma_t\}$, and if $p_s = \sigma_i$ for any $i$, then that organism is allowed to
pass through the gate. Thus the value of $p_s$ and the values of the
selection constants determine whether or not the organism proceeds to a reproduction gate.
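A natural selection gate of this kind can be sketched as follows. This is a minimal sketch that stores the diagonal phenotype and selecting matrices as lists; the function names `selection_value` and `passes_gate` are hypothetical, and the two sample phenotypes are the ones listed for Female 2 and Female 1 in Section 6.1.

```python
def selection_value(p, s):
    """Trace of the product p s for diagonal p and a binary selecting matrix s."""
    # s must contain exactly one non-zero entry on its diagonal.
    assert sorted(s) == [0] * (len(s) - 1) + [1]
    return sum(pk * sk for pk, sk in zip(p, s))


def passes_gate(p, s, sigmas):
    """An organism passes when the selected entry equals a selection constant."""
    return selection_value(p, s) in sigmas


s = [1, 0, 0, 0, 0]        # selects the first phenotype entry
sigmas = {105}             # the only value allowed through this gate

female_2 = [105, 190, 30, 105, 1]
female_1 = [85, 0, 60, 105, 1]
print(passes_gate(female_2, s, sigmas))   # first entry is 105, so it passes
print(passes_gate(female_1, s, sigmas))   # first entry is 85, so it is blocked
```

Chaining several gates with different selecting matrices would model selection on more than one trait.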
This action can therefore represent circumstances in which an
organism in a certain environment with a certain trait reproduces
successfully whereas one with a different variant of that trait is
unsuccessful in that environment. And, since the natural selection
and reproduction actions operate on the genotype as a whole, this
formulation automatically models linked selection when there is
linkage in the entries of recombination matrices (we will see an
example of linked selection in Section 6.1).
So, in a given population, if the fitness of a trait is not 1, then
there need to be natural selection gates in the environment space
of that population; and if the fitness of a trait is not 0, then there
need to be organisms with the selected-against trait that do not
encounter natural selection gates. In other words, the frequency of
natural selection gates in the environment space of a population
correlates with the fitness of a given trait, but is not necessarily an
exact ratio, since organisms with and without the
selected-against trait might not encounter natural selection gates
equally often.
6. Applications
This formulation can be used to mathematically describe and
simulate biological phenomena and model changes in a population’s genetic composition over time (the population genetics of a
small population will be simulated in the next section). Mendelian
inheritance has a simple formulation: $g^1 g^2 = p$ formulates the Law
of Dominance, where each entry in $g^1$, $g^2$ is an element of $\mathbb{Z}_2$;
the breeding function $g^1 r^1 + g^2 r^2 = g^*$, under crossover symmetry
$r^1 + r^2 = I$, formulates the Principle of Segregation; and the Principle of Independent Assortment can be formulated with a requirement that half the spaces in a system have $r_a^1 + r_b^2 = 1$
for every $a$ and $b$. Additionally, we have seen that modifications
to Mendel’s principles, like dominance hierarchies, co-dominance,
epistasis, alleles with additive effects, etc. can also be represented by a generalization of the formulas that describe Mendelian
inheritance.
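The Law of Dominance as a product in $\mathbb{Z}_2$ can be illustrated with a monohybrid cross. The sketch below assumes that each parent passes one of its two entries with equal probability and that the phenotype is the product of the two inherited entries modulo 2; the function name `monohybrid_cross` is hypothetical.

```python
from collections import Counter
from itertools import product


def monohybrid_cross(parent1, parent2, n=2):
    """Count offspring phenotypes when each parent contributes one allele.

    Each parent is a pair of entries; the phenotype is their product mod n.
    """
    return Counter((x * y) % n for x, y in product(parent1, parent2))


# Two heterozygous parents, each carrying the entries 0 and 1.
counts = monohybrid_cross((0, 1), (0, 1))
print(counts)   # 0 appears three times and 1 once, a 3:1 ratio
```

Here 0 behaves as the dominant entry because any product involving 0 is 0, which is why three of the four equally likely combinations show the same phenotype.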
A major advantage becomes apparent when this formulation is
used to model organisms under selective pressure because it can
model the effect of natural selection on the whole genotype. Natural selection not only affects the allele frequency of the genes that
give rise to the selected trait (and the genes linked to those genes),
but can also cause reductive genetic drift in the allele frequency of
all other alleles in the organism’s genotype due to the sampling
error that might arise from the reduction in the breeding population. This is because, for all genes besides the ones that give rise to the
characteristic involved in the selection, the organisms that are prevented from breeding contain a random sample of the alleles from
the original population. And since the actions of this formulation
can operate individually on each organism’s genotype as a whole,
this formulation automatically simulates this phenomenon (we will
also see an example of this type of genetic drift in the next
section).
This advantage also extends to studying the effects of having more than one characteristic be under selective pressure, because selective pressure on each characteristic can cause reductive genetic drift in the frequencies of the alleles that give rise to
the other characteristics involved in selection. Yet another advantage is that the frequency of natural selection gates can be varied in each generation, so that the simulated population will be
under selective pressure in which the fitness values change over
time.
6.1. Population genetics
We will now work a full example involving each type of gate
(as shown in Fig. 5; time moves downward in the direction of
the arrows). This example will demonstrate a simulation involving
gametic genetic drift, linked selection, and reductive genetic drift
arising from natural selection. (Because the population is so small,
this will be an extreme example of these phenomena.)
The following germ-line genotype matrices with entries from
$\mathbb{Z}_{210}$ will be used, where each “male” will contain one entry that
represents an entry from an X chromosome and one entry that
represents an entry from a Y chromosome, while “females” will
contain two entries that represent entries from an X chromosome
(all 1s are identity elements that do not represent an allele):
Fig. 5. Environment space for this population.
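Male 1: $g^1 = \langle 175, 106, 20, 105, 1 \rangle$, $g^2 = \langle 175, 175, 20, 1, 105 \rangle$
Female 1: $g^1 = \langle 85, 36, 30, 105, 1 \rangle$, $g^2 = \langle 85, 175, 30, 105, 1 \rangle$
Male 2: $g^1 = \langle 85, 106, 30, 175, 1 \rangle$, $g^2 = \langle 175, 36, 20, 1, 105 \rangle$
Female 2: $g^1 = \langle 105, 106, 10, 175, 1 \rangle$, $g^2 = \langle 105, 85, 20, 105, 1 \rangle$
Male 3: $g^1 = \langle 85, 85, 10, 105, 1 \rangle$, $g^2 = \langle 105, 175, 10, 1, 105 \rangle$
Female 3: $g^1 = \langle 175, 85, 20, 105, 1 \rangle$, $g^2 = \langle 175, 85, 10, 175, 1 \rangle$
So the set of entries for each position in the genotype matrices
of this population is:
$g = \langle \{105, 175, 85\}, \{36, 106, 175, 85\}, \{10, 20, 30\}, \{105, 175\}, \{105\} \rangle$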
The first gate the organisms enter is a phenotype expression
gate containing the general phenotype expression function and the
matrices $\mu = \langle 1, 1, 0, 1, 1 \rangle$ and $\alpha = \langle 0, 0, 1, 0, 0 \rangle$. Thus phenotype
expression operates in this way: $\langle g_1^1 g_1^2,\ g_2^1 g_2^2,\ g_3^1 + g_3^2,\ g_4^1 g_4^2,\ g_5^1 g_5^2 \rangle$
and results in the following phenotype matrices:
Male 1: $p = \langle 175, 70, 40, 105, 105 \rangle$
Female 1: $p = \langle 85, 0, 60, 105, 1 \rangle$
Male 2: $p = \langle 175, 36, 50, 105, 105 \rangle$
Female 2: $p = \langle 105, 190, 30, 105, 1 \rangle$
Male 3: $p = \langle 105, 175, 20, 175, 105 \rangle$
Female 3: $p = \langle 105, 85, 30, 105, 1 \rangle$
So the set of entries for each position in the phenotype matrices
of this population is:
$p = \langle \{105, 175, 85\}, \{36, 175, 85, 70, 0, 190\}, \{10, 20, 30, 40, 50, 60\}, \{105, 175\}, \{105\} \rangle$
According to Fig. 5, Male 3 next enters an alteration gate, containing the alteration function and the alteration matrices $a = \langle 1, 1, 1, 1, 105 \rangle$ and $\tilde{a} = \langle 0, 0, 0, 0, 70 \rangle$ that operate on its germ-cell $g^2$.
This will result in Male 3’s $g^2$ being changed to $\langle 105, 175, 10, 1, 175 \rangle$ since $105 \times 105 + 70 \equiv 175 \pmod{210}$.
After that, all organisms except Male 1 and Female 2 enter a
natural selection gate containing $s = \langle 1, 0, 0, 0, 0 \rangle$ and $\sigma = 105$.
Thus, only organisms with $p_1 = 105$ will leave the gate and proceed to a reproduction gate. So we can see that Male 2 and Female
1 will not leave the natural selection gate, since for them $p_1 = 175$
and 85 respectively. (Note that these happened to be the only organisms that contained $g_2^i = 36$.) However, even though Male 1 has
$p_1 = 175$, it will still proceed to a reproduction gate since there was
no natural selection gate in its path.
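The expression and alteration steps of this example can be reproduced with a short sketch. It rests on an assumption about the form of the general phenotype expression function, namely that each entry is $\mu_e\, g_e^1 g_e^2 + \alpha_e (g_e^1 + g_e^2)$ modulo 210, which matches the entrywise description given for this example; the function names `express` and `alter` are hypothetical.

```python
N = 210
MU = [1, 1, 0, 1, 1]       # positions expressed as a product
ALPHA = [0, 0, 1, 0, 0]    # positions expressed as a sum


def express(g1, g2, mu=MU, alpha=ALPHA, n=N):
    """Entrywise phenotype: product where mu is 1, sum where alpha is 1."""
    return [(m * (x * y) + a * (x + y)) % n
            for x, y, m, a in zip(g1, g2, mu, alpha)]


def alter(g, a, a_tilde, n=N):
    """Alteration gate: multiply each entry by a, then add a_tilde, mod n."""
    return [(x * ai + ti) % n for x, ai, ti in zip(g, a, a_tilde)]


male_1 = express([175, 106, 20, 105, 1], [175, 175, 20, 1, 105])
female_1 = express([85, 36, 30, 105, 1], [85, 175, 30, 105, 1])
print(male_1)     # phenotype of Male 1
print(female_1)   # phenotype of Female 1

# Mutation of Male 3's second germ-line matrix in the alteration gate.
male_3_g2 = alter([105, 175, 10, 1, 105], [1, 1, 1, 1, 105], [0, 0, 0, 0, 70])
print(male_3_g2)
```

Keeping the modulus and the two indicator matrices as parameters makes it straightforward to try other expression schemes on the same genotypes.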
And finally, the remaining organisms will proceed to reproduction gates containing the following recombination matrices, where
we will have $r_1^i = r_3^i$ for all recombination matrices in order to
demonstrate linked selection (since $p_1$ was the trait involved in
the selection action); (note also that for the males it must be that
$r_4^i = r_5^i$ for the reasons stated in Section 2):
Male 1: $r^1 = \langle 0, 1, 0, 1, 1 \rangle$, $r^2 = \langle 1, 0, 1, 0, 0 \rangle$
Female 2: $r^1 = \langle 0, 1, 0, 0, 0 \rangle$, $r^2 = \langle 1, 0, 1, 1, 1 \rangle$
Male 3: $r^1 = \langle 1, 1, 1, 0, 0 \rangle$, $r^2 = \langle 0, 0, 0, 1, 1 \rangle$
Female 3: $r^1 = \langle 0, 0, 0, 1, 0 \rangle$, $r^2 = \langle 1, 1, 1, 0, 1 \rangle$
If we index each $g^*$ matrix created by a female as $g^1$ and each
$g^*$ matrix created by a male as $g^2$ when these enter the subsequent
expression gate, then we have the following new organisms:
New Female 1: $g^1 = \langle 105, 106, 20, 105, 1 \rangle$, $g^2 = \langle 175, 106, 20, 105, 1 \rangle$
New Male 1: $g^1 = \langle 175, 85, 10, 105, 1 \rangle$, $g^2 = \langle 85, 85, 10, 1, 175 \rangle$
Then the original organisms will enter another reproduction
gate in which (to save space) we will just use the same recombination matrices but switch them between the organisms of each
gender (the recombination matrices previously used with Male 1
are now used with Male 3, etc.). And if we index them in the same
manner, we have two more new organisms:
New Male 2: $g^1 = \langle 105, 85, 20, 175, 1 \rangle$, $g^2 = \langle 175, 175, 20, 1, 105 \rangle$
New Female 2: $g^1 = \langle 175, 85, 10, 175, 1 \rangle$, $g^2 = \langle 105, 85, 10, 105, 1 \rangle$
And the set of entries for each position in the genotype matrices of this population is:
$g = \langle \{105, 175, 85\}, \{106, 175, 85\}, \{10, 20\}, \{105, 175\}, \{105, 175\} \rangle$
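The breeding function used in the reproduction gates can be sketched as below. This is an illustration that stores diagonal matrices as lists and derives the second recombination matrix from crossover symmetry; the function name `gamete` is hypothetical. The inputs are the first-round parents Female 2, Male 1, and Male 3 (the latter with its mutated second matrix).

```python
def gamete(g1, g2, r1):
    """Breeding function g1 r1 + g2 r2 with r2 chosen so that r1 + r2 = I."""
    r2 = [1 - x for x in r1]
    return [a * x + b * y for a, b, x, y in zip(g1, g2, r1, r2)]


# Female 2 and Male 1 produce New Female 1.
from_female_2 = gamete([105, 106, 10, 175, 1], [105, 85, 20, 105, 1],
                       [0, 1, 0, 0, 0])
from_male_1 = gamete([175, 106, 20, 105, 1], [175, 175, 20, 1, 105],
                     [0, 1, 0, 1, 1])

# Male 3 (after the alteration gate) contributes to New Male 1.
from_male_3 = gamete([85, 85, 10, 105, 1], [105, 175, 10, 1, 175],
                     [1, 1, 1, 0, 0])

print(from_female_2, from_male_1, from_male_3)
```

Because each recombination matrix is binary, every position of the gamete comes from exactly one of the two parental matrices, and positions that share the same value in `r1` are inherited together.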
We can see that the allele frequencies of $g_1$ = 105:175:85
changed from 0.25:0.42:0.33 in the original population to
0.375:0.5:0.125 due to the natural selection action. So the frequency of $g_1 = 105$ increased as we would expect.
However, there were also several other changes due to the presence of the natural selection gates. This population no longer contains $g_2 = 36$ and $g_3 = 30$ as possible entries even though natural
selection acted on $p_1$. The disappearance of $g_2 = 36$ is an example of reductive genetic drift due to natural selection since the
two organisms that were selected against happened to be the only
ones containing $g_2 = 36$. And the disappearance of $g_3 = 30$ is due to
linked selection, because $r_1^i = r_3^i$ and all of the matrices containing $g_3 = 30$ were matrices containing $g_1 = 85$, which increased that
organism’s chance of having a value of $p_1 \neq 105$.
Note also that there was recombination interval symmetry in
$r_4 + r'_4$ and, as expected, there was no gametic genetic drift in $g_4$;
there also happened to be no reductive genetic drift since there
was no sampling error in the parental population for the allele frequency in $g_4$. So the original 2/3 frequency of $g_4 = 105$ is found in
the original population, in the parents of the new generation, and
also in the new population.
But because there was a sampling error in the parental population and because there was no recombination interval symmetry in $r_2 + r'_2$, not only were no entries of 36 passed to the
new population, but the frequency of the other entries of $g_2$
changed twice. In the original population, the frequency ratio
of 36:106:175:85 was 0.18:0.27:0.27:0.27, while in the parents it
changed to 0:0.25:0.25:0.5 due to reductive genetic drift from natural selection and it was then changed to 0:0.25:0.125:0.625 due
to gametic genetic drift.
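Allele frequencies at a given position can be tallied directly from the genotype matrices. The sketch below counts the first position in the original and new populations using exact fractions; the function name `allele_frequencies` and the list layout are illustrative choices.

```python
from collections import Counter
from fractions import Fraction


def allele_frequencies(population, position):
    """Frequency of each entry at one position across all genotype matrices."""
    counts = Counter(g[position] for organism in population for g in organism)
    total = sum(counts.values())
    return {allele: Fraction(c, total) for allele, c in counts.items()}


original = [
    ([175, 106, 20, 105, 1], [175, 175, 20, 1, 105]),   # Male 1
    ([85, 36, 30, 105, 1], [85, 175, 30, 105, 1]),      # Female 1
    ([85, 106, 30, 175, 1], [175, 36, 20, 1, 105]),     # Male 2
    ([105, 106, 10, 175, 1], [105, 85, 20, 105, 1]),    # Female 2
    ([85, 85, 10, 105, 1], [105, 175, 10, 1, 105]),     # Male 3
    ([175, 85, 20, 105, 1], [175, 85, 10, 175, 1]),     # Female 3
]
new = [
    ([105, 106, 20, 105, 1], [175, 106, 20, 105, 1]),   # New Female 1
    ([175, 85, 10, 105, 1], [85, 85, 10, 1, 175]),      # New Male 1
    ([105, 85, 20, 175, 1], [175, 175, 20, 1, 105]),    # New Male 2
    ([175, 85, 10, 175, 1], [105, 85, 10, 105, 1]),     # New Female 2
]

before = allele_frequencies(original, 0)
after = allele_frequencies(new, 0)
print(before)   # 105, 175, 85 at 1/4, 5/12, 1/3
print(after)    # 105, 175, 85 at 3/8, 1/2, 1/8
```

Positions that contain identity elements, such as the sex-linked entries, would need those 1s filtered out before counting.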
And we can see that this environment space ultimately produces new males and females that all have phenotypes that are
different from the organisms in the original population and New
Male 1 even has the new trait $p_5 = 175$, which was introduced by
mutation.
New Female 1: $p = \langle 105, 106, 40, 105, 1 \rangle$
New Male 1: $p = \langle 175, 85, 20, 105, 175 \rangle$
New Male 2: $p = \langle 105, 175, 40, 175, 105 \rangle$
New Female 2: $p = \langle 105, 85, 20, 105, 1 \rangle$
Clearly a computer program is required for larger populations
with larger sequences of gates, but even this short example shows
that this formulation provides a powerful method of studying natural selection since it acts on the genotype and phenotype as a
whole.
6.2. Epistatic ratios
The phenotypic ratios that are usually listed as arising due to
epistatic relationships between multiple loci cannot all be produced by the operation $\prod_h g^h$.
They can, however, be produced by uglier equations like
$(g_e^1 g_e^2 + \varepsilon)\, g_e^3 g_e^4$ (where $\varepsilon$ equals either 1 or 0). Part of the reason
this equation is unappealing is that we must require that $g_e^1$ and
$g_e^2$ be either 0 or 1 (and it further complicates the phenotype expression function). For example, if we have a population in which
organisms contain the entries $\{0, 1\}$ at $g_e^1$, $g_e^2$ and the entries $\{b, c\}$ at $g_e^3$, $g_e^4$, where $b$ dominates $c$, then using $\varepsilon = 0$ will of course result in
the ratio for dominant epistasis that we found earlier (because the
equation is reduced to $g_e^1 g_e^2 g_e^3 g_e^4$). And, if $\varepsilon = 1$, then this equation will produce the entries $p_e = b$, $p_e = c$, and $p_e = 0$ in the ratio
9:3:4.
It is possible, though, that corrections need to be made to the
currently accepted phenotypic ratios, because the operation $\prod_h g^h$
can produce two of the ratios associated with epistasis (as we have
already seen) and one that is similar to the currently accepted
phenotypic ratio of recessive epistasis: if we have a population in
which organisms contain the entries $\{a, b\}$ at $g_e^1$, $g_e^2$ and the entries $\{c, d\}$ at $g_e^3$, $g_e^4$, where $a$ dominates $b$ and $c$ dominates $d$ and $a$, $b$ are extraneously prime to $c$, $d$, then the phenotype expression function will
produce the entries $p_e = ac$, $p_e = bc$, $p_e = ad$, and $p_e = bd$ in the ratio
9:3:3:1.
Now, there are two genes in chickens that each have two variations and that interact to express four different comb shapes, denoted
walnut, rose, pea, and single, and there are two genes in the pepper Capsicum annuum that each have two variations and that interact
to express the colors red, brown, yellow, and green; these phenotype variations all arise in the same manner as the phenotype
entries from above [2].
In contrast, recessive epistasis is described as producing a phenotype ratio of 9:3:4. Coat color in Labrador Retrievers is often
cited as an example of recessive epistasis because there is a B locus and an E locus that each have two variations B/b and E/e with
a dominant/recessive relationship and the alleles from these two
loci interact to express three different coat colors: the combination B_E_ produces black, bbE_ produces brown, and __ee produces
yellow [4]. Yet, the B alleles also affect skin color, making it either
black (B_) or fleshy brown (bb) [3]. So coat and skin color are not
independent traits: there cannot be Labradors with black fur and
brown skin nor Labradors with brown fur and black skin. When
we take skin color into consideration, it is readily seen that there
are four distinct traits that arise in the same manner as the 9:3:3:1
ratio from above: black fur with black skin (B_E_), brown fur with
brown skin (bbE_), yellow fur with black skin (B_ee), and yellow fur
with brown skin (bbee). This raises the question of whether phenotype ratios of 9:3:4 truly exist in nature or if some aspect of
the trait produced by these alleles is not being taken into account
when a ratio of 9:3:4 is found.
Likewise, the operation $\prod_h g^h$ cannot produce the currently accepted phenotypic ratios for dominant and recessive epistasis, duplicate recessive epistasis, nor duplicate interaction. However, the
operation $\prod_h g^h$ predicts several forms of epistasis besides the three
already listed.
For example, if we use the entries $\{a, b\}$ at $g_e^1$, $g_e^2$ and the entries $\{c, d\}$ at $g_e^3$, $g_e^4$, where $a$ dominates $c$, $c$ dominates $d$, and $b$ is extraneously prime
to $a$, $c$, $d$, then the phenotype expression function will produce the
entries $p_e = ab$, $p_e = a$, $p_e = bc$, and $p_e = bd$ in the ratio 8:4:3:1.
Further research into epistasis will determine whether changes
need to be made to the currently accepted phenotypic ratios or to
the phenotype expression function.
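The 9:3:3:1 and 8:4:3:1 ratios can be reproduced by enumerating a cross between two double heterozygotes. The sketch below is illustrative only: the specific residues modulo 210 are hypothetical choices (not taken from the paper) that satisfy the stated dominance relations, the phenotype entry is assumed to be the product of the four inherited entries, and the function name `dihybrid_ratio` is hypothetical.

```python
from collections import Counter
from itertools import product


def dihybrid_ratio(locus1, locus2, n):
    """Count phenotype entries g1*g2*g3*g4 mod n over all 16 allele combinations."""
    counts = Counter()
    for g1, g2 in product(locus1, repeat=2):
        for g3, g4 in product(locus2, repeat=2):
            counts[(g1 * g2 * g3 * g4) % n] += 1
    return counts


# a = 175 dominates b = 85, and c = 36 dominates d = 106 (mod 210).
nine_three_three_one = dihybrid_ratio((175, 85), (36, 106), 210)
print(nine_three_three_one)   # ac = 0, bc = 120, ad = 70, bd = 190

# a = 105 dominates c = 15, which dominates d = 85; b = 106 is unrelated to them.
eight_four_three_one = dihybrid_ratio((105, 106), (15, 85), 210)
print(eight_four_three_one)   # ab = 0, a = 105, bc = 120, bd = 190
```

Swapping in other idempotent residues with the same divisibility pattern should give the same counts with different phenotype labels.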
6.3. Using elements of $\mathbb{Z}_n - Ж_n$
The elements of $Ж_n$ can be used in the representation of relationships involving complete dominance, where the expression of
each allele by itself produces the same trait as the expression of
that allele interacting with another copy of itself, since these elements are by definition multiplicatively idempotent elements. Conversely, the elements of $\mathbb{Z}_n - Ж_n$ can be used in the representation
of alleles in which the expression of that allele by itself produces
a different trait than the expression of that allele interacting with
another copy of itself, since these elements are by definition not
multiplicatively idempotent elements.
It has already been shown that a haploinsufficient relationship
can be represented using certain elements of $\mathbb{Z}_n - Ж_n$ paired with
an element of $Ж_n$, but there are also other relationships that can
be produced with different pairings.
If we have $u \in Ж_n$ and $x \in \mathbb{Z}_n - Ж_n$, then we can always have
$ux = u$, since 0 is an element of $Ж_n$; however, there are other possibilities for $u$: for example, $3 \times 3 \equiv 3 \times 5 \equiv 3 \pmod 6$, while $5 \times 5 \equiv 1$.
Thus we can use 0 (or certain other elements of $Ж_n$) paired with
an element of $\mathbb{Z}_n - Ж_n$ to represent a relationship where the expression of A alone produces the same trait as the expression of A
with A and A with B, but differs from the expression of B with B and of B by itself (which are both different from each other).
So there are $x \in \mathbb{Z}_n - Ж_n$ for which $ux = u$ and for which $ux = x$ (haploinsufficiency), and there is also the possibility that $ux = z$ (where
$z \neq u$ and $z \neq x$). For example, $7 \times 7 \equiv 19 \pmod{30}$ and $7 \times 16 \equiv 22$ and
$16 \times 16 \equiv 16$. We can use numbers like these to represent a relationship where the expression of A alone produces the same trait
as the expression of A with A, but differs from the expression of A
with B, B with B, and B by itself (which are all different from each
other).
Finally, there are numbers for which $xx = u$ but $x \neq u$. For example, $9 \times 9 \equiv 21 \times 21 \equiv 21 \pmod{30}$ and $21 \times 9 \equiv 9$. We can use numbers like these to represent a relationship where the expression of
A alone expresses the same trait as the expression of A with A and
B with B, but differs from the expression of A with B or B alone
(which are the same).
And there are similar relationships when numbers that are both
elements of $\mathbb{Z}_n - Ж_n$ are paired together. Further research will determine whether there are organisms with traits that are expressed
by relationships of these sorts.
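The numerical examples in this section can be verified with a few lines. The sketch below lists the multiplicatively idempotent residues for a modulus and tabulates the five expressions compared in the text; the function names `idempotents` and `relationship` are hypothetical, and the labels A and B simply name the two residues being paired.

```python
def idempotents(n):
    """Residues u modulo n with u * u congruent to u."""
    return [u for u in range(n) if (u * u) % n == u]


def relationship(a, b, n):
    """Values expressed by A alone, A with A, A with B, B with B, and B alone."""
    return {
        "A": a % n,
        "AA": (a * a) % n,
        "AB": (a * b) % n,
        "BB": (b * b) % n,
        "B": b % n,
    }


print(idempotents(6))            # the idempotent residues modulo 6
print(idempotents(30))           # the idempotent residues modulo 30
print(relationship(3, 5, 6))     # A, AA, AB agree; BB and B differ
print(relationship(16, 7, 30))   # A and AA agree; AB, BB, B all differ
print(relationship(21, 9, 30))   # A, AA, BB agree; AB equals B
```

Scanning `relationship` over all pairs for a chosen modulus is one way to look for the other pairings mentioned at the end of the section.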
Appendix
Theorem of Dominance. Given that $u, v \in Ж_n$, then $uv \equiv u \pmod n$
if and only if the greatest common divisor of $v$ and $n$ also divides $u$.
Proof. Suppose that the greatest common divisor of $v$ and $n$ also
divides $u$. In other words, suppose that $u = \alpha i$, $v = \alpha j$, and $n = \alpha k$,
where $\gcd(v, n) = \alpha$, so $j$, $k$ are relatively prime.
By definition of being an element of $Ж_n$, $vv \equiv v \pmod n$, hence
$nm = v(1 - v)$. It follows then that $j(1 - v) = km$, so $k$ must divide
$1 - v$, since $j$ and $k$ are relatively prime; thus $1 - v = kh$.
Now, since $u = \alpha i$, it follows that $u(1 - v) = \alpha i k h$, so $u - uv = nih$
(because $n = \alpha k$).
Thus $uv \equiv u \pmod n$.
Now suppose $uv \equiv u \pmod n$, which means that $uv - nm = u$.
Consequently, if the greatest common divisor of $v$ and $n$ is $\alpha$ and
$v = \alpha j$ while $n = \alpha k$, then $\alpha(uj - km) = u$; thus, since $uj - km$ is an integer, $\alpha$ must divide $u$ and the greatest common divisor of $v$ and $n$
also divides $u$.
Therefore, $uv \equiv u \pmod n$ if and only if the greatest common
divisor of $v$ and $n$ also divides $u$.
The next proof is useful for finding the elements of $Ж_n$ for any
$n$.
Elements of $Ж_n$: If $n/d$ is an integer relatively prime to $d$, then
there is an $a$ such that $an/d \in Ж_n$.
Proof. Suppose $n/d$ is an integer relatively prime to $d$.
Now, since the multiplicative modular inverse of $n/d$ modulo $d$
exists if and only if $n/d$ and $d$ are relatively prime, this means there
is an $a$ such that $an/d \equiv 1 \pmod d$.
It follows then that $1 - an/d = dk$, so

$$\frac{1}{d}\left(1 - \frac{an}{d}\right) = k. \qquad (1)$$

And if we multiply (1) by $an$, then we have (where $m = ak$):

$$\frac{an}{d}\left(1 - \frac{an}{d}\right) = nm \qquad (2)$$

Thus $(an/d)(an/d) \equiv an/d \pmod n$, which means $an/d \in Ж_n$.
Therefore, if $n/d$ is an integer relatively prime to $d$, then there
is an $a$ such that $an/d \in Ж_n$.
The next lemma shows that certain elements of $\mathbb{Z}_n - Ж_n$ can
dominate elements of $Ж_n$.
Divisor Lemma. For all $u \in Ж_n$, if $\gcd(u, n) = \delta$, then $\delta u \equiv \delta \pmod n$.
Proof. Suppose $u \in Ж_n$ and $\gcd(u, n) = \delta$.
The statement that $\gcd(u, n) = \delta$ implies that $\gcd(k, h) = 1$ (where
$u = \delta k$ and $n = \delta h$).
Now, since $u \in Ж_n$, $uu - u = nm$ and dividing by $\delta$ leaves $k(u - 1) = hm$. And by Euclid’s Lemma it follows that $m = lk$, since $u - 1$
is an integer and $\gcd(k, h) = 1$.
Thus, rewriting $uu - u = nm$ as $k(\delta u - \delta) = nm$, we have that $\delta u - \delta = nl$.
Therefore, for all $u \in Ж_n$, if $\gcd(u, n) = \delta$, then $\delta u \equiv \delta \pmod n$.
The alteration proposition
Before proving that the conditions $a_a + \tilde{a}_a,\ \tilde{a}_a \in Ж_n$ imposed on
$q_a a_a + \tilde{a}_a = q'_a$ in Section 4.1 ensure that $q'_a \in Ж_n$, we need to first
prove a few properties about elements of $Ж_n$.
Addition Lemma. If $u, v \in Ж_n$, then $u + v \equiv uu + vv \pmod n$.
Proof. Suppose $u, v \in Ж_n$.
By definition of being elements of $Ж_n$, $uu \equiv u \pmod n$ and
$vv \equiv v$, which means:

$$uu - v \equiv u - v \qquad (1)$$

$$u - vv \equiv u - v \qquad (2)$$

Thus:
$uu - v \equiv u - vv$, by transitivity of (1) and (2), and rearranging gives $uu + vv \equiv u + v$.
Therefore if $u, v \in Ж_n$, then $u + v \equiv uu + vv \pmod n$.
Closure. If $u, v \in Ж_n$, then $uv \in Ж_n$.
Proof. Suppose $u, v \in Ж_n$.
This means that $uu \equiv u \pmod n$ and $vv \equiv v$ by definition. And if
we multiply each expression by $vv$ and $u$ respectively, we have:

$$uuvv \equiv uvv \qquad (1)$$

$$uvv \equiv uv \qquad (2)$$

Thus:
$uuvv \equiv uv \pmod n$, by transitivity of (1) and (2).
Therefore, if $u, v \in Ж_n$, then $uv \in Ж_n$.
Conjugate Lemma. If $u \in Ж_n$, then $u$ has a conjugate such that $1 - u \in Ж_n$.
Proof. Suppose $u \in Ж_n$.
This means that $uu \equiv u \pmod n$. If we add $1 - 2u$ to each side
of $uu \equiv u$, we get

$$1 - 2u + uu \equiv 1 - 2u + u \pmod n \qquad (\ast)$$

And since $(1 - u)(1 - u) = 1 - 2u + uu$, $(\ast)$ reduces to

$$(1 - u)(1 - u) \equiv 1 - u \pmod n.$$

Therefore, if $u \in Ж_n$, then $1 - u \in Ж_n$.
With these results in hand, we can prove the Alteration Proposition (we will suppress the subscript $a$ in $q_a a_a + \tilde{a}_a = q'_a$ to make
for easier reading):
Alteration Proposition. If $q,\ a + \tilde{a},\ \tilde{a} \in Ж_n$, then $qa + \tilde{a} \in Ж_n$.
Proof. Suppose $q,\ a + \tilde{a},\ \tilde{a} \in Ж_n$.
We can begin by rewriting $qa + \tilde{a}$ as $q(a + \tilde{a}) + \tilde{a}(1 - q)$.
Since $q,\ a + \tilde{a} \in Ж_n$, it follows that $q(a + \tilde{a}) \in Ж_n$ (closure of $Ж_n$);
and, since $\tilde{a} \in Ж_n$ and $1 - q \in Ж_n$ (Conjugate Lemma), it follows
that $\tilde{a}(1 - q) \in Ж_n$ (closure of $Ж_n$).
From the Addition Lemma, then

$$q(a + \tilde{a}) + \tilde{a}(1 - q) \equiv q(a + \tilde{a})\,q(a + \tilde{a}) + \tilde{a}(1 - q)\,\tilde{a}(1 - q) \pmod n \qquad (1)$$

Now, if we multiply $[q(a + \tilde{a}) + \tilde{a}(1 - q)][q(a + \tilde{a}) + \tilde{a}(1 - q)]$, we
find this equals:

$$q(a + \tilde{a})\,q(a + \tilde{a}) + 2q(a + \tilde{a})\,\tilde{a}(1 - q) + \tilde{a}(1 - q)\,\tilde{a}(1 - q) \qquad (2)$$

So, since $q \in Ж_n$ and by definition then $q(1 - q) \equiv 0 \pmod n$, (2) is
congruent to:

$$q(a + \tilde{a})\,q(a + \tilde{a}) + \tilde{a}(1 - q)\,\tilde{a}(1 - q) \qquad (3)$$

And since (3) is congruent to $[q(a + \tilde{a}) + \tilde{a}(1 - q)][q(a + \tilde{a}) + \tilde{a}(1 - q)]$, then by transitivity of (1) and (3):

$$[q(a + \tilde{a}) + \tilde{a}(1 - q)][q(a + \tilde{a}) + \tilde{a}(1 - q)] \equiv q(a + \tilde{a}) + \tilde{a}(1 - q) \pmod n$$

Therefore, if $q,\ a + \tilde{a},\ \tilde{a} \in Ж_n$, then $qa + \tilde{a} \in Ж_n$.
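The appendix results lend themselves to a brute-force sanity check for small moduli. The sketch below is not a proof; it simply tests each statement over every idempotent residue for a chosen $n$. The function names are hypothetical, and the modular inverse uses Python's three-argument `pow` with exponent -1 (available from Python 3.8).

```python
from math import gcd


def idempotents(n):
    """Residues u modulo n with u * u congruent to u."""
    return [u for u in range(n) if (u * u) % n == u]


def constructed_idempotents(n):
    """Idempotents built as a * (n / d) with a the inverse of n / d modulo d."""
    found = set()
    for d in range(1, n + 1):
        if n % d == 0 and gcd(n // d, d) == 1:
            # Modulo 1 every integer is congruent to 1, so any a works; use 0.
            a = pow(n // d, -1, d) if d > 1 else 0
            found.add((a * (n // d)) % n)
    return sorted(found)


def check_appendix(n):
    """Exhaustively test the appendix statements for modulus n."""
    zh = idempotents(n)
    for u in zh:
        assert (1 - u) % n in zh                      # Conjugate Lemma
        delta = gcd(u, n)
        assert (delta * u - delta) % n == 0           # Divisor Lemma
        for v in zh:
            dominated = (u * v - u) % n == 0
            assert dominated == (u % gcd(v, n) == 0)  # Theorem of Dominance
            assert (u * v) % n in zh                  # Closure
            assert (u + v - u * u - v * v) % n == 0   # Addition Lemma
    for q in zh:                                      # Alteration Proposition
        for a_tilde in zh:
            for a in range(n):
                if (a + a_tilde) % n in zh:
                    assert (q * a + a_tilde) % n in zh
    return True


for modulus in (6, 12, 30, 210):
    assert constructed_idempotents(modulus) == idempotents(modulus)
    assert check_appendix(modulus)
print("all checks passed")
```

A failed assertion for some modulus would point to the exact statement and residues that break, which can be useful when experimenting with variations of the conditions.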
References
[1] B. Pierce, Genetics: A Conceptual Approach, fourth ed., W.H. Freeman, 2010.
[2] B. Pierce, Transmission and Population Genetics, first ed., W.H. Freeman, 2006.
[3] A. Ruvinsky, J. Sampson, The Genetics of the Dog, CABI Publishing, 2001.
[4] J. Templeton, A. Stewart, W. Fletcher, Coat color genetics in the Labrador retriever, J. Heredity 68 (1977) 134–136.