InterStat
Why Are Genetic Algorithms Markov?
Khiria E. A. El-Nady
Department of Mathematics, Faculty of Science, Alexandria University, Alexandria, Egypt
pr− dr− kh− [email protected]
Usama H. M. Abou El-Enien
Department of Mathematics, Faculty of Science, Alexandria University, Alexandria, Egypt
[email protected]
Amr A. A. Badr
Department of Computer Science, Faculty of Computers and Information, Cairo University, Giza, Egypt
[email protected]
Abstract
Genetic algorithms are considered. Three algorithms are designed and executed to obtain purely empirical conclusions, which are then turned into purely theoretical results about the behavior of genetic algorithms as finite-dimensional Markov and lumped Markov chains; the theory confirms the conjectures suggested by the experiments and introduces a complete framework toward a new philosophy of machine intelligence. First, we model genetic algorithms as finite-dimensional Markov and lumped Markov chains. Second, we carry out a particle analysis (the basic component) and analyze the convergence properties of these algorithms. Third, we produce two unified approaches for analysis, Markov and lumped Markov, that give a complete framework, and we propose unique chromosomes for purely successful optimization of these algorithms. Furthermore, for the Markov approach, we obtain a purely theoretical classification of the chains and their stationary distributions. For the lumped Markov approach, we obtain purely theoretical results on all possible conditional multivariate normal distributions of the transition probability matrices and on the stationary multivariate normal distributions of the chains.
Keywords: Genetic algorithms; lumped Markov chains; classification; central limit theorem; stationary multivariate normal distribution; unique chromosomes
1. Introduction
Holland ([18],[19]) proposed the use of a genetic algorithm as an efficient search mechanism in artificially adaptive systems. Fraser [15], following considerable earlier development (e.g., Fraser [13],[14]), proposed a procedure quite similar to the one presented in Holland ([18],[19]). The only apparent difference between Fraser's proposal and Holland's is that Holland suggested reproducing each parent in proportion to its relative fitness.
Eiben et al. [1], Fogel ([9],[11]), Horn [20], Davis and Principe [6], Rudolph [25], and De Jong et al. [8] have studied genetic algorithms within the framework of Markov chains. Fogel [12] outlined an evolutionary programming procedure for real-valued function optimization.
The most insightful theoretical analyses of evolutionary algorithms have been contributed in evolution strategies (e.g., Schwefel [26]; Bäck et al. [3]). Many of these results carry over to evolutionary programming and other stochastic optimization techniques; they have also provided a framework for improving these procedures (Rudolph [25]). Other works attempt to introduce an architecture for the construction of genetic algorithms (Gedeon et al. [16], Poli et al. [5], Mitavskiy et al. [24] and Li et al. [23]).
These works do not develop a unified stochastic model for genetic algorithms or analyze their performance and convergence properties within a general framework. It is therefore significant to establish purely theoretical results about the behavior of genetic algorithms as finite-dimensional Markov and lumped Markov chains in order to introduce a complete framework.
The main purpose of this paper is to produce unified Markov and lumped Markov approaches for analysis, in order to introduce a complete framework for genetic algorithms toward a new philosophy of machine intelligence, and to propose unique chromosomes for purely successful optimization of these algorithms. Through the unified Markov approach, we obtain purely empirical conclusions together with a purely theoretical classification and the stationary distributions of the chains. Through the unified lumped Markov approach, we obtain purely empirical conclusions together with purely theoretical results on all possible conditional multivariate normal distributions of the transition probability matrices and on the stationary multivariate normal distributions of the chains.
The rest of the paper is organized as follows. In Section 2, we give the formulation of the problem, namely: why are genetic algorithms Markov chains? In Section 3, we state the main result. In Section 4, the proof of the main result is given in ten steps. In Section 5, we propose the three algorithms. In Section 6, we give four numerical examples. In Section 7, we give some concluding remarks.
2. Formulation of the problem
The empirical conclusions and theoretical analyses of genetic algorithms in the literature do not develop a purely unified stochastic model or analyze performance and convergence properties within a general framework. Thus, in this paper, we consider an interesting problem, namely: why are genetic algorithms Markov chains?
Throughout this paper, we consider any objective real-valued function of n variables

f(x1, x2, ..., xn), where ai ≤ xi ≤ bi for i = 1, 2, ..., n

with [ai, bi] the domain of each variable xi and ai, bi real numbers. The basic steps of a standard genetic algorithm are as follows:
1. Initialization: generate a random initial population of bit strings.
2. Evaluation: evaluate the fitness of each individual of the population.
3. Selection: select, via Roulette-Wheel, the individuals that will reproduce.
4. Reproduction and genetic variation: generate the next population (offspring) by reproducing with inheritance (crossover) the selected individuals
and apply the genetic variation operator (mutation).
5. Cycle: the individuals selected in Step 3 compose the new population of
parents and are subjected to reproduction and genetic variation in Step 4.
Convergence is assumed when a pre-defined solution is met or a fixed number
of generations have been performed.
This algorithm has the following properties: 1) binary encoding; 2) reproduction and selection via Roulette-Wheel; 3) single-point crossover; and 4) single-point mutation.
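The five steps and four properties above can be sketched as a short program. This is a minimal illustration under assumptions of our own, not the authors' implementation: the fitness function, bit length k, population size m and mutation rate pm are illustrative choices.

```python
import random

def decode(bits, a, b):
    # Map a k-bit chromosome to a real value in [a, b].
    k = len(bits)
    return a + int("".join(map(str, bits)), 2) * (b - a) / (2 ** k - 1)

def roulette(pop, fitness):
    # Fitness-proportional (Roulette-Wheel) selection.
    r = random.uniform(0, sum(fitness))
    acc = 0.0
    for chrom, fit in zip(pop, fitness):
        acc += fit
        if acc >= r:
            return chrom
    return pop[-1]

def generation(pop, f, a, b, pm=0.05):
    # One cycle: evaluation, selection, single-point crossover, single-point mutation.
    fitness = [f(decode(c, a, b)) for c in pop]
    nxt = []
    while len(nxt) < len(pop):
        p1, p2 = roulette(pop, fitness), roulette(pop, fitness)
        cut = random.randrange(1, len(p1))        # single-point crossover
        child = p1[:cut] + p2[cut:]
        if random.random() < pm:                  # single-point mutation
            pos = random.randrange(len(child))
            child[pos] ^= 1
        nxt.append(child)
    return nxt

random.seed(0)
k, m = 8, 10                       # bits per chromosome, population size (illustrative)
f = lambda x: 1.0 + x * (2.0 - x)  # positive fitness on [0, 2], maximum at x = 1
pop = [[random.randint(0, 1) for _ in range(k)] for _ in range(m)]
for _ in range(50):
    pop = generation(pop, f, 0.0, 2.0)
best = max(f(decode(c, 0.0, 2.0)) for c in pop)
```

The fitness is kept strictly positive so that Roulette-Wheel selection is well defined.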
We restrict an arbitrary uncountable set

S = {ai ≤ xi ≤ bi for i = 1, 2, ..., n}

to be a subset of n-space Rn as a sample space, and restrict an arbitrary countable set T to be the set of all

(x1, x2, ..., xn) in S = {ai ≤ xi ≤ bi}

for which

P(x1, x2, ..., xn) > 0

as a sample space. Let the sample space T of possible solutions be coded as strings of k bits {0,1}; T = {unique chromosomes}.
We restrict an arbitrary countable set U to be the set of all possible simple random samples with replacement in the ith trial

Dy(i) = ((x11, x12, ..., x1n), (x21, x22, ..., x2n), ..., (xm1, xm2, ..., xmn))

for y = 1, 2, ..., h (h = number of samples), for which

P((x11, x12, ..., x1n), (x21, x22, ..., x2n), ..., (xm1, xm2, ..., xmn)) > 0,

as a sample space, where each

(xj1, xj2, ..., xjn) for j = 1, 2, ..., m (sample size)

lies in the common sample space T. All possible simple random samples with replacement in the ith trial can be defined by every possible configuration of an entire population of m bit strings. There are h = 2^(k×m) such samples, where

n = number of variables
k = number of bits
m = sample size (≡ number of chromosomes)
h = number of all simple random samples with replacement (= 2^(k×m))
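For intuition about these counts, the state space can be enumerated directly when k and m are tiny (a sketch with illustrative sizes; for realistic k and m the count 2^(k×m) makes explicit enumeration infeasible):

```python
from itertools import product

k, m = 2, 2   # illustrative tiny values: 2-bit chromosomes, population of m = 2
chromosomes = ["".join(bits) for bits in product("01", repeat=k)]  # the 2^k bit strings
states = list(product(chromosomes, repeat=m))  # every population configuration
# len(chromosomes) == 2**k and len(states) == 2**(k*m)
```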
We restrict an arbitrary countable set U1 to be the set of all possible repeated dependent two trials without replacement

(Dr(1) ∩ Ds(2))

for which

P(Dr(1) ∩ Ds(2)) ≥ 0

as a sample space, where each

Dy(i) for y = r or s, with r = 1, 2, ..., h and s = 1, 2, ..., h,

lies in the common sample space U. Each conditional probability can be defined and obtained from the equation

P(Ds(2) | Dr(1)) = P(Dr(1) ∩ Ds(2)) / P(Dr(1)) ≥ 0, provided P(Dr(1)) ≠ 0.

There are 2^(2(k×m)) such repeated dependent two trials without replacement.
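The conditional probability above is the usual ratio of a joint probability to a marginal. A small sketch with a hypothetical joint distribution over two states (the numbers are illustrative, not from the paper's experiments):

```python
# Hypothetical joint probabilities P(Dr(1) ∩ Ds(2)) for a 2-state example.
joint = {(0, 0): 0.35, (0, 1): 0.15,
         (1, 0): 0.15, (1, 1): 0.35}

def conditional(s, r):
    # P(Ds(2) | Dr(1)) = P(Dr(1) ∩ Ds(2)) / P(Dr(1)), with P(Dr(1)) != 0.
    p_r = sum(p for (first, _), p in joint.items() if first == r)  # marginal of trial 1
    return joint[(r, s)] / p_r
```

Each row of conditionals sums to one, as a transition matrix row must.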
Consider q ∈ Q, where Q is the discrete parameter space of the Markov chain process

{Xq, q = 0, 1, 2, ...} (parameter homogeneous).

A Markov chain requires a finite collection of states, denoted by

U = {D1, D2, ..., Dh},

an h × h probability matrix P called a transition matrix, and a probability vector

p(0) = (P(D1), P(D2), ..., P(Dh)) = (p1(0), p2(0), ..., ph(0)),

called the initial probability vector. We can also interpret p(0) as a stationary distribution, where

h = 2^(k×m) = number of all simple random samples with replacement ≡ number of all states.
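A probability vector t is stationary when t = tP. For a tiny hypothetical transition matrix (a 2-state stand-in for the h × h matrix over population states) the fixed point can be reached by iterating p := pP:

```python
# Hypothetical 2-state transition matrix P (rows sum to 1), standing in for
# the h x h matrix over population states.
P = [[0.9, 0.1],
     [0.4, 0.6]]

def step_dist(p, P):
    # One step of the distribution recursion p := p P.
    h = len(p)
    return [sum(p[i] * P[i][j] for i in range(h)) for j in range(h)]

p = [1.0, 0.0]              # initial probability vector p(0)
for _ in range(200):
    p = step_dist(p, P)
# p converges to the fixed point t of P; for this P, t = (0.8, 0.2).
```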
We generate all possible combinations of states of unique chromosomes (= 2^(k×m)) and give each state a number. We apply one of the following algorithms:

(1) Genetic algorithms without bit mutation.
(2) Genetic algorithms with bit mutation.

on each generated state for n iterations, where n is a large number. We count, for each state with a specific number, the number of times it appeared, calculate the probability of each state

P = (number of times it appeared) / n,

and get the stationary distribution ordered by the state number and its probability P.
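The counting step can be sketched as a simulation: run the chain for many iterations and estimate each state's probability by its visit frequency. The 2-state matrix below is a hypothetical stand-in whose exact stationary distribution is (0.8, 0.2), so the empirical frequencies can be checked against it.

```python
import random

# Hypothetical 2-state chain standing in for the chain over population states;
# its exact stationary distribution is (0.8, 0.2).
P = [[0.9, 0.1],
     [0.4, 0.6]]

random.seed(1)
counts = [0, 0]
s = 0
n = 200_000
for _ in range(n):
    # Draw the next state from row P[s].
    s = 0 if random.random() < P[s][0] else 1
    counts[s] += 1
p_hat = [c / n for c in counts]   # P = (number of times it appeared) / n
```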
For the second approach, let

L = {L1, L2, ..., Lg}

be a partition of U (all possible combinations of states of unique chromosomes). We get Lz for each state in the combination and combine states according to their equal

Lz's ((x̄1, x̄2, ..., x̄n)'s),

taking into account that the Lz's are unique for each set of states.
We restrict an arbitrary countable set L to be the set of all possible outcomes in the ith trial

Lz(i) = (x̄1, x̄2, ..., x̄n) for z = 1, 2, ..., g (g = number of states in L)

for which

P(x̄1, x̄2, ..., x̄n) > 0

as a sample space, where

x̄k = (x1k + x2k + ... + xmk) / m = Σ_{j=1..m} xjk / m for k = 1, 2, ..., n, with m = sample size,

and

xjk = measurement of the kth variable on the jth item.
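The componentwise sample mean x̄k is just the column mean of the m × n data array; a minimal sketch with illustrative numbers:

```python
# Column means xbar_k = (x_{1k} + ... + x_{mk}) / m for an m x n data array.
data = [[1.0, 2.0],
        [3.0, 4.0],
        [5.0, 6.0]]          # illustrative: m = 3 items, n = 2 variables
m, n = len(data), len(data[0])
xbar = [sum(row[k] for row in data) / m for k in range(n)]
```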
We restrict an arbitrary countable set L1 to be the set of all possible repeated dependent two trials without replacement

(Le(1) ∩ Ld(2))

for which

P(Le(1) ∩ Ld(2)) ≥ 0

as a sample space, where each

Lz(i) for z = e or d, with e = 1, 2, ..., g and d = 1, 2, ..., g,

lies in the common sample space L. Each conditional probability can be defined and obtained from the equation

P(Ld(2) | Le(1)) = P(Le(1) ∩ Ld(2)) / P(Le(1)) ≥ 0, provided P(Le(1)) ≠ 0.
Consider q ∈ Q, where Q is the discrete parameter space of the Markov chain process

{Xq, q = 0, 1, 2, ...} (parameter homogeneous).

A lumped Markov chain requires a finite collection of states, denoted by

L = {L1, L2, ..., Lg},

a g × g probability matrix P called a transition matrix, and a probability vector

p(0) = (P(L1), P(L2), ..., P(Lg)) = (p1(0), p2(0), ..., pg(0)),

called the initial probability vector. We can also interpret p(0) as a stationary distribution.
Let Xq be an n-dimensional random variable defined on the sample space L at time q (initial value of the time parameter), and let Xq+1 be an n-dimensional random variable defined on the sample space L at time q + 1. The possible outcomes for Xq are

L1, L2, ..., Lg,

and the same holds for Xq+1.

By the theorems for the mean and covariance of the sampling distribution of X̄ and the central limit theorem, we obtain the modified forms of these theorems for an n-dimensional random variable X̄ defined on the sample space L.

We get the sampling multivariate normal distribution of X̄, the expectation of X̄ and the covariance matrix of X̄. We conclude that the distribution of Xq is the same as that of X̄, and the same holds for Xq+1.
Let (Xq, Xq+1)′ be a 2n-dimensional random variable defined on the sample space L1 and be distributed as multivariate normal. Then the conditional distribution of

Xq+1, given that Xq = L* = any state of L,

is multivariate normal and has

Mean = µ*

and

Covariance = Σ*.
We select one state from U, get L* for the selected state and replicate the selected state 2^(k×m) times. We apply one of the following algorithms:

(1) Genetic algorithms without bit mutation.
(2) Genetic algorithms with bit mutation.

for one transition (iteration) only on the replicated states to produce new states. On the generated new states, we compute Lz. On the set of Lz's computed, we compute the conditional normal distribution

N(µ*, Σ*).

By substitution, we compute the off-diagonal covariance of the covariance matrix of the distribution of

(Xq, Xq+1)′

in Mean = µ*. By a similar argument, we compute the off-diagonal covariance for all states of L, check that the off-diagonal covariance is the same and check that Σ* is the same.
3. Main result
In this section, we shall state the main theorem.
Theorem 3.1. For any genetic algorithm with and without bit mutation, the
following holds:
(1. Markov approach) The sequence of (m × n)-dimensional random matrices

X0, X1, X2, ...

converges in distribution to the (m × n)-dimensional random matrix X (which has a unique stationary distribution) if and only if, for each (a1, a2, ..., a(m×n)),

φxi(a1, a2, ..., a(m×n)) → φx(a1, a2, ..., a(m×n)) as i → ∞

(uniqueness of the characteristic function), where

n = number of variables,
m = number of sets of measurements on n variables = sample size = number of chromosomes,
φx(a1, a2, ..., a(m×n)) is the characteristic function of X in m × n real variables.
(a) If P is a transition matrix for any genetic algorithm with (regular chain) or without (absorbing chain) bit mutation, and the probability vector t is a fixed point of the matrix P, then

Σ_{j=1..2^(k×m)} Σ_{i=1..2^(k×m)} ti Pij^n → (Σ_{j=1..2^(k×m)} tj = 1) as n → ∞.
(b) For any genetic algorithm with or without bit mutation, a real-valued function

f(x1, x2, ..., xn), where ai ≤ xi ≤ bi for i = 1, 2, ..., n,

is one that contains an infinite number of Markov chains (every chain has different unique chromosomes for purely successful optimization and has different globally optimum value(s)).
(2. Lumped Markov approach) The sequence of n-dimensional random vectors

X0, X1, X2, ...

converges in distribution to the n-dimensional random vector X (which has a unique stationary multivariate normal distribution) if and only if, for each (a1, a2, ..., an),

φxi(a1, a2, ..., an) → φx(a1, a2, ..., an) as i → ∞

(uniqueness of the characteristic function), where

n = number of variables,
φx(a1, a2, ..., an) is the characteristic function of X in n real variables.
(a) All possible conditional multivariate normal distributions of the transition probability matrix have the following form (see the proof of the main theorem): the conditional distribution of Xq+1, given that Xq = L*, is multivariate normal and has

Mean = µ* = µ + Σt1t2 Σ^(−1) (L* − µ)

and

Covariance = Σ* = Σ − Σt1t2 Σ^(−1) Σt2t1.
The distribution of X is the stationary multivariate normal distribution.
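For n = 1 the matrix formulas for µ* and Σ* reduce to scalar arithmetic, which makes the structure easy to check. The numerical values below are illustrative assumptions, not taken from the paper's experiments:

```python
# n = 1 sketch: Sigma and Sigma_t1t2 are scalars, so the conditional-normal
# formulas reduce to ordinary arithmetic. All values are illustrative.
mu = 0.5            # common mean of X_q and X_{q+1}
sigma2 = 2.0        # Var(X_q) = Var(X_{q+1}) = Sigma
sigma12 = 0.8       # Cov(X_q, X_{q+1}) = Sigma_t1t2
L_star = 1.3        # observed state X_q = L*

mu_star = mu + sigma12 / sigma2 * (L_star - mu)      # conditional mean mu*
sigma_star = sigma2 - sigma12 / sigma2 * sigma12     # conditional variance Sigma*
```

Note that the conditional variance Σ* does not depend on the observed state L*, only the conditional mean does.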
(b) For any genetic algorithm with or without bit mutation, a real-valued function

f(x1, x2, ..., xn), where ai ≤ xi ≤ bi for i = 1, 2, ..., n,

is one that contains an infinite number of lumped Markov chains (every chain has different unique chromosomes for purely successful optimization and has different globally optimum value(s)).
4. Proof of the main result
In this section, we prove the main result in Theorem 3.1. We start with a
useful theorem.
Theorem 4.1. Let (S, β, P) be a probability space and let T denote the set of all x in S for which P(x) > 0. Then T is countable.
We shall prove Theorem 3.1 in ten steps.
Proof of Theorem 3.1. Step 1. For genetic algorithms with and without bit mutation, we define a probability space

(S, β, P).

For genetic algorithms with and without bit mutation, consider a real-valued function

f(x1, x2, ..., xn), where ai ≤ xi ≤ bi for i = 1, 2, ..., n.

We restrict an arbitrary uncountable set

S = {a1 ≤ x1 ≤ b1}

to be a subset of the real line R, or of n-space Rn

(S = {ai ≤ xi ≤ bi for i = 1, 2, ..., n}),

as a sample space; we shall assume that this set is a Borel set. The Borel subsets of S themselves form a Boolean σ-algebra β. A nonnegative completely additive set function P defined on β with P(S) = 1 is called a probability measure. We have a probability space

(S, β, P).
Step 2. We prove that T is a countable subset of S, and define an n-dimensional random variable on a sample space T.

Proof. By Theorem 4.1, we restrict an arbitrary countable set T to be the set of all x1 in S = {a1 ≤ x1 ≤ b1} for which

P(x1) > 0, or of all

(x1, x2, ..., xn) in S = {ai ≤ xi ≤ bi for i = 1, 2, ..., n} for which

P(x1, x2, ..., xn) > 0,

as a sample space. T denotes a countable subset of S (whenever we use a set T of real numbers as a sample space, or, more generally, whenever we use a set T in n-space as a sample space, we shall assume that this set is a Borel set).

Let the sample space T of possible solutions be coded as strings of k bits {0,1} and let each possible configuration have a fitness

fi, i = 1, ..., 2^k;
let f* be the globally optimum value. Hence T is countable.
Let T be a set in n-space for some n ≥ 1 (a Borel set in n-space for some n ≥ 1). If τ consists of all subsets of T, the probability function P is completely determined on τ. We have a probability space

(T, τ, P).
Next, we define an n-dimensional random variable defined on a sample
space T .
Let X1 be a one-dimensional random variable defined on the sample space T:

X1(x1) = x1 for each x1 in T,

or let X be an n-dimensional random variable defined on the sample space T:

X = (X1(x1) = x1, X2(x2) = x2, ..., Xn(xn) = xn)′ = (x1, x2, ..., xn)

for each

(x1, x2, ..., xn) in T.

We will use the notation xjk to indicate the particular value of the kth variable that is observed on the jth item, or trial. That is, xjk = measurement (commonly called data) of the kth variable on the jth item; consequently, m measurements on n variables can be displayed as a rectangular array. We need to make assumptions about the variables whose observed values constitute the data set.
Step 3. We prove that U is a countable set of all possible simple random samples with replacement in the ith trial, and define an (m × n)-dimensional random variable on a sample space U.

Proof. Let the jkth entry in the data matrix be the random variable Xjk. Each set of measurements

Xj on n variables

is a random vector, and we have the random matrix
X =
| X11  X12  ...  X1k  ...  X1n |
| X21  X22  ...  X2k  ...  X2n |
|  .    .         .         .  |
| Xj1  Xj2  ...  Xjk  ...  Xjn |
|  .    .         .         .  |
| Xm1  Xm2  ...  Xmk  ...  Xmn |
= (X′1, X′2, ..., X′j, ..., X′m)′,

where X′j = (Xj1, Xj2, ..., Xjn) is the jth row of the data matrix.
A random sample can now be defined.
X1 , X2 , . . .,Xj , . . ., Xm
form a random sample if their joint probability mass function is given by
the product
P (X1 )P (X2 ) . . . P (Xj ) . . . P (Xm ),
where
P (Xj ) = P (xj1 , xj2 , . . . , xjk , . . . , xjn )
is the probability mass function for the j th row vector.
{Joint probability mass function of X1 , X2 , . . ., Xj , . . ., Xm }
= P (X) = P (X1 , X2 , . . ., Xj , . . ., Xm )
= P (X1 )P (X2 ) . . . P (Xj ) . . . P (Xm ).
We restrict an arbitrary countable set U to be the set of all possible simple random samples with replacement in the ith trial

Dy(i) for y = 1, 2, ..., h, for which

P((x11, x12, ..., x1n), (x21, x22, ..., x2n), ..., (xm1, xm2, ..., xmn)) > 0,

as a sample space, where each

(xj1, xj2, ..., xjn) for j = 1, 2, ..., m

lies in the common sample space T.

All possible simple random samples with replacement in the ith trial can be defined by every possible configuration of an entire population of m bit strings. There are h = 2^(k×m) such samples. Hence U is countable.
Let U be a set in (m × n)-space for some (m × n) ≥ (2 × 1) = 2 (a Borel set in (m × n)-space for some (m × n) ≥ 2). If ν consists of all subsets of U, the probability function P is completely determined on ν. We have a probability space

(U, ν, P).
Next, we define an (m × n)-dimensional random variable on the sample space U.

Let X be an (m × n)-dimensional random variable defined on the sample space U:

X = (X1, X2, ..., Xm)
= ((X11(x11) = x11, X12(x12) = x12, ..., X1n(x1n) = x1n),
(X21(x21) = x21, X22(x22) = x22, ..., X2n(x2n) = x2n),
..., (Xm1(xm1) = xm1, Xm2(xm2) = xm2, ..., Xmn(xmn) = xmn))
= ((x11, x12, ..., x1n), (x21, x22, ..., x2n), ..., (xm1, xm2, ..., xmn))

for each

((x11, x12, ..., x1n), (x21, x22, ..., x2n), ..., (xm1, xm2, ..., xmn))

in U, where

Xj = (xj1, xj2, ..., xjn) for j = 1, 2, ..., m

for each

(xj1, xj2, ..., xjn) in T

(the Xj are n-dimensional random variables defined on the common sample space T).
Step 4. We prove that U1 is a countable set of all possible repeated dependent two trials without replacement, define a 2(m × n)-dimensional random variable on a sample space U1, and describe the Markov chain process.

Proof. We restrict an arbitrary countable set U1 to be the set of all possible repeated dependent two trials without replacement

(Dr(1) ∩ Ds(2)) for which

P(Dr(1) ∩ Ds(2)) ≥ 0

as a sample space, where each

Dy(i) for y = r or s, with r = 1, 2, ..., h and s = 1, 2, ..., h,

lies in the common sample space U. Each conditional probability can be defined and obtained from the equation

P(Ds(2) | Dr(1)) = P(Dr(1) ∩ Ds(2)) / P(Dr(1)) ≥ 0, provided P(Dr(1)) ≠ 0.

There are 2^(2(k×m)) such repeated dependent two trials without replacement. Hence U1 is countable.
Let U1 be a set in 2(m × n)-space for some 2(m × n) ≥ 2(2 × 1) = 4 (a Borel set in 2(m × n)-space for some 2(m × n) ≥ 4). If ν1 consists of all subsets of U1, the probability function P is completely determined on ν1. We have a probability space

(U1, ν1, P).

We define a 2(m × n)-dimensional random variable on the sample space U1.
Consider q ∈ Q, where Q is the discrete parameter space of the Markov chain process

{Xq, q = 0, 1, 2, ...} (parameter homogeneous).

Let Xq be an (m × n)-dimensional random variable defined on the sample space U at time q (initial value of the time parameter), and Xq+1 be an (m × n)-dimensional random variable defined on the sample space U at time q + 1. The possible outcomes for Xq are

D1, D2, ..., Dh,

and the same holds for Xq+1.
Let X be a 2(m × n)-dimensional random variable defined on the sample space U1:

X = (Xq, Xq+1) = (Xq = Dr, Xq+1 = Ds) = (Dr, Ds) = (Xb1, Xb2)
= ((Xb1,1, Xb1,2, ..., Xb1,m), (Xb2,1, Xb2,2, ..., Xb2,m))

for each (Dr, Ds) in U1, where Xv = Dy = (Xb1, Xb2, ..., Xbm) = Xb
= ((X(b1)1(x(b1)1) = x(b1)1, X(b1)2(x(b1)2) = x(b1)2, ..., X(b1)n(x(b1)n) = x(b1)n),
(X(b2)1(x(b2)1) = x(b2)1, X(b2)2(x(b2)2) = x(b2)2, ..., X(b2)n(x(b2)n) = x(b2)n),
..., (X(bm)1(x(bm)1) = x(bm)1, X(bm)2(x(bm)2) = x(bm)2, ..., X(bm)n(x(bm)n) = x(bm)n))
= ((x(b1)1, x(b1)2, ..., x(b1)n), (x(b2)1, x(b2)2, ..., x(b2)n), ..., (x(bm)1, x(bm)2, ..., x(bm)n))

for v = q or q + 1, y = r or s and b = b1 or b2, for each

((x(b1)1, x(b1)2, ..., x(b1)n), (x(b2)1, x(b2)2, ..., x(b2)n), ..., (x(bm)1, x(bm)2, ..., x(bm)n))

in U

(the Xv are (m × n)-dimensional random variables defined on the common sample space U).
Next, we describe the Markov chain process. A Markov chain requires a finite collection of states, denoted by

U = {D1, D2, ..., Dh},

an h × h probability matrix P called a transition matrix, and a probability vector

p(0) = (P(D1), P(D2), ..., P(Dh)) = (p1(0), p2(0), ..., ph(0)),

called the initial probability vector. We can also interpret p(0) as a stationary distribution.
Step 5. We prove that genetic algorithms without bit mutation are absorbing chains.

Proof. From Step 2, we get the unique chromosomes (= 2^k). From Step 3 and Step 4, we generate all possible combinations of states of unique chromosomes (= 2^(k×m)) and give each state a number.

We apply genetic algorithms without bit mutation on each generated state for n iterations, where n is a large number. We count, for each state with a specific number, the number of times it appeared, calculate the probability of each state

P = (number of times it appeared) / n,

and get the stationary distribution ordered by the state number and its probability P. A state in this chain will be absorbing if all the members of the population are identical. Otherwise, a state is transient (see [10]).
Now we recall a theorem in [21], and conclude the following modified form of the theorem.

Every Markov chain with a single unit set has a unique probability vector fixed point. This vector has a positive component for any absorbing state (= 1), and zero for the transient states.

We conclude that all stationary distributions (= 2^(k×m)) have a positive component for any absorbing state (= 1), and zero for the transient states. The number of all absorbing states is 2^k. If P is a transition matrix for a genetic algorithm without bit mutation and the probability vector t is a fixed point of the matrix P, then

Σ_{j=1..2^(k×m)} Σ_{i=1..2^(k×m)} ti Pij^n → (Σ_{j=1..2^(k×m)} tj = 1) as n → ∞.
Hence genetic algorithms without bit mutation are absorbing Markov chains. For any genetic algorithm without bit mutation, a real-valued function

f(x1, x2, ..., xn), where ai ≤ xi ≤ bi for i = 1, 2, ..., n,

is one that contains an infinite number of Markov chains (every chain has different globally optimum value(s) and has unique chromosomes for purely successful optimization).
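The absorbing-chain behavior can be checked numerically on a toy 3-state chain (states 0 and 2 absorbing, state 1 transient). This is a hypothetical matrix chosen for illustration, not one derived from a genetic algorithm:

```python
# Toy absorbing chain: states 0 and 2 are absorbing, state 1 is transient.
P = [[1.0, 0.0, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.0, 1.0]]

def matmul(A, B):
    return [[sum(A[i][k2] * B[k2][j] for k2 in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

Pn = P
for _ in range(100):
    Pn = matmul(Pn, P)
# In the limit the transient state carries zero probability; the mass ends up
# in the absorbing states, and each row of Pn still sums to 1.
```

By symmetry of row 1 here, the chain started in state 1 is absorbed in state 0 or state 2 with probability 1/2 each.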
Step 6. We prove that genetic algorithms with bit mutation are regular chains.

Proof. From Step 2, we get the unique chromosomes (= 2^k). From Step 3 and Step 4, we generate all possible combinations of states of unique chromosomes (= 2^(k×m)) and give each state a number.

We apply genetic algorithms with bit mutation on each generated state for n iterations, where n is a large number. We count, for each state with a specific number, the number of times it appeared, calculate the probability of each state

P = (number of times it appeared) / n,

and get the stationary distribution ordered by the state number and its probability P. All states in this chain will be ergodic.
Now we recall a theorem in [21].

Theorem 4.2. If P is a transition matrix for an ergodic chain, then:
(1) There is a unique probability vector fixed point t: t = tP.
(2) All components of t are positive.
(3) If hj(n) is the average number of times the process is in state j in the first n steps, then for any ε > 0,

P(|hj(n) − tj| > ε) → 0 as n → ∞,

no matter what the starting state is.
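Theorem 4.2 can be illustrated numerically: for a regular chain, the rows of P^n all converge to the same strictly positive fixed point t, whatever the starting state. The 2-state matrix below is again a hypothetical stand-in, with t = (0.5, 0.5) by symmetry:

```python
# Regular (ergodic) 2-state chain; by symmetry its fixed point is t = (0.5, 0.5).
P = [[0.7, 0.3],
     [0.3, 0.7]]

def matmul(A, B):
    return [[sum(A[i][k2] * B[k2][j] for k2 in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

Pn = P
for _ in range(100):
    Pn = matmul(Pn, P)
# Both rows of Pn approach the same fixed point t, with all components positive,
# so the limit does not depend on the starting state.
```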
We conclude that all stationary distributions (= 2^(k×m)) are the same unique probability vector fixed point. If P is a transition matrix for a genetic algorithm with bit mutation and the probability vector t is a fixed point of the matrix P, then

Σ_{j=1..2^(k×m)} Σ_{i=1..2^(k×m)} ti Pij^n → (Σ_{j=1..2^(k×m)} tj = 1) as n → ∞.

Hence genetic algorithms with bit mutation are ergodic chains.
We take each generated state and its chain, get the position-number sequences for each unique state (of the 2^(k×m)) in the chain, starting with index zero at the initial state, and check that all sequences of positions are even-position, odd-position, or random-position sequences. Hence genetic algorithms with bit mutation are regular chains.

For any genetic algorithm with bit mutation, a real-valued function

f(x1, x2, ..., xn), where ai ≤ xi ≤ bi for i = 1, 2, ..., n,

is one that contains an infinite number of Markov chains (every chain has different globally optimum value(s) and has unique chromosomes for purely successful optimization).
Step 7. We describe the partition L of all possible combinations of states of unique chromosomes, prove that L is a countable set, and define an n-dimensional random variable on a sample space L.

Let

L = {L1, L2, ..., Lg}

be a partition of all possible combinations of states of unique chromosomes. We get Lz for each state in the combination and combine states according to their equal

Lz's ((x̄1, x̄2, ..., x̄n)'s),

taking into account that the Lz's are unique for each set of states. We now prove that L is a countable set.
We restrict an arbitrary countable set L to be the set of all possible outcomes in the ith trial

Lz(i) = (x̄1, x̄2, ..., x̄n) for z = 1, 2, ..., g

for which

P(x̄1, x̄2, ..., x̄n) > 0

as a sample space, where

x̄k = (x1k + x2k + ... + xmk) / m = Σ_{j=1..m} xjk / m for k = 1, 2, ..., n,

and

xjk = measurement of the kth variable on the jth item.

Hence L is countable.
Let L be a set in n-space for some n ≥ 1. If ζ consists of all subsets of L, the probability function P is completely determined on ζ. We have a probability space

(L, ζ, P).
Next, we define an n-dimensional random variable on the sample space L.

Let X be an n-dimensional random variable defined on the sample space L. For each (x̄1, x̄2, ..., x̄n) in L,

X = (X̄1(x̄1) = x̄1, X̄2(x̄2) = x̄2, ..., X̄n(x̄n) = x̄n) = (x̄1, x̄2, ..., x̄n).
Step 8. We prove that L1 is the set of all possible repeated dependent two trials without replacement, define a 2n-dimensional random variable on a sample space L1 and describe the lumped Markov chain process.

Proof. We restrict an arbitrary countable set L1 to be the set of all possible repeated dependent two trials without replacement

(Le(1) ∩ Ld(2)) for which

P(Le(1) ∩ Ld(2)) ≥ 0

as a sample space, where each

Lz(i) for z = e or d, with e = 1, 2, ..., g and d = 1, 2, ..., g,

lies in the common sample space L. Each conditional probability can be defined and obtained from the equation

P(Ld(2) | Le(1)) = P(Le(1) ∩ Ld(2)) / P(Le(1)) ≥ 0, provided P(Le(1)) ≠ 0.

Hence L1 is countable.
Let L1 be a set in 2n-space for some 2n ≥ 2(1) = 2 (a Borel set in 2n-space for some 2n ≥ 2). If ζ1 consists of all subsets of L1, the probability function P is completely determined on ζ1. We have a probability space

(L1, ζ1, P).

We define a 2n-dimensional random variable on the sample space L1.
Consider q ∈ Q where Q is the discrete parameter space of the Markov
chain process
{Xq , q = 0, 1, 2, . . . }(parameter homogeneous).
Let Xq be an n-dimensional random variable defined on the sample space L at time q (initial value of the time parameter), and Xq+1 be an n-dimensional random variable defined on the sample space L at time q + 1. The possible outcomes for Xq are
L1 , L2 , . . ., Lg ,
and the same holds for Xq+1 .
Let X be a 2n-dimensional random variable defined on the sample space L1:

X = (Xq, Xq+1) = (Xq = Le, Xq+1 = Ld) = (Le, Ld) = (Xt1, Xt2)
= ((X̄t1,1, X̄t1,2, ..., X̄t1,n), (X̄t2,1, X̄t2,2, ..., X̄t2,n))

for each (Le, Ld) in L1, where Xv = Lz = Xt
= (X̄t1(x̄t1) = x̄t1, X̄t2(x̄t2) = x̄t2, ..., X̄tn(x̄tn) = x̄tn)
= (x̄t1, x̄t2, ..., x̄tn)

for v = q or q + 1, z = e or d and t = t1 or t2, for each

(x̄t1, x̄t2, ..., x̄tn) in L

(the Xv are n-dimensional random variables defined on the common sample space L).
Next, we describe the lumped Markov chain process. A lumped Markov chain requires a finite collection of states, denoted by

L = {L1, L2, ..., Lg},

a g × g probability matrix P called a transition matrix, and a probability vector

p(0) = (P(L1), P(L2), ..., P(Lg)) = (p1(0), p2(0), ..., pg(0)),

called the initial probability vector. We can also interpret p(0) as a stationary distribution.
Step 9. By the theorems for the mean and covariance of the sampling distribution of X̄ and the central limit theorem, we conclude the following modified forms of the theorems for an n-dimensional random variable X̄ defined on the sample space L.

For the theorems on the mean and covariance of the sampling distribution of X̄ and the central limit theorem, see [20].

The modified form of the mean theorem. If

D1, D2, ..., Dh

represent all possible simple random samples with replacement from a common joint probability mass function

P(x1, x2, ..., xn)

(the parent population, whatever its form, has a mean

µ = (µ1, µ2, ..., µn)′

and a finite covariance Σ1), then the n-dimensional density r(X̄) for the random vector

X̄ = (X̄1, X̄2, ..., X̄n)′ (sampling distribution of X̄)

is centered around the population mean, regardless of sample size (E(X̄) = µ).
The modified form of the covariance theorem. Each covariance
σik , i, k = 1, 2, . . . , n
of the distribution of X decreases with increasing sample size; that is, the distribution of X becomes more concentrated around the population mean as the sample size gets larger (the covariance of X is
cov(X) = Σ1 /m,
where m (the number of chromosomes) is the sample size).
For practical problems, often m = 100 chromosomes and k > 50 bits; thus there may be more than 2(5000) possible states in the chain.
The distribution of X becomes more symmetrical as the sample size gets larger and is approximately
Nn (µ, Σ = Σ1 /m)
for large sample size. An n-dimensional normal density for the random vector
X has the form
f (x̄) = (2π)^(−n/2) |Σ|^(−1/2) e^(−(x̄−µ)′ Σ−1 (x̄−µ)/2),
where −∞ < x̄i < ∞, i = 1, 2, . . . , n.
The modified form of the central limit theorem. Let
X1 , X2 , . . ., Xm
be independent observations from any population with mean µ and finite covariance Σ1 . Then X has an approximate
Nn (µ, Σ1 /m)
distribution for large sample sizes. Here m should also be large relative to n. The approximation provided by the central limit theorem applies to discrete, as well as continuous, multivariate populations.
The theorems for the mean and covariance of the sampling distribution of X and the central limit theorem apply to sampling of finite populations if the sampling fraction is 5 percent or smaller (the sample size m is small relative to the population size M ; that is, m/M ≤ 0.05).
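The mean and covariance theorems above can be checked by simulation. A minimal sketch, assuming a toy one-dimensional parent population (not the paper's data):

```python
# Draw many simple random samples of size m (with replacement) from a discrete
# parent population and check that the sampling distribution of the sample mean
# is centred at mu with variance Sigma1/m, as the modified theorems state.
import numpy as np

rng = np.random.default_rng(0)
parent = np.array([0.0, 1.0, 2.0, 3.0])   # toy parent population (n = 1)
mu, sigma1 = parent.mean(), parent.var()  # population mean and variance

m = 25                                    # sample size (number of chromosomes)
samples = rng.choice(parent, size=(50_000, m))
xbar = samples.mean(axis=1)               # one sample mean per sample

assert abs(xbar.mean() - mu) < 0.01       # E(Xbar) = mu
assert abs(xbar.var() - sigma1 / m) < 0.005   # cov(Xbar) = Sigma1/m
```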
Step 10. We prove that genetic algorithms with and without bit mutation
are lumped Markov chains.
Proof. From Step 2, we get the unique chromosomes (= 2k ). From Steps 3 and 4, we generate all possible combinations of states of unique chromosomes (= 2(k×m) ) and give each state a number.
From Step 7, we get Lz for each state in the combination and combine states according to their equal Lz ’s, taking into account that Lz ’s are unique for each set of states. We let each set of states be a separate new state. From Steps 7 and 8, we identify the new state space of Lz ’s based on the sets of states with unique Lz ’s, define
L = {L1 , L2 , . . . , Lg },
and define a random variable X on the state space L.
From Step 9, we get the sampling distribution of X (Nn (µ, Σ1 /m = Σ)), the expectation of the sampling distribution E(X) = µ, where µ is the expectation of the parent population, and the covariance matrix of X (= Σ = Σ1 /m), where m = sample size and Σ1 = covariance matrix of the parent population.
We conclude that the distribution of Xq is Nn (µ, Σ), and the same holds for Xq+1 .
Let
X = (Xq , Xq+1 )′
be distributed as N2n (µ? , Σ? ) with
µ? = (µ = µt1 , µ = µt2 )′,

Σ? = ( Σt1 t1 = Σ    Σt1 t2
       Σt2 t1        Σt2 t2 = Σ ),

and | Σt2 t2 | > 0.
Then the conditional distribution of Xq+1 , given that Xq = L? , is normal and has
mean = µ? = µ + Σt1 t2 Σ−1 (L? − µ)
and
covariance = Σ? = Σ − Σt1 t2 Σ−1 Σt2 t1 .
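For n = 1 the blocks above are scalars, and the conditional mean and covariance formulas can be evaluated directly. A minimal sketch with hypothetical numbers (not the paper's):

```python
# Conditional distribution of Xq+1 given Xq = Lstar for a bivariate normal:
# mean = mu + S12 * S^-1 * (Lstar - mu),  cov = S - S12 * S^-1 * S21.
mu = 0.5      # common mean of Xq and Xq+1
S = 0.16      # Sigma_t1t1 = Sigma_t2t2 = Sigma
S12 = 0.08    # cross-covariance Sigma_t1t2 (= Sigma_t2t1 by symmetry)
Lstar = 1.0   # conditioning value of Xq

cond_mean = mu + (S12 / S) * (Lstar - mu)
cond_cov = S - (S12 / S) * S12

assert abs(cond_mean - 0.75) < 1e-9
assert abs(cond_cov - 0.12) < 1e-9
```

Note that the conditional mean is linear in L? with slope Σt1 t2 Σ−1 while the conditional covariance is free of L? , which is exactly what the proof checks by recomputing Σt1 t2 and Σ? for every state.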
We select one state from U , get L? for the selected state and replicate the
selected state 2(k×m) times.
We apply one of the following algorithms for one transition (iteration) only on the replicated states to produce new states:
(1) genetic algorithms without bit mutation;
(2) genetic algorithms with bit mutation.
On the generated new states, we compute Lz . On the set of Lz ’s computed,
we compute conditional normal distribution
N (µ? , Σ? ).
We compute Σt1 t2 , where
µ? = µ + Σt1 t2 Σ−1 (L? − µ).
By a similar argument, we compute Σt1 t2 for all states of L and check that Σt1 t2 is the same and that Σ? is the same.
Hence genetic algorithms with and without bit mutation are lumped Markov chains.
We compute all possible conditional normal distributions of the transition matrix, and we can also interpret the distribution of X as the stationary normal distribution.
On the basis of Steps 1-10, we complete the proof of Theorem 3.1.
5. Proposed algorithms
We prepared programs using MATLAB 7.5. Afterwards, three algorithms were designed and executed to observe the characteristics and behavior of the following:
5.1. Genetic algorithms (GAs) without bit mutation
We named the proposed algorithm absorbing optimization analysis (AOA). The basic steps of the AOA algorithm are as follows:
1. Input number of bits k.
2. Get unique chromosomes = 2k .
3. Input number of chromosomes m.
4. Get number of states = 2(k×m) .
5. Generate all possible combinations of states of unique chromosomes ( =
also 2(k×m) ).
6. Give each state a number.
7. Apply GAs without bit mutation on each generated state for n-iterations,
where n is a large number.
8. Count, for each state with a specific number, the number of times it appeared.
9. Calculate the probability of each state (P ), where
P = (number of times it appeared)/n.
10. Get the stationary distribution ordered by the state number and its probability P .
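The absorption behavior that AOA measures can be illustrated with a minimal GA sketch. Everything below is a hedged toy: fitness-proportional selection and one-point crossover are assumptions (the paper does not fix its operators), but the key property holds regardless — without bit mutation, a run eventually reaches a uniform population, which is an absorbing state.

```python
# Toy GA without mutation on k-bit chromosomes encoded as integers.  Selection
# is fitness-proportional and crossover is one-point; with no mutation, once
# all chromosomes are equal the population can never change again (absorption).
import random

K, M, N = 2, 2, 200                 # bits, chromosomes, iterations

def fitness(c):
    return c + 1                    # toy fitness on the encoded integer

def step(pop):                      # one generation, no bit mutation
    w = [fitness(c) for c in pop]
    a, b = random.choices(pop, weights=w, k=2)
    cut = random.randrange(1, K)    # one-point crossover position
    mask = (1 << cut) - 1
    return [(a & ~mask) | (b & mask), (b & ~mask) | (a & mask)]

random.seed(1)
pop = [random.randrange(2 ** K) for _ in range(M)]
for _ in range(N):
    pop = step(pop)

assert len(set(pop)) == 1           # absorbed: all chromosomes identical
```

Counting how often each population state occurs along such a run, as steps 7-10 prescribe, yields a stationary distribution concentrated on the absorbing states.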
5.2. Genetic algorithms (GAs) with bit mutation
We named the proposed algorithm regular optimization analysis (ROA). The basic steps of the ROA algorithm are as follows:
1. Input number of bits k.
2. Get unique chromosomes = 2k .
3. Input number of chromosomes m.
4. Get number of states = 2(k×m) .
5. Generate all possible combinations of states of unique chromosomes (= also
2(k×m) ).
6. Give each state a number.
7. Pick one state randomly.
8. Apply GAs with bit mutation on the state for n-iterations, where n is a
large number.
9. Count, for each state with a specific number, the number of times it appeared.
10. Calculate the probability of each state (P ), where
P = (number of times it appeared)/n.
11. Get the stationary distribution ordered by the state number and its probability P .
12. Take the randomly chosen state and its chain. Get the sequences of position numbers for each unique state (of the 2(k×m) ) in the chain, starting with index zero at the initial state.
If all position sequences consist of either
(A) even positions only or
(B) odd positions only, then the chain is cyclic; otherwise, if random position sequences also appear, then the chain is regular.
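Step 12's parity test can be sketched directly. The trajectories below are hypothetical, not GA output:

```python
# Classify a chain from the positions at which each state occurs (index 0 is
# the initial state): all-even or all-odd return positions for every state
# indicate a cyclic chain; mixed parities indicate a regular chain.
def classify(trajectory):
    positions = {}
    for i, state in enumerate(trajectory):
        positions.setdefault(state, []).append(i)
    for pos in positions.values():
        if len({p % 2 for p in pos}) > 1:
            return "regular"        # some state recurs at both parities
    return "cyclic"

assert classify([0, 1, 0, 1, 0, 1]) == "cyclic"   # strict period-2 alternation
assert classify([0, 0, 1, 0, 1, 1]) == "regular"  # mixed return parities
```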
5.3. Genetic algorithms with and without bit mutation (as lumped Markov chains)
We named the proposed algorithm Central-Markovian optimization analysis (CMOA). The basic steps of the CMOA algorithm are as follows:
1. Input number of bits k.
2. Get unique chromosomes = 2k .
3. Input number of chromosomes m.
4. Get number of states = 2(k×m) .
5. Get all possible combinations of states of unique chromosomes = U (= also
2(k×m) ).
6. Get Lz for each state in combination.
7. Combine states according to their equal Lz ’s - Taking into account that
Lz ’s are unique for each set of states.
8. Let each set of states to be a separate new state.
9. Identify new state space of Lz ’s based on set of states of unique Lz ’s.
10. Define L = {L1 , L2 , . . . , Lg }.
11. Define a random variable X on state space L.
12. Get sampling distribution of X (Nn (µ, Σ1 /m = Σ)).
13. Get expectation of the sampling distribution E(X) = µ, where µ is the expectation of the parent population.
14. Get covariance matrix of X (= Σ = Σ1 /m), where m = sample size and Σ1 = covariance matrix of the parent population.
15. Let (Xq , Xq+1 )′ be distributed as N2n (µ? , Σ? ) with
µ? = (µ = µt1 , µ = µt2 )′,

Σ? = ( Σt1 t1 = Σ    Σt1 t2
       Σt2 t1        Σt2 t2 = Σ ),

and | Σt2 t2 | > 0,
where q is the initial value of the time parameter Q of the Markov chain process
{Xq , q = 0, 1, 2, . . .},
the possible outcomes for Xq are L1 , L2 , . . . , Lg , and the same holds for Xq+1 .
16. Get E(Xq ) = µ.
17. Get covariance matrix of Xq = Σ.
18. Get E(Xq+1 ) = µ.
19. Get covariance matrix of Xq+1 = Σ.
20. Group states on U such that:
(1) states are globally optimal (all members of states are identical);
(2) all or some members of states have globally optimum values;
(3) states are not globally optimal and all members of states are identical;
(4) all or some members of states do not have globally optimum values.
21. Select one state randomly from each group on U .
22. Get L? for the selected states.
23. Replicate each selected state (from each group) 2(k×m) times.
24. Apply one of the following algorithms for one transition (iteration) only on each replicated selected state to produce new states:
(1) GAs without bit mutation (groups (2) and (4) from step 20 only);
(2) GAs with bit mutation.
25. On the generated new states, compute Lz .
26. On the set of Lz ’s computed, compute conditional normal distribution
N (µ? , Σ? ).
27. Compute Σt1 t2 , where µ? = µ + Σt1 t2 Σ−1 (L? − µ).
28. Check that Σ? is the same for all four groups.
29. If Σt1 t2 for the four groups of states are equal, then the process is a Markov chain.
30. Substitute all possible Lz ’s to compute all conditional expected values µ? . As a result, this computes all possible conditional normal distributions, and the distribution of X is the stationary multivariate normal distribution.
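Steps 5-9 of CMOA (lumping the states of U by their mean Lz ) can be sketched for a state space small enough to enumerate. The decoding to [−1, 2] follows the paper's numerical example; the tiny k and m are assumptions made so the full 2(k×m) state space fits in memory:

```python
# Enumerate all 2^(k*m) population states over the 2^k unique chromosome
# values, compute each state's mean Lz, and lump states with equal Lz into one
# new state.  The lumped state space L = {L1, ..., Lg} is the set of means.
from itertools import product

k, m = 2, 2
vals = [-1 + i * 3 / (2 ** k - 1) for i in range(2 ** k)]   # 2^k values

lumped = {}
for state in product(vals, repeat=m):       # all 2^(k*m) combinations
    Lz = round(sum(state) / m, 6)           # the state's mean
    lumped.setdefault(Lz, []).append(state)

g = len(lumped)                             # number of lumped states
assert sum(len(v) for v in lumped.values()) == (2 ** k) ** m   # 16 states
assert g == 7                               # distinct means for k = 2, m = 2
```

Each set of original states sharing an Lz becomes one new state, exactly as steps 7-8 prescribe.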
6. Numerical results and discussion
6.1. A numerical example for GAs without bit mutation
Consider the following:
f (x) = x · sin(10π · x) + 1, x ∈ [−1, 2]
k = 5 bits, m = 2 chromosomes, n = 100 iterations, probability of crossover
= 0.6, probability of mutation = 0.9
Unique chromosomes = {00000 = -1.000000, 00001 = -0.903226, 00010 =
-0.806452, 00011 = -0.709677, 00100 = -0.612903, 00101 = -0.516129, 00110 =
-0.419355, 00111 = -0.322581, 01000 = -0.225806, 01001 = -0.129032, 01010
= -0.032258, 01011 = 0.064516, 01100 = 0.161290, 01101 = 0.258065, 01110
= 0.354839, 01111 = 0.451613, 10000 = 0.548387, 10001 = 0.645161, 10010 =
0.741935, 10011 = 0.838710, 10100 = 0.935484, 10101 = 1.032258, 10110 =
1.129032, 10111 = 1.225806, 11000 = 1.322581, 11001 = 1.419355, 11010 =
1.516129, 11011 = 1.612903, 11100 = 1.709677, 11101 = 1.806452, 11110 =
1.903226, 11111 = 2.000000}
Globally optimum value = 1.225806
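The listed chromosome values are consistent with linear decoding of the 5-bit string over [−1, 2]. A sketch (the decoding rule is inferred from the listed values, an assumption):

```python
# Decode a k-bit chromosome b to x = -1 + int(b, 2) * 3 / (2^k - 1) in [-1, 2];
# this reproduces the table of 32 unique chromosome values above.
k = 5

def decode(bits):
    return -1 + int(bits, 2) * 3 / (2 ** k - 1)

assert abs(decode("00000") - (-1.000000)) < 1e-6
assert abs(decode("00001") - (-0.903226)) < 1e-6
assert abs(decode("10111") - 1.225806) < 1e-6   # the globally optimum value
assert abs(decode("11111") - 2.000000) < 1e-6
```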
All possible combinations of states of unique chromosomes =
{ 0:(00000, 00000),
1:(00000, 00001),
. . . , 1023:(11111, 11111) }
n = 100 iterations and all stationary distributions ordered by the state
number
0, 0, 0, 0, 0, 0, 0, . . . ,0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
Stationary distribution (0) = (1)
1, 3, 3, 3, 3, 3,3, 3, 3, 3, . . . , 3, 3, 3, 3, 3, 3, 3, 3, 3
Stationary distribution (1, 3) = (0,1)
2, 2, 0, 0, 0, 0, 0, 0, 0, 0, . . . , 0, 0, 0, 0, 0, 0, 0, 0, 0
Stationary distribution (0, 2) = (1, 0)
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, . . . , 3, 3, 3, 3, 3, 3, 3, 3, 3
Stationary distribution (3) = (1)
..
.
1021, 1022, 1023, 1023, 1023, . . . ,1023, 1023, 1023, 1023, 1023
Stationary distribution (1021, 1022, 1023) = (0,0, 1)
1022,1021, 1020, 1020, 1020, . . . ,1020, 1020, 1020, 1020, 1020
Stationary distribution (1020, 1021, 1022) = (1, 0, 0)
1023, 1023, 1023, 1023, 1023, . . . ,1023, 1023, 1023, 1023, 1023
Stationary distribution (1023) = (1)
We conclude that all stationary distributions (= 2(k×m) ) have a positive component (= 1) for an absorbing state and zero for the transient states. The number of absorbing states is 2k .
6.2. A numerical example for GAs with bit mutation
Consider the following:
f (x) = x · sin(10π · x) + 1, x ∈ [−1, 2]
k = 5, m = 2, n = 100000 iterations, probability of crossover = 0.6, probability of mutation = 0.9
Number of iteration   Number of state
0                     884
1                     141
2                     1010
..                    ..
99997                 495
99998                 48
99999                 975
Table(1) n = 100000
Ordered number of state   Probability
0                         0.029850
1                         0.004520
2                         0.004210
..                        ..
1021                      0.004730
1022                      0.004050
1023                      0.031120
Table(2) stationary distribution of Table(1)
Classification of chain (regular)
884 to 141(neither even nor odd)
1-2204-4267-5988-7086-7214-7495-8761-10647-11173-11231-11972-12666-13648..
.
-49413-49480-50639-50834-55098-56852-57766-58204-58373-59907-60346-60530-60898-63871-64176-64215-65166-70131-71202-71764-72192-73921-77079-78137-78804-78824-81185-82354-84321-84569-85216-87825-90839-90853-92645-96888-97918-99262
..
.
884 to 681(even)
71370-90980
884 to 350(odd)
73663
884 to 617(neither even nor odd)
75678-98597
884 to 676(odd)
76713
884 to 361(odd)
91709
884 to 358(odd)
92055
884 to 667(neither even nor odd)
96602-98115
Consider the following:
f (x) = x · sin(10π · x) + 1, x ∈ [−1, 2]
k = 5, m = 2, n = 150000 iterations, probability of crossover = 0.6, probability of mutation = 0.9
Number of iteration   Number of state
0                     393
1                     252
2                     803
..                    ..
149997                64
149998                831
149999                192
Table(3) n = 150000
Ordered number of state   Probability
0                         0.029433
1                         0.004113
2                         0.004500
..                        ..
1021                      0.004520
1022                      0.004413
1023                      0.030880
Table(4) stationary distribution of Table(3)
Classification of chain (regular)
393 to 252(neither even nor odd)
1-147-215-327-342-401-403-405-541-777-793-795-852-1108-1357-1420-1422-1424-1426-1489-2436-2531..
.
-124475-124479-124506-124717-124927-124978-125103-125364-125366-125368-125620-125864-125866-125868-126277-126317-126438-127057-127339-127441-127769-127838-128318-128320-128322-128328-128520-128773-129057-129687-130131-130535-130537-130539-130705-130707-130935-130939-131010-131475-131484-131488-131833-132027-132057-132160-132508-132510-132512-132689-132724-132728-132736-132843-132845-132996-133117-133241-133258-133415-133419-133445-133586-133706-133836-133838-133840-134232-134436-134630-134682-134931-134991-135087-135148-135150-135339-135542-135666-135777-135979-136149-136191-136572-136788-136790-136794-136915-137009-137011-137103-137154-137188-137954-138052-138122-138366-138852-138952-139177-139623-139717-139852-140277-140285-140713-140742-141403-142003-142280-142282-142356-142447-142932-142955-143073-143075-143320-143322-144006-144010-144677-145183-145210-145497-145706-145778-145782-145786-145814-145920-146031-146033-146093-146095-146365-146494-146986-147008-147252-147326-147548-147627-147976-147990-148034-148036-148344-148581-148799-148801-148880-149109-149160-149344-149661-149976
..
.
393 to 358(neither even nor odd)
63841-108342
393 to 681(even)
65872-67640
393 to 661(odd)
73529
393 to 618(neither even nor odd)
81349-124315-140962
393 to 666(even)
104718
From Tables 2 and 4, we obtain the same unique stationary distribution and conclude that the chain is regular.
6.3. A numerical example for GAs without bit mutation (as lumped
Markov chains)
Consider the following:
f (x) = x · sin(10π · x) + 1, x ∈ [−1, 2]
k = 5, m = 2, probability of crossover = 0.6, probability of mutation = 0.9
Unique chromosomes = {00000 = -1.000000, 00001 = -0.903226, 00010 =
-0.806452, 00011 = -0.709677, 00100 = -0.612903, 00101 = -0.516129, 00110 =
-0.419355, 00111 = -0.322581, 01000 = -0.225806, 01001 = -0.129032, 01010
= -0.032258, 01011 = 0.064516, 01100 = 0.161290, 01101 = 0.258065, 01110
= 0.354839, 01111 = 0.451613, 10000 = 0.548387, 10001 = 0.645161, 10010 =
0.741935, 10011 = 0.838710, 10100 = 0.935484, 10101 = 1.032258, 10110 =
1.129032, 10111 = 1.225806, 11000 = 1.322581, 11001 = 1.419355, 11010 =
1.516129, 11011 = 1.612903, 11100 = 1.709677, 11101 = 1.806452, 11110 =
1.903226, 11111 = 2.000000}
Globally optimum value = 1.225806
We will use the notation x̄value , where the subscript denotes the value of x̄.
All possible combinations of states of unique chromosomes
= { {(00000, 00000)} = -1.000000 = x̄−1.000000 ,
{(00000, 00001), (00001, 00000)} = -0.951613 = x̄−0.951613 ,
{ (00001, 00001), (00000, 00010), (00010, 00000)} = -0.903226 = x̄−0.903226 ,
. . . , {(11111, 11111)} = 2.000000 = x̄2.000000 }
Sampling distribution of X (N1 (0.500000, 0.159355)) ≡ stationary distribution.
The covariance Σt1 t2 = 0.160546 is equal for the two groups of states; hence the process is a lumped Markov chain.
The conditional variance = 0.000022 is the same for the two groups.
All possible conditional normal distributions of transition matrix
{x̄−1.000000 :N1 (-1.011202,0.000022),
x̄−0.951613 :N1 (-0.962453,0.000022),
x̄−0.903226 :N1 (-0.913705,0.000022), . . .,
x̄2.000000 :N1 (2.011202,0.000022)}
6.4. A numerical example for GAs with bit mutation (as lumped
Markov chains)
Consider the following:
f (x) = x · sin(10π · x) + 1, x ∈ [−1, 2]
k = 5, m = 2, probability of crossover = 0.6, probability of mutation = 0.9
All possible combinations of states of unique chromosomes
= { {(00000, 00000)} = -1.000000 = x̄−1.000000 ,
{ (00000, 00001), (00001, 00000)} = -0.951613 = x̄−0.951613 ,
{ (00001, 00001), (00000, 00010), (00010, 00000)} = -0.903226 = x̄−0.903226 ,
. . ., { (11111, 11111) }= 2.000000 = x̄2.000000 }
Sampling distribution of X (N1 (0.500000, 0.159355)) ≡ stationary distribution.
The covariance Σt1 t2 = −0.141386 is equal for the four groups of states; hence the process is a lumped Markov chain.
The conditional variance = 0.007336 is the same for all four groups.
All possible conditional normal distributions of transition matrix
{x̄−1.000000 :N1 (1.830859,0.007336),
x̄−0.951613 :N1 (1.787928,0.007336),
x̄−0.903226 :N1 (1.744997,0.007336), . . .,
x̄2.000000 :N1 (-0.830859,0.007336)}
For genetic algorithms with and without bit mutation, the stationary distributions are the same (see Table(5), a bivariate probability distribution for the two discrete random variables Xq , Xq+1 , and see the proof of the main Theorem, where P (Dr ) ≠ 0 ∀r).
A bivariate probability distribution for two discrete random variables is illustrated in Table(5).
Xq \ Xq+1   D1             D2             ...   Dh             Total
D1          P (D1 ∩ D1 )   P (D1 ∩ D2 )   ...   P (D1 ∩ Dh )   P (D1 )
D2          P (D2 ∩ D1 )   P (D2 ∩ D2 )   ...   P (D2 ∩ Dh )   P (D2 )
·           ·              ·              ...   ·              ·
Dh          P (Dh ∩ D1 )   P (Dh ∩ D2 )   ...   P (Dh ∩ Dh )   P (Dh )
Total       P (D1 )        P (D2 )        ...   P (Dh )        1
Table (5)
The two marginal probability distributions are found in the margins of Table(5): that for Xq in the rightmost column and that for Xq+1 in the bottom row. The marginal probabilities are obtained by summing over rows and columns.
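The marginal computation just described can be sketched with a toy joint table (hypothetical numbers, not the paper's):

```python
# Row sums of the joint table give the marginal distribution of Xq (the
# rightmost "Total" column); column sums give that of Xq+1 (the bottom row).
joint = [[0.3, 0.2],     # entry (i, j) = P(Di ∩ Dj) for Xq = Di, Xq+1 = Dj
         [0.1, 0.4]]

marg_q = [sum(row) for row in joint]            # marginal of Xq
marg_q1 = [sum(col) for col in zip(*joint)]     # marginal of Xq+1

assert all(abs(p - 0.5) < 1e-12 for p in marg_q)
assert abs(sum(marg_q1) - 1.0) < 1e-12          # a probability distribution
```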
7. Discussion
In this paper, the main result is the unified theorem of genetic algorithms toward a new philosophy of machine intelligence. The unified theorem will help us to propose unique chromosomes for a purely successful optimization of these algorithms. Through the unified Markov approach, we obtain purely empirical analysis conclusions and purely theoretical analysis for the classification and stationary distributions of the chains. Through the unified lumped Markov approach, we obtain purely empirical analysis conclusions and purely theoretical analysis for all possible conditional multivariate normal distributions of the transition probability matrices and the stationary multivariate normal distributions of the chains.
References
[1] A.E. Eiben, E.H.L. Aarts, K.M. Van Hee, Global Convergence of Genetic
Algorithms: A Markov Chain Analysis, In: H.-P. Schwefel, R. Männer(EDs.),
Parallel Problem Solving from Nature, Berlin: Springer, (1991), pp. 4-12.
[2] U.H. Abou El-Enien, Graph Drawing algorithms for Markov Chains, M.Sc.
diss., Egypt, ISSR, Cairo University, (2006).
[3] T. Bäck, F. Hoffmeister, H.-P. Schwefel,A Survey of Evolution Strategies,
In Proc. of the Fourth Intern. Conf. on Genetic Algorithms, edited by R. K.
Belew and L. B. Booker. San Mateo, CA: Morgan Kaufman, (1991) 2-9.
[4] U.N.Bhat, Elements of Applied Stochastic Processes, John Wiley and Sons,
New York, 1972.
[5] R. Poli, W. B. Langdon, M. Clerc, C. R. Stephens, Continuous Optimisation Theory Made Easy? Finite-Element Models of Evolutionary Strategies,
Genetic Algorithms and Particle Swarm Optimizers, Foundations of Genetic
Algorithms, 9th International Workshop, edited by Christopher R. Stephens,
M. Toussaint, L. Darrell Whitley, P. F. Stadler, Mexico City, Mexico, Lecture
Notes in Computer Science vol. 4436 Springer, (2007)165-193.
[6] T.E. Davis, J.C. Principe, A Markov Chain Framework for the simple Genetic Algorithm, Evolutionary Computation, Vol. 1:3, (1993) 269-288.
[7] K.A. de Jong, Genetic Algorithms: A 25 Year Perspective, Computational
intelligence: Imitating Life, IEEE Press,(1994) 125-134.
[8] K. A. De Jong, William M. Spears, D. F. Gordon, Using Markov Chains
to Analyze GAFOs, Proceedings of the Third Workshop on Foundations of
Genetic Algorithms, edited by L. Darrell Whitley and Michael D. Vose, Estes
Park, Colorado, USA, Morgan Kaufmann ,(1995)115-137.
[9] D.B. Fogel, Asymptotic Convergence Properties of Genetic Algorithms and
Evolutionary Programming: Analysis and Experiments, Cybernetics and Systems, Vol. 25:3, (1994)389-407.
[10] D.B. Fogel, Evolutionary Computation : Towards a New Philosophy of
Machine Intelligence, IEEE Press, New York, 1995.
[11] D.B. Fogel, Evolving Artificial Intelligence, Ph.D. diss., UCSD, (1992).
[12] D. B. Fogel, System Identification through Simulated Evolution: A Machine Learning Approach to Modeling. Needham, MA: Ginn Press(1991).
[13] A.S. Fraser, Comments on Mathematical Challenges to the Neo-Darwinian
Concept of Evolution, edited by P.S. Moorhead, M.M. Kaplan, Philadelphia:
Wistar Institute Press,(1967) p. 107.
[14] A.S. Fraser, Simulation of Genetic Systems by Automatic Digital Computers, I. Introduction, Australian J. of Biol. Sci., Vol. 10, (1957)484-491.
[15] A.S. Fraser, The Evolution of Purposive Behavior, In Purposive Systems,
edited by H. von Foerster, J. D. White, L. J. Peterson, and J. K. Russell,
Washington DC: Spartan Books, ,1968, pp. 15-23.
[16] T. Gedeon, C. Hayes, R. Swanson, Genericity of the Fixed Point Set for
the Infinite Population Genetic Algorithm, Foundations of Genetic Algorithms,
9th International Workshop, edited by Christopher R. Stephens, M. Toussaint,
L. Darrell Whitley, P. F. Stadler, Lecture Notes in Computer Science vol. 4436
Springer ,Mexico City, Mexico, (2007) 97-109.
[17] M. Goldman, Introduction to Probability and Statistics, Harcourt, Brace
and World, New York, 1970.
[18] J.H. Holland, Adaptive Plans Optimal for Payoff-Only Environments, In
Proc. of the Second Hawaii Inter. Conf. on System Sciences, (1969) 917-920.
[19] J. Horn, Finite Markov Chain Analysis of Genetic Algorithms with Niching, Proceedings of the 5th International Conference on Genetic Algorithms, edited by S. Forrest, Urbana-Champaign, IL, USA, Morgan Kaufmann, (1993) 110-117.
[20] R.A. Johnson, D.W. Wichern, Applied Multivariate Statistical Analysis,
fourth ed., Prentice Hall, New Jersey, 1998.
[21] J.G.Kemeny, J.L.Snell, Finite Markov Chains, Springer-Verlag, New York,
1976.
[22] X. Li, W. Luo, X. Yao, Theoretical foundations of evolutionary computation, Genetic Programming and Evolvable Machines, Springer Netherlands, Volume 9, Number 2, (2008) 107-108.
[23] B. Mitavskiy, J.E. Rowe, A. H. Wright, L. M. Schmitt, Quotients of
Markov chains and asymptotic properties of the stationary distribution of
the Markov chain associated to an evolutionary algorithm, Genetic Programming and Evolvable Machines ,Springer Netherlands, Volume 9, Number 2,
(2008)109-123.
[24] G. Rudolph, Convergence Analysis of Canonical Genetic Algorithms, IEEE
Trans. on Neural Networks, Vol. 5:1, (1994) 96-101.
[25] H.-P. Schwefel, Numerical Optimization of Computer Models, Chichester,
UK: John Wiley(1981).