InterStat

Why Are Genetic Algorithms Markov?

Khiria E. A. El-Nady
Department of Mathematics, Faculty of Science, Alexandria University, Alexandria, Egypt
[email protected]

Usama H. M. Abou El-Enien
Department of Mathematics, Faculty of Science, Alexandria University, Alexandria, Egypt
[email protected]

Amr A. A. Badr
Department of Computer Science, Faculty of Computers and Information, Cairo University, Giza, Egypt
[email protected]

Abstract

Genetic algorithms are considered. Three algorithms are designed and executed to obtain purely empirical conclusions, in order to establish purely theoretical results about the behavior of genetic algorithms as finite-dimensional Markov and lumped Markov chains, which confirm the conjectures drawn from these experiments, and in order to introduce a complete framework toward a new philosophy of machine intelligence. First, we model genetic algorithms using finite-dimensional Markov and lumped Markov chains. Second, we carry out a particle analysis (the basic component) and analyze the convergence properties of these algorithms. Third, we produce two unified approaches to the analysis, Markov and lumped Markov, forming a complete framework, and propose unique chromosomes for purely successful optimization of these algorithms. Furthermore, for the Markov approach, we obtain a purely theoretical analysis of the classification and stationary distributions of the chains. For the lumped Markov approach, we obtain a purely theoretical analysis of all possible conditional multivariate normal distributions of the transition probability matrices and of the stationary multivariate normal distributions of the chains.

Keywords: Genetic algorithms; lumped Markov chains; classification; central limit theorem; stationary multivariate normal distribution; unique chromosomes

1. Introduction

Holland ([18], [19]) proposed the use of a genetic algorithm as an efficient search mechanism in artificially adaptive systems.
Fraser [15], following considerable earlier development (e.g., Fraser [13], [14]), proposed a procedure quite similar to that presented in Holland ([18], [19]). The only apparent difference between Fraser's proposal and Holland's is that Holland suggested reproducing each parent in proportion to its relative fitness. Eiben et al. [1], Fogel ([9], [11]), Horn [20], Davis and Principe [6], Rudolph [25], and De Jong et al. [8] have studied genetic algorithms within the framework of Markov chains. Fogel [12] outlined an evolutionary programming procedure for real-valued function optimization. The most insightful theoretical analyses of evolutionary algorithms have been contributed in the field of evolution strategies (e.g., Schwefel [26]; Bäck et al. [3]). Many of these results carry over to evolutionary programming and other stochastic optimization techniques; they have also provided a framework for improving these procedures (Rudolph [25]). There are works that attempt to introduce an architecture for the construction of genetic algorithms (Gedeon et al. [16], Poli et al. [5], Mitavskiy et al. [24], and Li et al. [23]). These works do not develop a unified stochastic model for genetic algorithms or analyze their performance and convergence properties so as to provide a general framework. Thus it is significant to establish purely theoretical results about the behavior of genetic algorithms as finite-dimensional Markov and lumped Markov chains in order to introduce a complete framework. The main purpose of this paper is to produce unified Markov and lumped Markov approaches to the analysis, in order to introduce a complete framework for genetic algorithms toward a new philosophy of machine intelligence, and to propose unique chromosomes for purely successful optimization of these algorithms. Through the unified Markov approach, we obtain purely empirical conclusions and a purely theoretical analysis of the classification and stationary distributions of the chains.
Through the unified lumped Markov approach, we obtain purely empirical conclusions and a purely theoretical analysis of all possible conditional multivariate normal distributions of the transition probability matrices and of the stationary multivariate normal distributions of the chains.

The rest of the paper is organized as follows. In Section 2, we give the formulation of the problem, namely: Why are genetic algorithms Markov chains? In Section 3, we state the main result. Then in Section 4, the proof of the main result is given in ten steps. In Section 5, we propose the three algorithms. In Section 6, we give four numerical examples. In Section 7, we give some concluding remarks.

2. Formulation of the problem

The empirical conclusions and theoretical analyses of genetic algorithms in the literature do not develop a purely unified stochastic model or analyze performance and convergence properties so as to provide a general framework. Thus, in this paper, we consider an interesting problem, namely: Why are genetic algorithms Markov chains? Throughout this paper, we consider any objective real-valued function of n variables f(x1, x2, ..., xn), where ai ≤ xi ≤ bi for i = 1, 2, ..., n are the domains of the variables xi, and ai and bi are real numbers. The basic steps of a standard genetic algorithm are as follows:

1. Initialization: generate a random initial population of bit strings.
2. Evaluation: evaluate the fitness of each individual of the population.
3. Selection: select, via Roulette-Wheel, the individuals that will reproduce.
4. Reproduction and genetic variation: generate the next population (offspring) by reproducing with inheritance (crossover) the selected individuals, and apply the genetic variation operator (mutation).
5. Cycle: the individuals selected in Step 3 compose the new population of parents and are subjected to reproduction and genetic variation in Step 4.
Convergence is assumed when a pre-defined solution is met or a fixed number of generations has been performed. This algorithm has the following properties: 1) binary encoding; 2) reproduction and selection via Roulette-Wheel; 3) single-point crossover; and 4) single-point mutation. We restrict an arbitrary uncountable set S = {ai ≤ xi ≤ bi for i = 1, 2, ..., n} to be a subset of n-space Rn as a sample space, and restrict an arbitrary countable set T to be the set of all (x1, x2, ..., xn) in S = {ai ≤ xi ≤ bi} for which P(x1, x2, ..., xn) > 0 as a sample space. Let the sample space T of possible solutions be coded as strings of k bits {0,1}; T = {unique chromosomes}. We restrict an arbitrary countable set U to be the set of all possible simple random samples with replacement in the ith trial

Dy(i) = ((x11, x12, ..., x1n), (x21, x22, ..., x2n), ..., (xm1, xm2, ..., xmn)) for y = 1, 2, ..., h (h = number of samples)

for which P((x11, x12, ..., x1n), (x21, x22, ..., x2n), ..., (xm1, xm2, ..., xmn)) > 0 as a sample space, where each (xj1, xj2, ..., xjn) for j = 1, 2, ..., m (m = sample size) lies in a common sample space T. All possible simple random samples with replacement in the ith trial can be defined by every possible configuration of an entire population of m bit strings. There are h = 2^(k×m) such samples, where

n = number of variables
k = number of bits
m = sample size (≡ number of chromosomes)
h = number of all simple random samples with replacement (= 2^(k×m))

We restrict an arbitrary countable set U1 to be the set of all possible repeated dependent two trials without replacement (Dr(1) ∩ Ds(2)) for which P(Dr(1) ∩ Ds(2)) ≥ 0 as a sample space, where each Dy(i) for y = r or s, with r = 1, 2, ..., h and s = 1, 2, ..., h, lies in a common sample space U. Each conditional probability can be defined and obtained by the equation

P(Ds(2) | Dr(1)) = P(Dr(1) ∩ Ds(2)) / P(Dr(1)) ≥ 0, such that P(Dr(1)) ≠ 0.
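The standard genetic algorithm whose steps and properties were just described can be sketched as follows. This is a minimal illustration, not the authors' MATLAB implementation; the fitness function, population size, and mutation rate are arbitrary choices for demonstration.

```python
import random

def run_ga(fitness_fn, k=4, m=8, generations=50, p_mut=0.1, seed=0):
    """Minimal binary GA with Roulette-Wheel selection, single-point
    crossover, and single-point mutation over k-bit chromosomes."""
    rng = random.Random(seed)
    pop = [''.join(rng.choice('01') for _ in range(k)) for _ in range(m)]
    for _ in range(generations):
        fit = [fitness_fn(c) for c in pop]
        total = sum(fit)

        def select():
            # Roulette-Wheel: pick with probability proportional to fitness.
            r = rng.uniform(0, total)
            acc = 0.0
            for c, f in zip(pop, fit):
                acc += f
                if acc >= r:
                    return c
            return pop[-1]

        nxt = []
        for _ in range(m):
            a, b = select(), select()
            point = rng.randint(1, k - 1)        # single-point crossover
            child = a[:point] + b[point:]
            if rng.random() < p_mut:             # single-point mutation
                i = rng.randrange(k)
                child = child[:i] + ('1' if child[i] == '0' else '0') + child[i + 1:]
            nxt.append(child)
        pop = nxt
    return pop

# Maximize the number of 1-bits; fitness is kept positive for the wheel.
final = run_ga(lambda c: c.count('1') + 1)
```

Each generation is exactly one transition of the chain studied below: the entire population of m bit strings is one state.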
There are 2^(2(k×m)) such repeated dependent two trials without replacement. Consider q ∈ Q, where Q is the discrete parameter space of the Markov chain process {Xq, q = 0, 1, 2, ...} (parameter homogeneous). A Markov chain requires a finite collection of states, denoted by U = {D1, D2, ..., Dh}, an h × h probability matrix P called a transition matrix, and a probability vector

p(0) = (P(D1), P(D2), ..., P(Dh)) = (p1(0), p2(0), ..., ph(0))

called the initial probability vector. We can also interpret p(0) as a stationary distribution, where h = 2^(k×m) = number of all simple random samples with replacement ≡ number of all states. We generate all possible combinations of states of unique chromosomes (= 2^(k×m)) and give each state a number. We apply one of the following algorithms:

(1) Genetic algorithms without bit mutation.
(2) Genetic algorithms with bit mutation.

on each generated state for n iterations, where n is a large number. We count for each state with a specific number the number of times it appeared, calculate the probability of each state P, where it appeared, as

P = (number of times appeared) / n,

and get the stationary distribution ordered by the state number and its probability P. For the second approach, let L = {L1, L2, ..., Lg} be a partition of (all possible combinations of states of unique chromosomes = U). We get Lz for each state in the combination and combine states according to their equal Lz's ((x̄1, x̄2, ..., x̄n)'s), taking into account that the Lz's are unique for each set of states. We restrict an arbitrary countable set L to be the set of all possible outcomes in the ith trial

Lz(i) = (x̄1, x̄2, ..., x̄n) for z = 1, 2, ..., g (g = number of states in L)

for which P(x̄1, x̄2, ..., x̄n) > 0 as a sample space, where

x̄k = (x1k + x2k + ... + xmk) / m = (Σ_{j=1}^{m} xjk) / m for k = 1, 2, ..., n,

m = sample size, and xjk = measurement of the kth variable on the jth item.
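The counting procedure above (run the chain for n iterations, count visits, set P = count/n) can be sketched generically. Here a toy two-state transition function stands in for the GA update; the function names and the toy chain are illustrative assumptions, not part of the paper's programs.

```python
import random
from collections import Counter

def empirical_stationary(step, start, n=200_000, seed=0):
    """Run a chain for n iterations and estimate its stationary
    distribution by visit counts: P(state) = (times appeared) / n."""
    rng = random.Random(seed)
    counts = Counter()
    state = start
    for _ in range(n):
        state = step(state, rng)
        counts[state] += 1
    return {s: c / n for s, c in counts.items()}

# Toy two-state chain standing in for the GA transition:
# 0 -> 1 with prob 0.3, 1 -> 0 with prob 0.6 (stationary: 2/3, 1/3).
def toy_step(s, rng):
    if s == 0:
        return 1 if rng.random() < 0.3 else 0
    return 0 if rng.random() < 0.6 else 1

dist = empirical_stationary(toy_step, 0)
```

For the GA itself, `step` would be one generation applied to a whole population state, and the state space would be all 2^(k×m) configurations.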
We restrict an arbitrary countable set L1 to be the set of all possible repeated dependent two trials without replacement (Le(1) ∩ Ld(2)) for which P(Le(1) ∩ Ld(2)) ≥ 0 as a sample space, where each Lz(i) for z = e or d, with e = 1, 2, ..., g and d = 1, 2, ..., g, lies in a common sample space L. Each conditional probability can be defined and obtained by the equation

P(Ld(2) | Le(1)) = P(Le(1) ∩ Ld(2)) / P(Le(1)) ≥ 0, such that P(Le(1)) ≠ 0.

Consider q ∈ Q, where Q is the discrete parameter space of the Markov chain process {Xq, q = 0, 1, 2, ...} (parameter homogeneous). A lumped Markov chain requires a finite collection of states, denoted by L = {L1, L2, ..., Lg}, a g × g probability matrix P called a transition matrix, and a probability vector

p(0) = (P(L1), P(L2), ..., P(Lg)) = (p1(0), p2(0), ..., pg(0))

called the initial probability vector. We can also interpret p(0) as a stationary distribution. Let Xq be an n-dimensional random variable defined on the sample space L during q (initial value of the time parameter), and Xq+1 be an n-dimensional random variable defined on the sample space L during q + 1. The possible outcomes for Xq are L1, L2, ..., Lg, and the same holds for Xq+1. By the theorems for the mean and covariance of the sampling distribution of X̄ and the central limit theorem, we obtain the modified forms of the theorems for an n-dimensional random variable X̄ defined on the sample space L. We get the sampling multivariate normal distribution of X̄, the expectation of X̄, and the covariance matrix of X̄. We conclude that the distribution of Xq is the same as that of X̄, and the same holds for Xq+1. Let (Xq, Xq+1)′ be a 2n-dimensional random variable defined on the sample space L1, distributed as multivariate normal. Then the conditional distribution of Xq+1, given that Xq = L* = any state of L, is multivariate normal with Mean = µ* and Covariance = Σ*. We select one state from U, get L* for the selected state, and replicate the selected state 2^(k×m) times. We apply one of the following algorithms:

(1) Genetic algorithms without bit mutation.
(2) Genetic algorithms with bit mutation.

for one transition (iteration) only on the replicated states to produce new states. On the generated new states, we compute Lz. On the set of Lz's computed, we compute the conditional normal distribution N(µ*, Σ*). By substitution, we compute the off-diagonal covariance of the covariance matrix of the distribution of (Xq, Xq+1)′ in Mean = µ*. By a similar argument, we compute the off-diagonal covariance for all states of L, check that the off-diagonal covariance is the same, and check that Σ* is the same.

3. Main result

In this section, we shall state the main theorem.

Theorem 3.1. For any genetic algorithm with and without bit mutation, the following holds:

(1. Markov approach) The sequence of (m × n)-dimensional random matrices X0, X1, X2, ... converges in distribution to the (m × n)-dimensional random matrix X (which has a unique stationary distribution) if and only if, for each (a1, a2, ..., a(m×n)),

φ_{Xi}(a1, a2, ..., a(m×n)) → φ_X(a1, a2, ..., a(m×n)) as i → ∞

(uniqueness of the characteristic function), where

n = number of variables
m = number of sets of measurements on n variables = sample size = number of chromosomes
φ_X(a1, a2, ..., a(m×n)) is the characteristic function of X in m × n real variables.

(a) If P is a transition matrix for any genetic algorithm with (regular chain) or without (absorbing chain) bit mutation, and the probability vector t is a fixed point of the matrix P, then

Σ_{j=1}^{2^(k×m)} Σ_{i=1}^{2^(k×m)} ti Pij^n → (Σ_{j=1}^{2^(k×m)} tj = 1) as n → ∞.
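Part (1a) reflects the fact that the fixed-point vector t satisfies t = tP, hence tP^n = t for all n, and its components sum to 1. This can be checked numerically with a small illustrative transition matrix standing in for the GA's P (the matrix below is an assumption for demonstration):

```python
import numpy as np

# An illustrative 2-state transition matrix standing in for the GA's P.
P = np.array([[0.7, 0.3],
              [0.6, 0.4]])

# Power iteration: any initial probability vector converges to the
# fixed point t = tP, and the total mass sum_j sum_i t_i P_ij^n stays 1.
t = np.array([1.0, 0.0])
for _ in range(200):
    t = t @ P

assert np.allclose(t, t @ P)           # t is a fixed point of P
assert np.isclose(t.sum(), 1.0)        # components of t sum to 1
```

For this P, the fixed point is t = (2/3, 1/3), obtained from t = tP together with t1 + t2 = 1.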
(b) For any genetic algorithm with or without bit mutation, a real-valued function f(x1, x2, ..., xn), where ai ≤ xi ≤ bi for i = 1, 2, ..., n, is one that contains an infinite number of Markov chains (every chain has different unique chromosomes for purely successful optimization and has different globally optimum value(s)).

(2. Lumped Markov approach) The sequence of n-dimensional random vectors X0, X1, X2, ... converges in distribution to the n-dimensional random vector X (which has a unique stationary multivariate normal distribution) if and only if, for each (a1, a2, ..., an),

φ_{Xi}(a1, a2, ..., an) → φ_X(a1, a2, ..., an) as i → ∞

(uniqueness of the characteristic function), where

n = number of variables
φ_X(a1, a2, ..., an) is the characteristic function of X in n real variables.

(a) All possible conditional multivariate normal distributions of the transition probability matrix have the following form (see the proof of the main theorem): the conditional distribution of Xq+1, given that Xq = L*, is multivariate normal with

Mean = µ* = µ + Σt1t2 Σ^(−1) (L* − µ) and Covariance = Σ* = Σ − Σt1t2 Σ^(−1) Σt2t1.

The distribution of X is the stationary multivariate normal distribution.

(b) For any genetic algorithm with or without bit mutation, a real-valued function f(x1, x2, ..., xn), where ai ≤ xi ≤ bi for i = 1, 2, ..., n, is one that contains an infinite number of lumped Markov chains (every chain has different unique chromosomes for purely successful optimization and has different globally optimum value(s)).

4. Proof of the main result

In this section, we prove the main result in Theorem 3.1. We start with a useful theorem.

Theorem 4.1. Let (S, β, P) be a probability space and let T denote the set of all x in S for which P(x) > 0. Then T is countable.

We shall prove Theorem 3.1 in ten steps.

Proof of Theorem 3.1.

Step 1. For genetic algorithms with and without bit mutation, we define a probability space (S, β, P).
For genetic algorithms with and without bit mutation, let f(x1, x2, ..., xn) be a real-valued function, where ai ≤ xi ≤ bi for i = 1, 2, ..., n. We restrict an arbitrary uncountable set S = {a1 ≤ x1 ≤ b1} to be a subset of the real line R, or of n-space Rn (S = {ai ≤ xi ≤ bi for i = 1, 2, ..., n}), as a sample space; we shall assume that this set is a Borel set. The Borel subsets of S themselves form a Boolean σ-algebra β. A nonnegative completely additive set function P defined on β with P(S) = 1 is called a probability measure. We have a probability space (S, β, P).

Step 2. We prove that T is a countable subset of S, and define an n-dimensional random variable on the sample space T.

Proof. By Theorem 4.1, we restrict an arbitrary countable set T to be the set of all x1 in S = {a1 ≤ x1 ≤ b1} for which P(x1) > 0, or of all (x1, x2, ..., xn) in S = {ai ≤ xi ≤ bi for i = 1, 2, ..., n} for which P(x1, x2, ..., xn) > 0, as a sample space. T denotes a countable subset of S (whenever we use a set T of real numbers as a sample space, or, more generally, whenever we use a set T in n-space as a sample space, we shall assume that this set is a Borel set). Let the sample space T of possible solutions be coded as strings of k bits {0,1}, let each possible configuration have a fitness fi, i = 1, ..., 2^k, and let f* be the globally optimum value. Hence T is countable. Let T be a set in n-space for some n ≥ 1 (a Borel set in n-space for some n ≥ 1); if τ consists of all subsets of T, the probability function P is completely determined on τ. We have a probability space (T, τ, P). Next, we define an n-dimensional random variable on the sample space T. Let X1 be a one-dimensional random variable defined on the sample space T, X1(x1) = x1 for each x1 in T; or let X be an n-dimensional random variable defined on the sample space T,

X = (X1(x1) = x1, X2(x2) = x2, ..., Xn(xn) = xn)′ = (x1, x2, ..., xn) for each (x1, x2, ..., xn) in T.
We will use the notation xjk to indicate the particular value of the kth variable that is observed on the jth item, or trial. That is, xjk = measurement (commonly called data) of the kth variable on the jth item. Consequently, m measurements on n variables can be displayed as a rectangular array. We need to make assumptions about the variables whose observed values constitute the data set.

Step 3. We prove that U is a countable set of all possible simple random samples with replacement in the ith trial, and define an (m × n)-dimensional random variable on the sample space U.

Proof. Let the jkth entry in the data matrix be the random variable Xjk. Each set of measurements Xj on n variables is a random vector, and we have the random matrix

X =
| X11  X12  ...  X1k  ...  X1n |   | X1′ |
| X21  X22  ...  X2k  ...  X2n |   | X2′ |
|  .    .         .         .  |   |  .  |
| Xj1  Xj2  ...  Xjk  ...  Xjn | = | Xj′ |
|  .    .         .         .  |   |  .  |
| Xm1  Xm2  ...  Xmk  ...  Xmn |   | Xm′ |

A random sample can now be defined. X1, X2, ..., Xj, ..., Xm form a random sample if their joint probability mass function is given by the product P(X1)P(X2)...P(Xj)...P(Xm), where P(Xj) = P(xj1, xj2, ..., xjk, ..., xjn) is the probability mass function for the jth row vector. {Joint probability mass function of X1, X2, ..., Xj, ..., Xm} = P(X) = P(X1, X2, ..., Xj, ..., Xm) = P(X1)P(X2)...P(Xj)...P(Xm). We restrict an arbitrary countable set U to be the set of all possible simple random samples with replacement in the ith trial Dy(i) for y = 1, 2, ..., h, for which P((x11, x12, ..., x1n), (x21, x22, ..., x2n), ..., (xm1, xm2, ..., xmn)) > 0, as a sample space, where each (xj1, xj2, ..., xjn) for j = 1, 2, ..., m lies in a common sample space T. All possible simple random samples with replacement in the ith trial can be defined by every possible configuration of an entire population of m bit strings. There are h = 2^(k×m) such samples. Hence U is countable.
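The count h = 2^(k×m) can be checked by explicit enumeration for small k and m. The sketch below is illustrative and not part of the authors' programs:

```python
from itertools import product

def all_population_states(k, m):
    """Enumerate every configuration of a population of m chromosomes,
    each a k-bit string: 2^k unique chromosomes, h = 2^(k*m) states."""
    chromosomes = [''.join(bits) for bits in product('01', repeat=k)]  # 2^k
    return list(product(chromosomes, repeat=m))                        # 2^(k*m)

states = all_population_states(k=2, m=2)
assert len(states) == 2 ** (2 * 2)   # h = 16 population states
```

The exponential growth of h is why the paper's empirical runs are confined to small k and m.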
Let U be a set in (m × n)-space for some (m × n) ≥ (2 × 1) = 2 (a Borel set in (m × n)-space for some (m × n) ≥ 2); if ν consists of all subsets of U, the probability function P is completely determined on ν. We have a probability space (U, ν, P). Next, we define an (m × n)-dimensional random variable on the sample space U. Let X be an (m × n)-dimensional random variable defined on the sample space U:

X = (X1, X2, ..., Xm) = ((X11(x11) = x11, X12(x12) = x12, ..., X1n(x1n) = x1n), (X21(x21) = x21, X22(x22) = x22, ..., X2n(x2n) = x2n), ..., (Xm1(xm1) = xm1, Xm2(xm2) = xm2, ..., Xmn(xmn) = xmn)) = ((x11, x12, ..., x1n), (x21, x22, ..., x2n), ..., (xm1, xm2, ..., xmn))

for each ((x11, x12, ..., x1n), (x21, x22, ..., x2n), ..., (xm1, xm2, ..., xmn)) in U, where Xj = (xj1, xj2, ..., xjn) for j = 1, 2, ..., m, for each (xj1, xj2, ..., xjn) in T (the Xj are n-dimensional random variables defined on a common sample space T).

Step 4. We prove that U1 is a countable set of all possible repeated dependent two trials without replacement, define a 2(m × n)-dimensional random variable on the sample space U1, and describe the Markov chain process.

Proof. We restrict an arbitrary countable set U1 to be the set of all possible repeated dependent two trials without replacement (Dr(1) ∩ Ds(2)) for which P(Dr(1) ∩ Ds(2)) ≥ 0 as a sample space, where each Dy(i) for y = r or s, with r = 1, 2, ..., h and s = 1, 2, ..., h, lies in a common sample space U. Each conditional probability can be defined and obtained by the equation

P(Ds(2) | Dr(1)) = P(Dr(1) ∩ Ds(2)) / P(Dr(1)) ≥ 0, such that P(Dr(1)) ≠ 0.

There are 2^(2(k×m)) such repeated dependent two trials without replacement. Hence U1 is countable.
Let U1 be a set in 2(m × n)-space for some 2(m × n) ≥ 2(2 × 1) = 4 (a Borel set in 2(m × n)-space for some 2(m × n) ≥ 4); if ν1 consists of all subsets of U1, the probability function P is completely determined on ν1. We have a probability space (U1, ν1, P). We define a 2(m × n)-dimensional random variable on the sample space U1. Consider q ∈ Q, where Q is the discrete parameter space of the Markov chain process {Xq, q = 0, 1, 2, ...} (parameter homogeneous). Let Xq be an (m × n)-dimensional random variable defined on the sample space U during q (initial value of the time parameter), and Xq+1 be an (m × n)-dimensional random variable defined on the sample space U during q + 1. The possible outcomes for Xq are D1, D2, ..., Dh, and the same holds for Xq+1. Let X be a 2(m × n)-dimensional random variable defined on the sample space U1:

X = (Xq, Xq+1) = (Xq = Dr, Xq+1 = Ds) = (Dr, Ds) = (Xb1, Xb2)

for each (Dr, Ds) in U1, where

Xv = Dy = Xb = ((X(b1)1(x(b1)1) = x(b1)1, X(b1)2(x(b1)2) = x(b1)2, ..., X(b1)n(x(b1)n) = x(b1)n), (X(b2)1(x(b2)1) = x(b2)1, X(b2)2(x(b2)2) = x(b2)2, ..., X(b2)n(x(b2)n) = x(b2)n), ..., (X(bm)1(x(bm)1) = x(bm)1, X(bm)2(x(bm)2) = x(bm)2, ..., X(bm)n(x(bm)n) = x(bm)n)) = ((x(b1)1, x(b1)2, ..., x(b1)n), (x(b2)1, x(b2)2, ..., x(b2)n), ..., (x(bm)1, x(bm)2, ..., x(bm)n))

for v = q or q + 1, y = r or s, and b = b1 or b2, for each ((x(b1)1, x(b1)2, ..., x(b1)n), (x(b2)1, x(b2)2, ..., x(b2)n), ..., (x(bm)1, x(bm)2, ..., x(bm)n)) in U (the Xv are (m × n)-dimensional random variables defined on a common sample space U).

Next, we describe the Markov chain process. A Markov chain requires a finite collection of states, denoted by U = {D1, D2, ..., Dh}, an h × h probability matrix P called a transition matrix, and a probability vector

p(0) = (P(D1), P(D2), ..., P(Dh)) = (p1(0), p2(0), ..., ph(0))

called the initial probability vector. We can also interpret p(0) as a stationary distribution.

Step 5. We prove that genetic algorithms without bit mutation are absorbing chains.

Proof. From Step 2, we get the unique chromosomes (= 2^k). From Step 3 and Step 4, we generate all possible combinations of states of unique chromosomes (= 2^(k×m)) and give each state a number. We apply genetic algorithms without bit mutation on each generated state for n iterations, where n is a large number. We count for each state with a specific number the number of times it appeared, calculate the probability of each state P, where it appeared, as P = (number of times appeared)/n, and get the stationary distribution ordered by the state number and its probability P. A state in this chain will be absorbing if all the members of the population are identical. Otherwise, a state is transient (see [10]). Now we recall a theorem in [21] and conclude the following modified form of the theorem: every Markov chain with a single unit set has a unique probability vector fixed point. This vector has a positive component for any absorbing state (= 1), and zero for the transient states. We conclude that all stationary distributions (= 2^(k×m)) have a positive component for any absorbing state (= 1), and zero for the transient states. The number of all absorbing states is 2^k. If P is a transition matrix for a genetic algorithm without bit mutation and the probability vector t is a fixed point of the matrix P, then

Σ_{j=1}^{2^(k×m)} Σ_{i=1}^{2^(k×m)} ti Pij^n → (Σ_{j=1}^{2^(k×m)} tj = 1) as n → ∞.

Hence genetic algorithms without bit mutation are absorbing Markov chains. For any genetic algorithm without bit mutation, a real-valued function f(x1, x2, ..., xn), where ai ≤ xi ≤ bi for i = 1, 2, ..., n, is one that contains an infinite number of Markov chains (every chain has different globally optimum value(s) and has unique chromosomes for purely successful optimization).

Step 6.
We prove that genetic algorithms with bit mutation are regular chains.

Proof. From Step 2, we get the unique chromosomes (= 2^k). From Step 3 and Step 4, we generate all possible combinations of states of unique chromosomes (= 2^(k×m)) and give each state a number. We apply genetic algorithms with bit mutation on each generated state for n iterations, where n is a large number. We count for each state with a specific number the number of times it appeared, calculate the probability of each state P, where it appeared, as P = (number of times appeared)/n, and get the stationary distribution ordered by the state number and its probability P. All states in this chain will be ergodic. Now we recall a theorem in [21].

Theorem 4.2. If P is a transition matrix for an ergodic chain, then:
(1) There is a unique probability vector fixed point t: t = tP.
(2) All components of t are positive.
(3) If hj(n) is the average number of times the process is in state j in the first n steps, then for any ε > 0, P(|hj(n) − tj| > ε) → 0 as n → ∞, no matter what the starting state is.

We conclude that all stationary distributions (= 2^(k×m)) are the same unique probability vector fixed point. If P is a transition matrix for a genetic algorithm with bit mutation and the probability vector t is a fixed point of the matrix P, then

Σ_{j=1}^{2^(k×m)} Σ_{i=1}^{2^(k×m)} ti Pij^n → (Σ_{j=1}^{2^(k×m)} tj = 1) as n → ∞.

Hence genetic algorithms with bit mutation are ergodic chains. We take each generated state and its chain, get the position-number sequences for each unique state (of 2^(k×m)) in the chain, starting with index zero in the initial state, and check that all sequences of positions are even-position, odd-position, or random-position sequences. Hence genetic algorithms with bit mutation are regular chains.
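A standard numerical check of regularity is that some power of the transition matrix has all strictly positive entries. The sketch below uses small illustrative matrices rather than an actual GA transition matrix:

```python
import numpy as np

def is_regular(P, max_power=64):
    """A finite chain is regular iff some power of its transition
    matrix has all strictly positive entries."""
    P = np.asarray(P, dtype=float)
    A = P.copy()
    for _ in range(max_power):
        if (A > 0).all():
            return True
        A = A @ P
    return False

# With bit mutation every state can be reached from every state,
# mirrored here by a strictly positive matrix; the second matrix
# has an absorbing state 0 and is therefore not regular.
assert is_regular([[0.9, 0.1], [0.2, 0.8]])
assert not is_regular([[1.0, 0.0], [0.5, 0.5]])
```

This matches the dichotomy of Steps 5 and 6: without mutation the chain is absorbing, with mutation it is regular.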
For any genetic algorithm with bit mutation, a real-valued function f(x1, x2, ..., xn), where ai ≤ xi ≤ bi for i = 1, 2, ..., n, is one that contains an infinite number of Markov chains (every chain has different globally optimum value(s) and has unique chromosomes for purely successful optimization).

Step 7. We describe the partition L of all possible combinations of states of unique chromosomes, prove that L is a countable set, and define an n-dimensional random variable on the sample space L.

Let L = {L1, L2, ..., Lg} be a partition of all possible combinations of states of unique chromosomes. We get Lz for each state in the combination and combine states according to their equal Lz's ((x̄1, x̄2, ..., x̄n)'s), taking into account that the Lz's are unique for each set of states. We prove that L is a countable set. We restrict an arbitrary countable set L to be the set of all possible outcomes in the ith trial Lz(i) = (x̄1, x̄2, ..., x̄n) for z = 1, 2, ..., g, for which P(x̄1, x̄2, ..., x̄n) > 0, as a sample space, where

x̄k = (x1k + x2k + ... + xmk)/m = (Σ_{j=1}^{m} xjk)/m for k = 1, 2, ..., n,

and xjk = measurement of the kth variable on the jth item. Hence L is countable. Let L be a set in n-space for some n ≥ 1; if ζ consists of all subsets of L, the probability function P is completely determined on ζ. We have a probability space (L, ζ, P). Next, we define an n-dimensional random variable on the sample space L. Let X̄ be an n-dimensional random variable defined on the sample space L. For each (x̄1, x̄2, ..., x̄n) in L,

X̄ = (X̄1(x̄1) = x̄1, X̄2(x̄2) = x̄2, ..., X̄n(x̄n) = x̄n) = (x̄1, x̄2, ..., x̄n).

Step 8. We prove that L1 is the set of all possible repeated dependent two trials without replacement, define a 2n-dimensional random variable on the sample space L1, and describe the lumped Markov chain process.

Proof.
We restrict an arbitrary countable set L1 to be set of all possible repeated dependent two trails without replacement (2) (L(1) e ∩ Ld ) for which (2) P (L(1) e ∩ Ld )≥ 0 as a sample space, where for each L(i) z for z = e or d, where e = 1, 2, . . . , g and d = 1, 2, . . . , g in a common sample space L. Each conditional probability can be defined and can be obtained by the equation (2) P (Ld |L(1) e ) = (1) (2) P (Le ∩Ld ) (1) P (Le ) ≥ 0 such that P (Le ) 6= 0. 21 Hence L1 is countable. Let L1 be a set in 2n-space for some 2n ≥ 2(1) = 2 (a Borel set in 2n-space for some 2n ≥ 2) and if ζ1 consists of all subsets of L1 , the probability function P is completely determined on ζ1 . We have a probability space (L1 , ζ1 , P ). We define an 2n-dimensional random variable defined on a sample space L1 . Consider q ∈ Q where Q is the discrete parameter space of the Markov chain process {Xq , q = 0, 1, 2, . . . }(parameter homogeneous). Let Xq be an (n)-dimensional random variable defined on a sample space L during q (initial value of time parameter), and Xq+1 be an (n)-dimensional random variable defined on a sample space L during q + 1. The possible outcomes for Xq are L1 , L2 , . . ., Lg , and the same holds for Xq+1 . Let X be an 2n-dimensional random variable defined on a sample space L1 . X = (Xq , Xq+1 ) = (Xq = Le , Xq+1 = Ld ) = (Le , Ld ) = (Xt1 , Xt2 ) = ((X t1 1 , X t1 2 , . . ., X t1 n ),(X t2 1 , X t2 2 , . . ., X t2 n )) for each (Le , Ld ) in L1 , where Xv = Lz = Xt 22 = (X t1 (x̄t1 ) = x̄t1 , X t2 (x̄t2 ) = x̄t2 , . . . , X tn (x̄tn ) = x̄tn ) = (x̄t1 , x̄t2 , . . . , x̄tn ) for v = q or q + 1, z = e or d and t = t1 or t2 for each (x̄t1 , x̄t2 , . . . , x̄tn ) in L. (Xv are n-dimensional random variables defined on a common sample space L). Next, we describe the lumped Markov chains process. A Markov chains requires a finite collection of states, denoted by L = {L1 , L2 ,. . 
., Lg }, an g × g probability matrix P called a transition matrix and a probability vector (0) (0) p(0) = ( P (L1 ), P (L2 ), . . ., P (Lg ) ) = (p1 , p2 , . . ., p(0) g ) called the initial probability vector. We can also interpret p(0) as stationary distribution. Step 9. By Theorems for the mean and covariance of the sampling distribution of X and central limit Theorem, we conclude and have the following modified forms of the Theorems for an n-dimensional random variable X defined on a sample space L. For the mean and covariance Theorems of the sampling distribution of X and central limit Theorem (see [20]). The modified form of the mean theorem If D1 , D2 , . . ., Dh represent all possible simple random samples with replacement from a common joint probability mass function P (x1 , x2 , ..., xn ) (the parent population, whatever its form, have a mean µ = (µ1 , µ2 , ..., µn )0 23 and a finite covariance Σ1 ), then an n-dimensional density for the random vector X = (X 1 , X 2 , ..., X n )0 (sampling distribution of X) r(X) is centered around the population mean, regardless of sample size (E(X) = µ). The modified form of the covariance theorem. Each covariance σik , i, k = 1, 2, . . . , n of the distribution of X decreases with increasing sample size; that is, the distribution of X becomes more concentrated around the population mean as the sample size gets larger (a covariance of X is denoted by cov(X) = Σ1 , m where m (number of chromosomes) is a sample size). For practical problem, often m = 100 = chromosomes and bit = k > 50; thus there may be more than 25000 possible states in the chain. The distribution of X becomes more symmetrical as the sample size gets larger and is approximately Nn (µ, Σ = Σ1 ) m for large sample size. An n-dimensional normal density for the random vector X has the form f (x) = 1 n 1 (2Π) 2 |Σ| 2 e x−µ)0 Σ−1 (x−µ) −( 2 where −∞ < x̄i < ∞, i = 1, 2, . . . , n. The modified form of central limit Theorem.Let X1 , X2 , . . 
The modified form of the central limit theorem. Let X1, X2, . . . , Xm be independent observations from any population with mean µ and finite covariance Σ1. Then X̄ has an approximate Nn(µ, Σ1/m) distribution for large sample sizes. Here m should also be large relative to n. The approximation provided by the central limit theorem applies to discrete, as well as continuous, multivariate populations. The theorems for the mean and covariance of the sampling distribution of X̄ and the central limit theorem apply to sampling of finite populations if the sampling fraction is 5 percent or smaller (the sample size m is small relative to the population size M).

Step 10. We prove that genetic algorithms with and without bit mutation are lumped Markov chains.

Proof. From Step 2, we get the unique chromosomes (= 2^k). From Steps 3 and 4, we generate all possible combinations of states of unique chromosomes (= 2^(k×m)) and give each state a number. From Step 7, we get Lz for each state in a combination and combine states according to their equal Lz's, taking into account that the Lz's are unique for each set of states; each set of states becomes a separate new state. From Steps 7 and 8, we identify the new state space of Lz's based on the sets of states with unique Lz's, define L = {L1, L2, . . . , Lg}, and define a random variable X on the state space L. From Step 9, we get the sampling distribution of X̄ (Nn(µ, Σ1/m = Σ)), the expectation of the sampling distribution E(X̄) = µ, where µ is the expectation of the parent population, and the covariance matrix of X̄ (= Σ = Σ1/m), where m is the sample size and Σ1 is the covariance matrix of the parent population. We conclude that the distribution of Xq is Nn(µ, Σ), and the same holds for Xq+1. Let X = (Xq, Xq+1)′ be distributed as N2n(µ*, Σ*) with

µ* = (µ = µt1, µ = µt2)′,

Σ* = | Σt1t1 = Σ   Σt1t2     |
     | Σt2t1       Σt2t2 = Σ |,

and |Σt2t2| > 0. Then the conditional distribution of Xq+1, given that Xq = L*, is normal and has

mean = µ* = µ + Σt1t2 Σ⁻¹ (L* − µ)

and

covariance = Σ*
= Σ − Σt1t2 Σ⁻¹ Σt2t1.

We select one state from U, get L* for the selected state, and replicate the selected state 2^(k×m) times. We apply one of the following algorithms for one transition (iteration) only on the replicated states to produce new states: (1) genetic algorithms without bit mutation; (2) genetic algorithms with bit mutation. On the generated new states, we compute Lz. On the set of Lz's computed, we compute the conditional normal distribution N(µ*, Σ*). We compute Σt1t2, where µ* = µ + Σt1t2 Σ⁻¹ (L* − µ). By a similar argument, we compute Σt1t2 for all states of L, check that Σt1t2 is the same, and check that Σ* is the same. Hence genetic algorithms with and without bit mutation are lumped Markov chains. We compute all possible conditional normal distributions of the transition matrix and can also interpret the distribution of X̄ as the stationary normal distribution. On the basis of Steps 1-10, we complete the proof of Theorem 3.1.

5. Proposed algorithms

We prepared programs using MATLAB 7.5. Afterwards, three algorithms were designed and executed to observe the characteristics and behavior of the following:

5.1. Genetic algorithms (GAs) without bit mutation

We named the proposed algorithm absorbing optimization analysis (AOA). The basic steps of the AOA algorithm are as follows:
1. Input the number of bits k.
2. Get the unique chromosomes (= 2^k).
3. Input the number of chromosomes m.
4. Get the number of states (= 2^(k×m)).
5. Generate all possible combinations of states of unique chromosomes (also 2^(k×m)).
6. Give each state a number.
7. Apply GAs without bit mutation to each generated state for n iterations, where n is a large number.
8. Count, for each state with a specific number, the number of times it appeared.
9. Calculate the probability of each state: P = (number of times it appeared)/n.
10. Get the stationary distribution ordered by state number and its probability P.
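As an illustration of why the AOA chain is absorbing, the sketch below implements one GA transition without bit mutation in Python (illustrative only: the decoding, fitness shift, and crossover details are assumptions, not the authors' MATLAB code). A state whose chromosomes are all identical can never be left, since selection and crossover of identical parents reproduce the same population:

```python
import math
import random

K = 5  # bits per chromosome

def decode(bits):
    # Assumed decoding: map a K-bit string to x on a uniform grid over [-1, 2]
    return -1.0 + int(bits, 2) * 3.0 / (2 ** K - 1)

def fitness(bits):
    x = decode(bits)
    return x * math.sin(10 * math.pi * x) + 1.0

def one_transition(state, rng, p_crossover=0.6):
    """One GA iteration without bit mutation on a 2-chromosome population:
    fitness-proportional selection followed by single-point crossover."""
    fs = [fitness(c) for c in state]
    weights = [f - min(fs) + 1.0 for f in fs]  # shift so all weights are positive
    p1, p2 = rng.choices(state, weights=weights, k=2)
    if rng.random() < p_crossover:
        cut = rng.randrange(1, K)
        p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
    return (p1, p2)

# A state whose chromosomes are all identical is absorbing: selection picks
# identical parents, and crossover of identical parents changes nothing.
rng = random.Random(0)
uniform = ('10111', '10111')
state = uniform
for _ in range(100):
    state = one_transition(state, rng)
print(state == uniform)  # True, for every seed
```

The converse behavior, a non-uniform population eventually being trapped in some uniform state, is what the observed stationary distributions in Section 6.1 record empirically.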
5.2. Genetic algorithms (GAs) with bit mutation

We named the proposed algorithm regular optimization analysis (ROA). The basic steps of the ROA algorithm are as follows:
1. Input the number of bits k.
2. Get the unique chromosomes (= 2^k).
3. Input the number of chromosomes m.
4. Get the number of states (= 2^(k×m)).
5. Generate all possible combinations of states of unique chromosomes (also 2^(k×m)).
6. Give each state a number.
7. Pick one state randomly.
8. Apply GAs with bit mutation to the state for n iterations, where n is a large number.
9. Count, for each state with a specific number, the number of times it appeared.
10. Calculate the probability of each state: P = (number of times it appeared)/n.
11. Get the stationary distribution ordered by state number and its probability P.
12. Take the randomly chosen state and its chain. Get the sequence of position numbers for each unique state (of the 2^(k×m)) in the chain, starting with index zero at the initial state. If every position sequence consists of even positions only or of odd positions only, the chain is cyclic; if mixed position sequences also appear, the chain is regular.

5.3. Genetic algorithms with and without bit mutation (as lumped Markov chains)

We named the proposed algorithm Central-Markovian optimization analysis (CMOA). The basic steps of the CMOA algorithm are as follows:
1. Input the number of bits k.
2. Get the unique chromosomes (= 2^k).
3. Input the number of chromosomes m.
4. Get the number of states (= 2^(k×m)).
5. Get all possible combinations of states of unique chromosomes, U (also 2^(k×m)).
6. Get Lz for each state in a combination.
7. Combine states according to their equal Lz's, taking into account that the Lz's are unique for each set of states.
8. Let each set of states be a separate new state.
9. Identify the new state space of Lz's based on the sets of states with unique Lz's.
10. Define L = {L1, L2, . . . , Lg}.
11. Define a random variable X on the state space L.
12.
Get the sampling distribution of X̄ (Nn(µ, Σ1/m = Σ)).
13. Get the expectation of the sampling distribution, E(X̄) = µ, where µ is the expectation of the parent population.
14. Get the covariance matrix of X̄ (= Σ = Σ1/m), where m is the sample size and Σ1 is the covariance matrix of the parent population.
15. Let X = (Xq, Xq+1)′ be distributed as N2n(µ*, Σ*) with µ* = (µ = µt1, µ = µt2)′,

Σ* = | Σt1t1 = Σ   Σt1t2     |
     | Σt2t1       Σt2t2 = Σ |,

and |Σt2t2| > 0, where q is the initial value of the time parameter Q of the Markov chain process {Xq, q = 0, 1, 2, . . .}; the possible outcomes for Xq are L1, L2, . . . , Lg, and the same holds for Xq+1.
16. Get E(Xq) = µ.
17. Get the covariance matrix of Xq (= Σ).
18. Get E(Xq+1) = µ.
19. Get the covariance matrix of Xq+1 (= Σ).
20. Group the states of U such that:
(1) states are globally optimal (all members of the state are identical);
(2) all or some members of the state have globally optimum values;
(3) states are not globally optimal and all members of the state are identical;
(4) all or some members of the state do not have globally optimum values.
21. Select one state randomly from each group of U.
22. Get L* for the selected states.
23. Replicate each selected state (from each group) 2^(k×m) times.
24. Apply one of the following algorithms for one transition (iteration) only to each replicated selected state to produce new states:
(1) GAs without bit mutation (groups (2) and (4) of step 20 only);
(2) GAs with bit mutation.
25. On the generated new states, compute Lz.
26. On the set of Lz's computed, compute the conditional normal distribution N(µ*, Σ*).
27. Compute Σt1t2, where µ* = µ + Σt1t2 Σ⁻¹ (L* − µ).
28. Check that Σ* is the same for all four groups.
29. If the Σt1t2 for the four groups of states are equal, then the process is a Markov chain.
30. Substitute all possible Lz's to compute all conditional expected values µ*. As a result, this computes all possible conditional normal distributions, and the distribution of X̄ is the stationary multivariate normal distribution.
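Steps 26-27 rest on the standard conditional-normal formulas µ* = µ + Σt1t2 Σ⁻¹ (L* − µ) and Σ* = Σ − Σt1t2 Σ⁻¹ Σt2t1. A minimal Python sketch for the one-dimensional case (the numeric inputs below are hypothetical, chosen to be exactly representable and to keep the conditional variance positive; they are not values from the paper):

```python
def conditional_normal_1d(mu, sigma, sigma12, l_star):
    """Parameters of X_{q+1} given X_q = l_star for a bivariate normal with
    equal marginal variances Sigma_{t1 t1} = Sigma_{t2 t2} = sigma and
    cross-covariance Sigma_{t1 t2} = sigma12 (steps 26-27 with n = 1)."""
    mu_cond = mu + (sigma12 / sigma) * (l_star - mu)      # conditional mean
    var_cond = sigma - sigma12 * sigma12 / sigma          # conditional variance
    return mu_cond, var_cond

# Hypothetical numbers: mu = 0.5, sigma = 0.25, sigma12 = 0.125, L* = -1.0
mu_c, var_c = conditional_normal_1d(0.5, 0.25, 0.125, -1.0)
print(mu_c)   # -0.25
print(var_c)  # 0.1875
```

Note that var_cond is the same for every choice of l_star, which is exactly the property checked in step 28: the conditional variance must agree across all four groups.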
6. Numerical results and discussion

6.1. A numerical example for GAs without bit mutation

Consider the following:
f(x) = x · sin(10π · x) + 1, x ∈ [−1, 2],
k = 5 bits, m = 2 chromosomes, n = 100 iterations, probability of crossover = 0.6, probability of mutation = 0.9.

Unique chromosomes = {00000 = -1.000000, 00001 = -0.903226, 00010 = -0.806452, 00011 = -0.709677, 00100 = -0.612903, 00101 = -0.516129, 00110 = -0.419355, 00111 = -0.322581, 01000 = -0.225806, 01001 = -0.129032, 01010 = -0.032258, 01011 = 0.064516, 01100 = 0.161290, 01101 = 0.258065, 01110 = 0.354839, 01111 = 0.451613, 10000 = 0.548387, 10001 = 0.645161, 10010 = 0.741935, 10011 = 0.838710, 10100 = 0.935484, 10101 = 1.032258, 10110 = 1.129032, 10111 = 1.225806, 11000 = 1.322581, 11001 = 1.419355, 11010 = 1.516129, 11011 = 1.612903, 11100 = 1.709677, 11101 = 1.806452, 11110 = 1.903226, 11111 = 2.000000}

Globally optimum value = 1.225806

All possible combinations of states of unique chromosomes = {0: (00000, 00000), 1: (00000, 00001), . . . , 1023: (11111, 11111)}

With n = 100 iterations, all stationary distributions ordered by the state number:

0, 0, 0, 0, 0, 0, 0, . . . , 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
Stationary distribution (0) = (1)
1, 3, 3, 3, 3, 3, 3, 3, 3, 3, . . . , 3, 3, 3, 3, 3, 3, 3, 3, 3
Stationary distribution (1, 3) = (0, 1)
2, 2, 0, 0, 0, 0, 0, 0, 0, 0, . . . , 0, 0, 0, 0, 0, 0, 0, 0, 0
Stationary distribution (0, 2) = (1, 0)
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, . . . , 3, 3, 3, 3, 3, 3, 3, 3, 3
Stationary distribution (3) = (1)
. . .
1021, 1022, 1023, 1023, 1023, . . . , 1023, 1023, 1023, 1023, 1023
Stationary distribution (1021, 1022, 1023) = (0, 0, 1)
1022, 1021, 1020, 1020, 1020, . . . , 1020, 1020, 1020, 1020, 1020
Stationary distribution (1020, 1021, 1022) = (1, 0, 0)
1023, 1023, 1023, 1023, 1023, . . .
, 1023, 1023, 1023, 1023
Stationary distribution (1023) = (1)

We conclude that all 2^(k×m) stationary distributions have a single positive component (= 1) at an absorbing state and zero components at the transient states. The number of absorbing states is 2^k.

6.2. A numerical example for GAs with bit mutation

Consider the following:
f(x) = x · sin(10π · x) + 1, x ∈ [−1, 2],
k = 5, m = 2, n = 100000 iterations, probability of crossover = 0.6, probability of mutation = 0.9.

Table (1): n = 100000

Number of iteration   Number of state
0                     884
1                     141
2                     1010
. . .                 . . .
99997                 495
99998                 48
99999                 975

Table (2): stationary distribution of Table (1)

Ordered number of state   Probability
0                         0.029850
1                         0.004520
2                         0.004210
. . .                     . . .
1021                      0.004730
1022                      0.004050
1023                      0.031120

Classification of chain: regular.

884 to 141 (neither even nor odd):
1-2204-4267-5988-7086-7214-7495-8761-10647-11173-11231-11972-12666-13648-
. . .
-49413-49480-50639-50834-55098-56852-57766-58204-58373-59907-60346-60530-60898-63871-64176-64215-65166-70131-71202-71764-72192-73921-77079-78137-78804-78824-81185-82354-84321-84569-85216-87825-90839-90853-92645-96888-97918-99262
. . .
884 to 681 (even): 71370-90980
884 to 350 (odd): 73663
884 to 617 (neither even nor odd): 75678-98597
884 to 676 (odd): 76713
884 to 361 (odd): 91709
884 to 358 (odd): 92055
884 to 667 (neither even nor odd): 96602-98115

Consider the following:
f(x) = x · sin(10π · x) + 1, x ∈ [−1, 2],
k = 5, m = 2, n = 150000 iterations, probability of crossover = 0.6, probability of mutation = 0.9.

Table (3): n = 150000

Number of iteration   Number of state
0                     393
1                     252
2                     803
. . .                 . . .
149997                64
149998                831
149999                192

Table (4): stationary distribution of Table (3)

Ordered number of state   Probability
0                         0.029433
1                         0.004113
2                         0.004500
. . .                     . . .
1021                      0.004520
1022                      0.004413
1023                      0.030880

Classification of chain: regular.

393 to 252 (neither even nor odd):
1-147-215-327-342-401-403-405-541-777-793-795-852-1108-1357-1420-1422-1424-1426-1489-2436-2531- . . .
-124475-124479-124506-124717-124927-124978-125103-125364-125366-125368-125620-125864-125866-125868-126277-126317-126438-127057-127339-127441-127769-127838-128318-128320-128322-128328-128520-128773-129057-129687-130131-130535-130537-130539-130705-130707-130935-130939-131010-131475-131484-131488-131833-132027-132057-132160-132508-132510-132512-132689-132724-132728-132736-132843-132845-132996-133117-133241-133258-133415-133419-133445-133586-133706-133836-133838-133840-134232-134436-134630-134682-134931-134991-135087-135148-135150-135339-135542-135666-135777-135979-136149-136191-136572-136788-136790-136794-136915-137009-137011-137103-137154-137188-137954-138052-138122-138366-138852-138952-139177-139623-139717-139852-140277-140285-140713-140742-141403-142003-142280-142282-142356-142447-142932-142955-143073-143075-143320-143322-144006-144010-144677-145183-145210-145497-145706-145778-145782-145786-145814-145920-146031-146033-146093-146095-146365-146494-146986-147008-147252-147326-147548-147627-147976-147990-148034-148036-148344-148581-148799-148801-148880-149109-149160-149344-149661-149976
. . .
393 to 358 (neither even nor odd): 63841-108342
393 to 681 (even): 65872-67640
393 to 661 (odd): 73529
393 to 618 (neither even nor odd): 81349-124315-140962
393 to 666 (even): 104718

From Tables (2) and (4), we obtain the same unique stationary distribution, and the chain is regular.
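The even/odd position test of ROA step 12, used above to classify these chains as regular, can be sketched in Python as follows (function names are illustrative; the position lists in the example calls are taken from the Section 6.2 output for initial state 884):

```python
def classify_positions(positions):
    # Parity classification of the positions at which a state recurs
    parities = {p % 2 for p in positions}
    if parities == {0}:
        return "even"
    if parities == {1}:
        return "odd"
    return "neither even nor odd"

def classify_chain(position_lists):
    # ROA step 12: cyclic only if every recurring state appears at purely
    # even or purely odd positions; any mixed sequence makes the chain regular
    labels = [classify_positions(p) for p in position_lists if p]
    if labels and all(label in ("even", "odd") for label in labels):
        return "cyclic"
    return "regular"

print(classify_positions([71370, 90980]))                # even
print(classify_positions([73663]))                       # odd
print(classify_positions([75678, 98597]))                # neither even nor odd
print(classify_chain([[71370, 90980], [75678, 98597]]))  # regular
```

A single mixed-parity recurrence is enough to rule out period-2 cycling, which is why the long "neither even nor odd" lists above settle the classification.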
6.3. A numerical example for GAs without bit mutation (as lumped Markov chains)

Consider the following:
f(x) = x · sin(10π · x) + 1, x ∈ [−1, 2],
k = 5, m = 2, probability of crossover = 0.6, probability of mutation = 0.9.

Unique chromosomes are as listed in Section 6.1; globally optimum value = 1.225806.

We will use the notation x̄value = value of x̄.

All possible combinations of states of unique chromosomes = { {(00000, 00000)} = -1.000000 = x̄-1.000000, {(00000, 00001), (00001, 00000)} = -0.951613 = x̄-0.951613, {(00001, 00001), (00000, 00010), (00010, 00000)} = -0.903226 = x̄-0.903226, . . . , {(11111, 11111)} = 2.000000 = x̄2.000000 }

Sampling distribution of X̄: N1(0.500000, 0.159355) ≡ stationary distribution.
Covariance Σt1t2 = 0.160546 is equal for the two groups of states, so the process is a lumped Markov chain.
Conditional variance = 0.000022 is the same for the two groups.
All possible conditional normal distributions of the transition matrix:
{x̄-1.000000: N1(-1.011202, 0.000022), x̄-0.951613: N1(-0.962453, 0.000022), x̄-0.903226: N1(-0.913705, 0.000022), . . . , x̄2.000000: N1(2.011202, 0.000022)}
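The unique-chromosome table used in these examples corresponds to decoding a 5-bit string b to x = −1 + int(b) · 3/(2^5 − 1), a uniform grid on [−1, 2]. A short Python sketch (the decoding formula is inferred from the table, not stated explicitly in the paper) confirms the listed globally optimum value:

```python
import math

K = 5  # bits per chromosome

def decode(bits):
    # Inferred decoding: a uniform grid of 2**K points on [-1, 2],
    # matching the unique-chromosome table (e.g. 00001 -> -0.903226)
    return -1.0 + int(bits, 2) * 3.0 / (2 ** K - 1)

def f(x):
    return x * math.sin(10 * math.pi * x) + 1.0

print(decode('00000'))            # -1.0
print(decode('11111'))            # 2.0
print(round(decode('10111'), 6))  # 1.225806

# The chromosome 10111 decodes to the listed globally optimum value
best = max((decode(format(i, '05b')) for i in range(2 ** K)), key=f)
print(round(best, 6))             # 1.225806
```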
6.4. A numerical example for GAs with bit mutation (as lumped Markov chains)

Consider the following:
f(x) = x · sin(10π · x) + 1, x ∈ [−1, 2],
k = 5, m = 2, probability of crossover = 0.6, probability of mutation = 0.9.

All possible combinations of states of unique chromosomes are as in Section 6.3.

Sampling distribution of X̄: N1(0.500000, 0.159355) ≡ stationary distribution.
Covariance Σt1t2 = -0.141386 is equal for the four groups of states, so the process is a lumped Markov chain.
Conditional variance = 0.007336 is the same for all four groups.
All possible conditional normal distributions of the transition matrix:
{x̄-1.000000: N1(1.830859, 0.007336), x̄-0.951613: N1(1.787928, 0.007336), x̄-0.903226: N1(1.744997, 0.007336), . . . , x̄2.000000: N1(-0.830859, 0.007336)}

For genetic algorithms with and without bit mutation, the stationary distributions are the same (see Table (5), a bivariate probability distribution for the two discrete random variables Xq, Xq+1, and the proof of the main theorem, where P(Dr) ≠ 0 for all r). A bivariate probability distribution for two discrete random variables is illustrated in Table (5).

Xq \ Xq+1   D1           D2           . . .   Dh           Total
D1          P(D1 ∩ D1)   P(D2 ∩ D1)   . . .   P(Dh ∩ D1)   P(D1)
D2          P(D1 ∩ D2)   P(D2 ∩ D2)   . . .   P(Dh ∩ D2)   P(D2)
. . .       . . .        . . .        . . .   . . .        . . .
Dh          P(D1 ∩ Dh)   P(D2 ∩ Dh)   . . .   P(Dh ∩ Dh)   P(Dh)
Total       P(D1)        P(D2)        . . .   P(Dh)        1

Table (5)

The two marginal probability distributions are found in the margins of Table (5): that for Xq in the rightmost column and that for Xq+1 in the bottom row. The marginal probabilities are obtained by summing over rows and columns.

7. Discussion

In this paper, the main result is the unified theorem of genetic algorithms toward a new philosophy of machine intelligence.
The unified theorem will help us to propose unique chromosomes for a purely successful optimization of these algorithms. Through the unified Markov approach, we obtain purely empirical analysis conclusions and purely theoretical analysis for the classification and stationary distributions of chains. Through the unified lumped Markov approach, we obtain purely empirical analysis conclusions and purely theoretical analysis for all possible conditional multivariate normal distributions of transition probability matrices and stationary multivariate normal distributions of chains.

References

[1] A.E. Eiben, E.H.L. Aarts, K.M. Van Hee, Global Convergence of Genetic Algorithms: A Markov Chain Analysis, in: H.-P. Schwefel, R. Männer (Eds.), Parallel Problem Solving from Nature, Springer, Berlin, 1991, pp. 4-12.
[2] U.H. Abou El-Enien, Graph Drawing Algorithms for Markov Chains, M.Sc. thesis, ISSR, Cairo University, Egypt, 2006.
[3] T. Bäck, F. Hoffmeister, H.-P. Schwefel, A Survey of Evolution Strategies, in: R.K. Belew, L.B. Booker (Eds.), Proc. of the Fourth Intern. Conf. on Genetic Algorithms, Morgan Kaufmann, San Mateo, CA, 1991, pp. 2-9.
[4] U.N. Bhat, Elements of Applied Stochastic Processes, John Wiley and Sons, New York, 1972.
[5] R. Poli, W.B. Langdon, M. Clerc, C.R. Stephens, Continuous Optimisation Theory Made Easy? Finite-Element Models of Evolutionary Strategies, Genetic Algorithms and Particle Swarm Optimizers, in: C.R. Stephens, M. Toussaint, L.D. Whitley, P.F. Stadler (Eds.), Foundations of Genetic Algorithms, 9th International Workshop, Mexico City, Mexico, Lecture Notes in Computer Science, vol. 4436, Springer, 2007, pp. 165-193.
[6] T.E. Davis, J.C. Principe, A Markov Chain Framework for the Simple Genetic Algorithm, Evolutionary Computation, 1(3) (1993) 269-288.
[7] K.A. de Jong, Genetic Algorithms: A 25 Year Perspective, in: Computational Intelligence: Imitating Life, IEEE Press, 1994, pp. 125-134.
[8] K.A. De Jong, W.M.
Spears, D.F. Gordon, Using Markov Chains to Analyze GAFOs, in: L.D. Whitley, M.D. Vose (Eds.), Proceedings of the Third Workshop on Foundations of Genetic Algorithms, Estes Park, Colorado, USA, Morgan Kaufmann, 1995, pp. 115-137.
[9] D.B. Fogel, Asymptotic Convergence Properties of Genetic Algorithms and Evolutionary Programming: Analysis and Experiments, Cybernetics and Systems, 25(3) (1994) 389-407.
[10] D.B. Fogel, Evolutionary Computation: Towards a New Philosophy of Machine Intelligence, IEEE Press, New York, 1995.
[11] D.B. Fogel, Evolving Artificial Intelligence, Ph.D. diss., UCSD, 1992.
[12] D.B. Fogel, System Identification through Simulated Evolution: A Machine Learning Approach to Modeling, Ginn Press, Needham, MA, 1991.
[13] A.S. Fraser, Comments on Mathematical Challenges to the Neo-Darwinian Concept of Evolution, in: P.S. Moorhead, M.M. Kaplan (Eds.), Wistar Institute Press, Philadelphia, 1967, p. 107.
[14] A.S. Fraser, Simulation of Genetic Systems by Automatic Digital Computers, I. Introduction, Australian J. of Biol. Sci., 10 (1957) 484-491.
[15] A.S. Fraser, The Evolution of Purposive Behavior, in: H. von Foerster, J.D. White, L.J. Peterson, J.K. Russell (Eds.), Purposive Systems, Spartan Books, Washington DC, 1968, pp. 15-23.
[16] T. Gedeon, C. Hayes, R. Swanson, Genericity of the Fixed Point Set for the Infinite Population Genetic Algorithm, in: C.R. Stephens, M. Toussaint, L.D. Whitley, P.F. Stadler (Eds.), Foundations of Genetic Algorithms, 9th International Workshop, Mexico City, Mexico, Lecture Notes in Computer Science, vol. 4436, Springer, 2007, pp. 97-109.
[17] M. Goldman, Introduction to Probability and Statistics, Harcourt, Brace and World, New York, 1970.
[18] J.H. Holland, Adaptive Plans Optimal for Payoff-Only Environments, in: Proc. of the Second Hawaii Inter. Conf. on System Sciences, 1969, pp. 917-920.
[19] J.
Horn, Finite Markov Chain Analysis of Genetic Algorithms with Niching, in: S. Forrest (Ed.), Proceedings of the 5th International Conference on Genetic Algorithms, Urbana-Champaign, IL, USA, Morgan Kaufmann, 1993, pp. 110-117.
[20] R.A. Johnson, D.W. Wichern, Applied Multivariate Statistical Analysis, fourth ed., Prentice Hall, New Jersey, 1998.
[21] J.G. Kemeny, J.L. Snell, Finite Markov Chains, Springer-Verlag, New York, 1976.
[22] X. Li, W. Luo, X. Yao, Theoretical Foundations of Evolutionary Computation, Genetic Programming and Evolvable Machines, 9(2) (2008) 107-108.
[23] B. Mitavskiy, J.E. Rowe, A.H. Wright, L.M. Schmitt, Quotients of Markov Chains and Asymptotic Properties of the Stationary Distribution of the Markov Chain Associated to an Evolutionary Algorithm, Genetic Programming and Evolvable Machines, 9(2) (2008) 109-123.
[24] G. Rudolph, Convergence Analysis of Canonical Genetic Algorithms, IEEE Trans. on Neural Networks, 5(1) (1994) 96-101.
[25] H.-P. Schwefel, Numerical Optimization of Computer Models, John Wiley, Chichester, UK, 1981.