Diversity Loss in General Estimation of Distribution Algorithms

Diversity Loss
in General Estimation of Distribution
Algorithms
J. L. Shapiro
PPSN (Parallel Problem Solving From Nature) ’06
BISCuit 2nd EDA Seminar
Abstract

- A class of EDAs for which universal results on the rate of diversity loss can be derived.
- The class (SML-EDA) requires two restrictions:
  - In each generation, the new probability model is built using only data sampled from the current probability model.
  - Maximum likelihood is used to set the model parameters.
- The algorithm will never find the optimum unless the population size grows exponentially.
EDA

Initialize the EDA probability model
(start with a random population of M vectors)
Repeat:
    Select N vectors using a selection method
    Learn a new probability model from the selected population
    Sample M vectors from the probability model
Until the stopping criteria are satisfied
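The loop above can be sketched as a minimal univariate EDA (UMDA-style) on binary strings. This is an illustrative sketch, not code from the talk: the OneMax fitness function and truncation selection are assumptions chosen for concreteness.

```python
import random

def umda(fitness, L, M=100, N=50, generations=50, seed=0):
    """Sketch of a univariate EDA: the probability model is a vector of
    independent per-bit frequencies, fitted by maximum likelihood."""
    rng = random.Random(seed)
    # Start with a random population of M binary vectors.
    pop = [[rng.randint(0, 1) for _ in range(L)] for _ in range(M)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Select N vectors (truncation selection is an illustrative choice).
        selected = sorted(pop, key=fitness, reverse=True)[:N]
        # Learn a new model by maximum likelihood: per-bit frequency of ones.
        p = [sum(x[i] for x in selected) / N for i in range(L)]
        # Sample M new vectors from the model.
        pop = [[1 if rng.random() < p[i] else 0 for i in range(L)]
               for _ in range(M)]
        best = max(pop + [best], key=fitness)
    return best

# Example: OneMax (maximize the number of ones).
best = umda(sum, L=20)
```

Because the new population is built only from samples of the current model, and the model is refitted by maximum likelihood each generation, this sketch falls inside the SML-EDA class discussed next.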
SML-EDA

A restricted class of EDAs is defined by three assumptions:
1. Data in generation t can only affect data in generation t+1 through the probability model.
2. The parameters of the estimated model are chosen using maximum likelihood.
3. The sample size M and the selected-population size N are in a constant ratio, independent of the number of variables L.

- SML-EDA (Simple, Maximum-Likelihood EDA): the class of EDAs for which assumptions 1 and 2 hold.
- This class of EDAs includes BOA, MIMIC, FDA, UMDA, etc.
Some Definitions for Diversity Loss

- The empirical frequency with which component $i$ takes the value $a$ across the population:

  $$\gamma_i^a = \frac{1}{N} \sum_{\mu=1}^{N} \delta(x_i^\mu, a)$$

  where $N$ is the population size and $x^\mu$ is the $\mu$-th population member.

- One diversity measure:

  $$v = \frac{1}{|A|} \sum_{i=1}^{L} \sum_{a \in A} \gamma_i^a \left(1 - \gamma_i^a\right)$$

  $v = 0$ when every component has fixated on a single value; for a random population $v$ takes its maximum value.

- $v$ is the trace of the empirical covariance matrix

  $$C_{ij} = \frac{1}{|A|} \sum_{a \in A} \Big[ \big\langle \delta(x_i, a)\,\delta(x_j, a) \big\rangle - \big\langle \delta(x_i, a) \big\rangle \big\langle \delta(x_j, a) \big\rangle \Big]$$

  The first term is the expectation that components $i$ and $j$ both take the value $a$; the second is the expectation that they take the value $a$ independently.
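These definitions are easy to state in code. A minimal sketch in Python for a binary alphabet, where a population is a list of equal-length 0/1 lists:

```python
def empirical_freq(pop, i, a):
    """gamma_i^a: fraction of population members whose component i equals a."""
    return sum(1 for x in pop if x[i] == a) / len(pop)

def diversity(pop, alphabet=(0, 1)):
    """v = (1/|A|) * sum over components i and values a of
    gamma_i^a * (1 - gamma_i^a)."""
    total = 0.0
    for i in range(len(pop[0])):
        for a in alphabet:
            g = empirical_freq(pop, i, a)
            total += g * (1 - g)
    return total / len(alphabet)

fixated = [[1, 0, 1]] * 4                             # every component fixed
mixed = [[0, 0, 0], [1, 1, 1], [0, 1, 0], [1, 0, 1]]  # gamma = 1/2 everywhere
```

For the binary alphabet the maximum of $v$ is $L/4$, attained when every component frequency is 1/2; here `fixated` gives $v = 0$ and `mixed` (with $L = 3$) gives $v = 0.75$.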
Diversity Loss on a Flat Fitness Landscape

- Theorem 1: for EDAs in class SML-EDA on a flat landscape, the expected value of $v$ is reduced in each generation by the factor $(1 - 1/N)$:

  $$\langle v_t \rangle = \left(1 - \frac{1}{N}\right) \langle v_{t-1} \rangle$$

  Because the parameters are set by maximum likelihood, the empirical variance is reduced by a factor $(1 - 1/N)$ relative to the parent population.

- This gives a universal "expected diversity loss" for all SML-EDAs:

  $$\langle v_t \rangle = v_0 \left(1 - \frac{1}{N}\right)^t$$

  which decays with a characteristic time approximately equal to the population size for large $N$.
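Theorem 1 can be checked empirically with a minimal univariate SML-EDA on a flat landscape. On a flat landscape selection changes nothing, so each generation just fits per-bit frequencies by maximum likelihood and resamples; the parameter values below are illustrative assumptions.

```python
import random

def flat_sml_eda(N, L, T, rng):
    """One run of a univariate SML-EDA on a flat landscape; returns (v_0, v_T).
    For a binary alphabet the diversity v reduces to sum_i gamma_i(1 - gamma_i),
    where gamma_i is the frequency of ones at component i."""
    def v(pop):
        total = 0.0
        for i in range(L):
            g = sum(x[i] for x in pop) / N
            total += g * (1 - g)
        return total

    pop = [[rng.randint(0, 1) for _ in range(L)] for _ in range(N)]
    v0 = v(pop)
    for _ in range(T):
        # Flat fitness: selection is a no-op, so the ML model is simply the
        # per-bit frequency of ones in the current population.
        p = [sum(x[i] for x in pop) / N for i in range(L)]
        pop = [[1 if rng.random() < p[i] else 0 for i in range(L)]
               for _ in range(N)]
    return v0, v(pop)

# Averaged over many runs, <v_T> should track <v_0> * (1 - 1/N)**T.
rng = random.Random(42)
N, L, T, runs = 20, 10, 10, 1000
pairs = [flat_sml_eda(N, L, T, rng) for _ in range(runs)]
mean_v0 = sum(a for a, b in pairs) / runs
mean_vT = sum(b for a, b in pairs) / runs
predicted = mean_v0 * (1 - 1 / N) ** T
```

For this dynamics the recursion holds exactly in expectation, so the sample mean of $v_T$ should sit within sampling error of the predicted value.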
A universal bound for the minimum population size in the needle problem

- Needle-in-a-haystack problem: there is one special state (the needle) which has a high fitness value, and all other states have the same low fitness value.
- Let $T_N$ be the time at which the needle is first sampled; $B(N, L) = \mathrm{prob}(T_N = \infty)$ is the probability that the needle is never sampled.
- Theorem 2: in the limit $L \to \infty$ such that $N^2 L\, |A|^{-L} \to 0$,

  $$B(N, L) = 1 - O\big(|A|^{-L/2}\big)$$

  for any EDA in SML-EDA searching on the needle problem.
- Hence the population size must grow at least as fast as $|A|^{L/2}/\sqrt{L}$ for the optimum to be found.

Proof of Theorem 2

- Lemma 1: let $t^*$ be defined as

  $$t^* = -\,\frac{\tfrac{L}{2}\log(|A|) + \log(L N v_0)}{\log(1 - 1/N)}$$

  If the needle has not been found after a time $t > t^*$, the probability that the needle will never be found is greater than $1 - \varepsilon$, where $\varepsilon = |A|^{-L/2}$:

  $$\mathrm{prob}(T_N = \infty \mid T_N > t^*) > 1 - |A|^{-L/2}$$

- Proof of Lemma 1:
  - Since $v$ is non-negative, Markov's inequality bounds the probability that the diversity is much larger than its expectation:

    $$\mathrm{prob}\!\left( v(t) \ge \frac{\langle v(t) \rangle}{\varepsilon} \right) \le \varepsilon$$

  - Choose $t^*$ large enough that

    $$\frac{\langle v_{t^*} \rangle}{\varepsilon} \le \frac{1}{N} - \frac{1}{N^2}$$

  - If $v_{t^*}$ is this small, there must be fixation at every component except possibly one. If $L - 1$ components are fixed, the needle will only be sampled if they are fixed at the values found in the needle, so

    $$\mathrm{prob}\big(T_N = \infty \;\big|\; T_N > t^*,\; v_{t^*} \le 1/N - 1/N^2\big) \ge 1 - |A|^{-(L-1)}$$

  - We are not certain that $v_{t^*}$ is appropriately small, only that it probably is; taking $\varepsilon = |A|^{-L/2}$ accounts for this.
  - $t^*$ is calculated by solving these equations simultaneously.
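To get a feel for the scale of $t^*$, a small numeric sketch; the values of $N$, $L$, $|A|$, and $v_0$ are illustrative assumptions, not from the talk:

```python
import math

def t_star(N, L, A, v0):
    """t* = -[(L/2) log|A| + log(L N v0)] / log(1 - 1/N)."""
    return -((L / 2) * math.log(A) + math.log(L * N * v0)) / math.log(1 - 1 / N)

# Illustrative values: binary alphabet, L = 40 components, population N = 100,
# v0 at its random-population value L/4 = 10 for |A| = 2.
t = t_star(N=100, L=40, A=2, v0=10.0)
# Since -log(1 - 1/N) ~ 1/N for large N, t* grows roughly like
# N * ((L/2) log|A| + log(L N v0)).
```

The linear dependence on $N$ is the same "characteristic time of order the population size" seen in the flat-landscape decay.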
- Lemma 2: the probability that the needle is not found after $t^*$ steps obeys

  $$\mathrm{prob}(T_N > t^*) \ge 1 - \frac{N^2}{|A|^L}\left[\frac{L}{2}\log(|A|) + \log(LN)\right]$$

- Proof of Lemma 2: the probability that SML-EDA does not find the needle in time $t^*$ satisfies

  $$\mathrm{prob}(T_N > t^*) \ge \big(1 - |A|^{-L}\big)^{t^* N}$$

  and the result follows by substituting the value of $t^*$ into this expression.

- The probability of never finding the needle can be decomposed as

  $$\mathrm{prob}(T_N = \infty) = \mathrm{prob}(T_N = \infty \mid T_N > t^*)\,\mathrm{prob}(T_N > t^*)$$

- Combining Lemmas 1 and 2 gives

  $$\mathrm{prob}(T_N = \infty) \ge 1 - |A|^{-L/2} - \frac{N^2}{|A|^L}\left[\frac{L}{2}\log(|A|) + \log(LN)\right] + \dots$$

- If, in the limit $L \to \infty$, $N$ grows sufficiently slowly that the third term vanishes, then the probability of never finding the needle goes to 1.
- Thus, if $N = o\big(|A|^{L/2}/\sqrt{L}\big)$ as $L \to \infty$, the needle will never be found.
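To see how fast this minimum population size grows, a quick numeric sketch of the threshold $|A|^{L/2}/\sqrt{L}$ for a binary alphabet; the choices of $L$ are illustrative:

```python
import math

def min_pop_scale(L, A=2):
    """Growth rate |A|**(L/2) / sqrt(L) that the population size must
    at least match for the needle to be found (Theorem 2)."""
    return A ** (L / 2) / math.sqrt(L)

for L in (10, 20, 40, 80):
    print(L, min_pop_scale(L))
```

Doubling $L$ roughly squares the required population size, which is why needle-like problems are intractable for any SML-EDA with a polynomially growing population.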
Expected Runtime for the Needle Problem and the Limits of Universality

- Corollary 1: if the needle is found, the time to find it is bounded above by $t^*$ with probability $1 - |A|^{-L/2}$.
- Convergence conditions will not be universal; they are particular to each EDA.