Diversity Loss in General Estimation of Distribution Algorithms

Diversity Loss
in General Estimation of Distribution
Algorithms
J. L. Shapiro
PPSN (Parallel Problem Solving From Nature) ’06
BISCuit 2nd EDA Seminar
Abstract

- A class of EDAs for which universal results on the rate of diversity loss can be derived.
- The class (SML-EDA) requires two restrictions:
  - In each generation, the new probability model is built using only data sampled from the current probability model.
  - Maximum likelihood is used to set the model parameters.
- The algorithm will never find the optimum unless the population size grows exponentially.
EDA

Initialize the EDA probability model
(start with a random population of M vectors)
Repeat:
    Select N vectors using a selection method
    Learn a new probability model from the selected population
    Sample M vectors from the probability model
Until the stopping criteria are satisfied
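The loop above can be sketched as a minimal univariate EDA (UMDA-style) on binary strings. This is an illustrative sketch, not code from the talk: the OneMax fitness function and truncation selection are assumptions chosen for concreteness.

```python
import random

def umda(fitness, L, M=100, N=50, generations=50, seed=0):
    """Sketch of a univariate EDA: the probability model is a vector of
    independent per-bit frequencies, fitted by maximum likelihood."""
    rng = random.Random(seed)
    # Start with a random population of M binary vectors.
    pop = [[rng.randint(0, 1) for _ in range(L)] for _ in range(M)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Select N vectors (truncation selection is an illustrative choice).
        selected = sorted(pop, key=fitness, reverse=True)[:N]
        # Learn a new model by maximum likelihood: per-bit frequency of ones.
        p = [sum(x[i] for x in selected) / N for i in range(L)]
        # Sample M new vectors from the model.
        pop = [[1 if rng.random() < p[i] else 0 for i in range(L)]
               for _ in range(M)]
        best = max(pop + [best], key=fitness)
    return best

# Example: OneMax (maximize the number of ones).
best = umda(sum, L=20)
```

Because the new population is built only from samples of the current model, and the model is refitted by maximum likelihood each generation, this sketch falls inside the SML-EDA class discussed next.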
SML-EDA

A restricted class of EDAs is defined by three assumptions:
1. Data in generation t can only affect data in generation t+1 through the probability model.
2. The parameters of the estimated model are chosen using maximum likelihood.
3. The sample size M and the selected-population size N are in a constant ratio, independent of the number of variables L.

- SML-EDA (Simple, Maximum-Likelihood EDA): the class of EDAs for which assumptions 1 and 2 hold.
- This class of EDAs includes BOA, MIMIC, FDA, UMDA, etc.
Some Definitions for Diversity Loss

- The empirical frequency with which component $i$ takes the value $a$ across the population:

  $$\gamma_i^a = \frac{1}{N} \sum_{\mu=1}^{N} \delta(x_i^\mu, a)$$

  where $N$ is the population size and $x^\mu$ is the $\mu$-th population member.

- One diversity measure:

  $$v = \frac{1}{|A|} \sum_{i=1}^{L} \sum_{a \in A} \gamma_i^a \left(1 - \gamma_i^a\right)$$

  $v = 0$ when every component has fixated on a single value; for a random population $v$ takes its maximum value.

- $v$ is the trace of the empirical covariance matrix

  $$C_{ij} = \frac{1}{|A|} \sum_{a \in A} \Big[ \big\langle \delta(x_i, a)\,\delta(x_j, a) \big\rangle - \big\langle \delta(x_i, a) \big\rangle \big\langle \delta(x_j, a) \big\rangle \Big]$$

  The first term is the expectation that components $i$ and $j$ both take the value $a$; the second is the expectation that they take the value $a$ independently.
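These definitions are easy to state in code. A minimal sketch in Python for a binary alphabet, where a population is a list of equal-length 0/1 lists:

```python
def empirical_freq(pop, i, a):
    """gamma_i^a: fraction of population members whose component i equals a."""
    return sum(1 for x in pop if x[i] == a) / len(pop)

def diversity(pop, alphabet=(0, 1)):
    """v = (1/|A|) * sum over components i and values a of
    gamma_i^a * (1 - gamma_i^a)."""
    total = 0.0
    for i in range(len(pop[0])):
        for a in alphabet:
            g = empirical_freq(pop, i, a)
            total += g * (1 - g)
    return total / len(alphabet)

fixated = [[1, 0, 1]] * 4                             # every component fixed
mixed = [[0, 0, 0], [1, 1, 1], [0, 1, 0], [1, 0, 1]]  # gamma = 1/2 everywhere
```

For the binary alphabet the maximum of $v$ is $L/4$, attained when every component frequency is 1/2; here `fixated` gives $v = 0$ and `mixed` (with $L = 3$) gives $v = 0.75$.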
Diversity Loss on a Flat Fitness Landscape

- Theorem 1: for EDAs in class SML-EDA on a flat landscape, the expected value of $v$ is reduced in each generation by the factor $(1 - 1/N)$:

  $$\langle v_t \rangle = \left(1 - \frac{1}{N}\right) \langle v_{t-1} \rangle$$

  Because the parameters are set by maximum likelihood, the empirical variance is reduced by a factor $(1 - 1/N)$ relative to the parent population.

- This gives a universal "expected diversity loss" for all SML-EDAs:

  $$\langle v_t \rangle = v_0 \left(1 - \frac{1}{N}\right)^t$$

  which decays with a characteristic time approximately equal to the population size for large $N$.
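Theorem 1 can be checked empirically with a minimal univariate SML-EDA on a flat landscape. On a flat landscape selection changes nothing, so each generation just fits per-bit frequencies by maximum likelihood and resamples; the parameter values below are illustrative assumptions.

```python
import random

def flat_sml_eda(N, L, T, rng):
    """One run of a univariate SML-EDA on a flat landscape; returns (v_0, v_T).
    For a binary alphabet the diversity v reduces to sum_i gamma_i(1 - gamma_i),
    where gamma_i is the frequency of ones at component i."""
    def v(pop):
        total = 0.0
        for i in range(L):
            g = sum(x[i] for x in pop) / N
            total += g * (1 - g)
        return total

    pop = [[rng.randint(0, 1) for _ in range(L)] for _ in range(N)]
    v0 = v(pop)
    for _ in range(T):
        # Flat fitness: selection is a no-op, so the ML model is simply the
        # per-bit frequency of ones in the current population.
        p = [sum(x[i] for x in pop) / N for i in range(L)]
        pop = [[1 if rng.random() < p[i] else 0 for i in range(L)]
               for _ in range(N)]
    return v0, v(pop)

# Averaged over many runs, <v_T> should track <v_0> * (1 - 1/N)**T.
rng = random.Random(42)
N, L, T, runs = 20, 10, 10, 1000
pairs = [flat_sml_eda(N, L, T, rng) for _ in range(runs)]
mean_v0 = sum(a for a, b in pairs) / runs
mean_vT = sum(b for a, b in pairs) / runs
predicted = mean_v0 * (1 - 1 / N) ** T
```

For this dynamics the recursion holds exactly in expectation, so the sample mean of $v_T$ should sit within sampling error of the predicted value.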
A universal bound for the minimum population size in the needle problem

- Needle-in-a-haystack problem: there is one special state (the needle) which has a high fitness value, and all other states have the same low fitness value.
- Let $T_N$ be the time at which the needle is first sampled; $B(N, L) = \mathrm{prob}(T_N = \infty)$ is the probability that the needle is never sampled.
- Theorem 2: in the limit $L \to \infty$ such that $N^2 L\, |A|^{-L} \to 0$,

  $$B(N, L) = 1 - O\big(|A|^{-L/2}\big)$$

  for any EDA in SML-EDA searching on the needle problem.
- Hence the population size must grow at least as fast as $|A|^{L/2}/\sqrt{L}$ for the optimum to be found.

Proof of Theorem 2

- Lemma 1: let $t^*$ be defined as

  $$t^* = -\,\frac{\tfrac{L}{2}\log(|A|) + \log(L N v_0)}{\log(1 - 1/N)}$$

  If the needle has not been found after a time $t > t^*$, the probability that the needle will never be found is greater than $1 - \varepsilon$, where $\varepsilon = |A|^{-L/2}$:

  $$\mathrm{prob}(T_N = \infty \mid T_N > t^*) > 1 - |A|^{-L/2}$$

- Proof of Lemma 1:
  - Since $v$ is non-negative, Markov's inequality bounds the probability that the diversity is much larger than its expectation:

    $$\mathrm{prob}\!\left( v(t) \ge \frac{\langle v(t) \rangle}{\varepsilon} \right) \le \varepsilon$$

  - Choose $t^*$ large enough that

    $$\frac{\langle v_{t^*} \rangle}{\varepsilon} \le \frac{1}{N} - \frac{1}{N^2}$$

  - If $v_{t^*}$ is this small, there must be fixation at every component except possibly one. If $L - 1$ components are fixed, the needle will only be sampled if they are fixed at the values found in the needle, so

    $$\mathrm{prob}\big(T_N = \infty \;\big|\; T_N > t^*,\; v_{t^*} \le 1/N - 1/N^2\big) \ge 1 - |A|^{-(L-1)}$$

  - We are not certain that $v_{t^*}$ is appropriately small, only that it probably is; taking $\varepsilon = |A|^{-L/2}$ accounts for this.
  - $t^*$ is calculated by solving these equations simultaneously.
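To get a feel for the scale of $t^*$, a small numeric sketch; the values of $N$, $L$, $|A|$, and $v_0$ are illustrative assumptions, not from the talk:

```python
import math

def t_star(N, L, A, v0):
    """t* = -[(L/2) log|A| + log(L N v0)] / log(1 - 1/N)."""
    return -((L / 2) * math.log(A) + math.log(L * N * v0)) / math.log(1 - 1 / N)

# Illustrative values: binary alphabet, L = 40 components, population N = 100,
# v0 at its random-population value L/4 = 10 for |A| = 2.
t = t_star(N=100, L=40, A=2, v0=10.0)
# Since -log(1 - 1/N) ~ 1/N for large N, t* grows roughly like
# N * ((L/2) log|A| + log(L N v0)).
```

The linear dependence on $N$ is the same "characteristic time of order the population size" seen in the flat-landscape decay.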
- Lemma 2: the probability that the needle is not found after $t^*$ steps obeys

  $$\mathrm{prob}(T_N > t^*) \ge 1 - \frac{N^2}{|A|^L}\left[\frac{L}{2}\log(|A|) + \log(LN)\right]$$

- Proof of Lemma 2: the probability that SML-EDA does not find the needle in time $t^*$ satisfies

  $$\mathrm{prob}(T_N > t^*) \ge \big(1 - |A|^{-L}\big)^{t^* N}$$

  and the result follows by substituting the value of $t^*$ into this expression.

- The probability of never finding the needle can be decomposed as

  $$\mathrm{prob}(T_N = \infty) = \mathrm{prob}(T_N = \infty \mid T_N > t^*)\,\mathrm{prob}(T_N > t^*)$$

- Combining Lemmas 1 and 2 gives

  $$\mathrm{prob}(T_N = \infty) \ge 1 - |A|^{-L/2} - \frac{N^2}{|A|^L}\left[\frac{L}{2}\log(|A|) + \log(LN)\right] + \dots$$

- If, in the limit $L \to \infty$, $N$ grows sufficiently slowly that the third term vanishes, then the probability of never finding the needle goes to 1.
- Thus, if $N = o\big(|A|^{L/2}/\sqrt{L}\big)$ as $L \to \infty$, the needle will never be found.
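To see how fast this minimum population size grows, a quick numeric sketch of the threshold $|A|^{L/2}/\sqrt{L}$ for a binary alphabet; the choices of $L$ are illustrative:

```python
import math

def min_pop_scale(L, A=2):
    """Growth rate |A|**(L/2) / sqrt(L) that the population size must
    at least match for the needle to be found (Theorem 2)."""
    return A ** (L / 2) / math.sqrt(L)

for L in (10, 20, 40, 80):
    print(L, min_pop_scale(L))
```

Doubling $L$ roughly squares the required population size, which is why needle-like problems are intractable for any SML-EDA with a polynomially growing population.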
Expected Runtime for the Needle Problem and the Limits of Universality

- Corollary 1: if the needle is found, the time to find it is bounded above by $t^*$ with probability $1 - |A|^{-L/2}$.
- Convergence conditions will not be universal; they are particular to each EDA.