GA-Hard Problems
Bart Rylander
Initiative for Bioinformatics and
Evolutionary Studies (IBEST)
Department of Computer Science
University of Idaho
Moscow, Idaho 83844
[email protected]
360-887-4424
James Foster
Initiative for Bioinformatics and
Evolutionary Studies (IBEST)
Department of Computer Science
University of Idaho
Moscow, Idaho 83844
[email protected]
208-885-7062
Abstract
Genetic algorithm (GA) researchers have long understood the importance of efficiency. Knowing
when and where best to use a GA seems a requirement for effective application, and calls for a clear
understanding of the underlying theory. Past efforts in this regard largely
concentrate on “GA-hardness” or the informal notion of what is difficult for a GA to do. These
efforts have produced fruitful results in the analysis of representations. It should not be construed
however that features of a representation are necessarily features of a problem. Knowing what
features cause a representation to be GA-hard may not imply that the underlying problem is GA-hard. It is currently known that not all representations for a given problem exhibit the same degree
of epistasis or deception. Further, because previous research has concentrated on representations
it has been difficult to achieve even a definition of what a GA-hard problem is, beyond the
notional idea of it being something that is difficult for a GA to do. In this paper we define what it
means for a problem to be GA-hard. We provide a method for analyzing the GA-complexity of
the underlying problem in a way that can be related to classical complexity. We establish how to
perform a reduction of one GA problem to another. Finally, we identify future directions for
potential research.
1 INTRODUCTION
The GA is a biologically inspired search method that seeks to converge to a solution using an
evolutionary process. It is typically used when a more direct form of programming cannot be
found. By exploiting a method that can adapt, one can examine interesting problems. GAs have
successfully been applied to problems from such diverse fields as economics, game theory,
genetics, and artificial intelligence, to name a few.
At the very inception of the GA John Holland understood the importance of efficiency. He wrote
that though enumerative methods guaranteed a solution, they were a false lead. “The flaw, and it
is a fatal one, asserts itself when we begin to ask, ‘How long is eventually?’ To get some feeling
for the answer we need only look back ... that very restricted system there were 10^100 structures ...
In most cases of real interest, the number of possible structures vastly exceeds this number ... If
10^12 structures could be tried every second (the fastest computers proposed to date could not even
add at this rate), it would take a year to test about 3×10^19 structures, or a time vastly exceeding the
estimated age of the universe to test 10^100 structures.” [Holland, 1975] Clearly, then, efficiency is
crucial. Whether a GA can be applied to a problem and its various instances efficiently is the
question that needs to be answered.
We need to carefully distinguish between a problem, and a problem instance. We will usually
refer to the latter simply as an “instance”. A problem is a mapping from problem instances onto
solutions. For example, consider the Maximum Clique (MC) problem (see Definition 2), which is
to find the largest complete subgraph H of a given graph G. An instance of MC would be a
particular graph G, whereas the problem MC is essentially the set of all pairs (G, H) where H is a
maximum clique in G.
Our central contention is that most current approaches to “GA hardness” actually explain why a
GA is unlikely to converge quickly to a solution on a particular problem instance. Landscape
analysis, for example, examines the fitness landscape of a particular problem instance. Holland’s
example above only points out the difficulty of searching an immense space of possible solutions
for a particular instance. If, however, we are to understand the a priori applicability of GAs to
problems, we need a more general approach.
Instance based analysis is also very sensitive to representational issues. Most theoretical research
in this area examines features of representations that cause GAs to converge more slowly. For
example, Whitley showed that the only challenging problems were deceptive [Whitley, 1991].
However, he also realized that one can avoid deception by changing the representation, since
“remapping many of the strings so that the “global winner” is moved closer in Hamming space to
the deceptive attractor and those strings that help to maintain the deception will in many cases
reduce the level of deception.” [Whitley, 1991] Further, Liepins and Vose [Liepins and Vose,
1990] showed that a simple translation function exists that can remap an entire binary space.
While a detailed a priori knowledge may be required to perform a remapping in such a way as to
reduce deception, the existence of this possibility clearly weakens the case that the underlying
problem is hard in the global sense we are pursuing.
2 COMPLEXITY AND GA-HARDNESS
Informally, computational complexity is the study of classes of problems based on the rate of
growth of space, time, or other fundamental unit of measure as a function of the size of the input.
A study must be done with clearly defined algorithms, methods, and goals. Therefore, to perform
an analysis of GA-hard problems we begin with some basic assumptions about GAs (taken from
[Rawlins, 1991] ).
A search problem is: given a finite discrete domain D and a function f: D→R, where R is the set of
real numbers, find the best, or near best, value(s) in D under f. f is referred to as the domain
function. An encoding is a function e: S^l→D, where S is the alphabet of symbols such that ||S|| ≥ 2
and l ≥ log_||S|| ||D||. The encoding is sometimes referred to as the representation. The
composition of the functions f and e is g: S^l→R, where g(x) = f(e(x)). A chromosome is a string of
length l in S^l. An abstraction of a typical GA is given in Figure 1.
Generate initial population, G(0);
Evaluate G(0);
t = 1;
Repeat
    Generate G(t) using G(t-1);
    Evaluate (decode(G(t)));
    t = t + 1;
Until solution is found or termination
Figure 1. Typical Genetic Algorithm
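For concreteness, the loop in Figure 1 might be realized as in the following minimal Python sketch. The bitstring encoding, the one-max style domain function g, roulette-wheel selection, one-point crossover, and all parameter values are illustrative assumptions, not part of any analysis in this paper.

import random

def typical_ga(g, l, pop_size=50, generations=100, p_mut=0.01):
    """Minimal sketch of the GA in Figure 1: g is the composed function
    f(e(x)) evaluated on chromosomes of length l. A fixed generation
    budget stands in for the 'solution found or termination' test."""
    # Generate initial population G(0) and evaluate it
    pop = [[random.randint(0, 1) for _ in range(l)] for _ in range(pop_size)]
    fitness = [g(c) for c in pop]
    for t in range(1, generations + 1):
        total = sum(fitness)
        def select():
            # Proportional (roulette-wheel) selection
            r = random.uniform(0, total)
            for c, f in zip(pop, fitness):
                r -= f
                if r <= 0:
                    return c
            return pop[-1]
        # Generate G(t) using G(t-1)
        new_pop = []
        while len(new_pop) < pop_size:
            p1, p2 = select(), select()
            cut = random.randrange(1, l)                           # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ (random.random() < p_mut) for b in child]  # bit-flip mutation
            new_pop.append(child)
        pop = new_pop
        fitness = [g(c) for c in pop]                              # Evaluate (decode(G(t)))
    return max(zip(fitness, pop))

# Illustrative domain function: count of 1 bits, shifted to stay positive
if __name__ == "__main__":
    print(typical_ga(lambda c: sum(c) + 1, l=20))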
Figure 1 shows the same genetic algorithm that Carol Ankenbrandt [Ankenbrandt, 1991] used to
deduce worst-case and average-case convergence time. The many assumptions used in her proofs
mostly parallel those which Goldberg used to derive the GA Schema Theorem [Goldberg, 1989b].
Ankenbrandt proved that, with proportional selection, GAs have average and worst case time
complexity in O(Evaluate × m lg m / lg r), where m is the population size, r is the fitness ratio, and
Evaluate represents the combined domain dependent evaluation and decoding functions. Though
these are in fact two separate functions, we will refer to them as Ankenbrandt did, as simply the
evaluation function. Notice in particular that the length of the chromosomes and the evaluation
function have dramatic effects on the overall GA complexity. This ignores interesting
possibilities, such as that the evaluation function might be infeasible, or that the decoding function
might be uncomputable.
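As a rough illustration of how each factor enters the O(Evaluate × m lg m / lg r) bound, the following sketch evaluates the dominant term for assumed values of the population size m, the fitness ratio r, and a per-chromosome evaluation cost; the numbers are hypothetical and chosen only to show the shape of the bound.

import math

def ankenbrandt_bound(m, r, evaluate_cost):
    """Dominant term of Ankenbrandt's convergence-time bound:
    Evaluate * m * lg(m) / lg(r). All inputs are illustrative."""
    return evaluate_cost * m * math.log2(m) / math.log2(r)

# Example: population of 100, fitness ratio 1.5, and an evaluation cost
# proportional to a chromosome length of 50 (hypothetical units).
print(ankenbrandt_bound(m=100, r=1.5, evaluate_cost=50))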
It should be stressed that these formulae describe the complexity of the GA, rather than that of a
particular problem. More importantly, they do not describe what the GA-complexity of a specific
problem is. They are excellent, however, in providing a baseline, in the sense that they characterize
how a GA will (probabilistically) react to any problem that meets the general guidelines outlined
in her paper. GAs that do not meet these guidelines because they are in fact slightly different
algorithms may well have slightly different complexities depending on the specific differences.
Nonetheless, they would still be algorithms not problems. Therefore, to provide a provable bound
on a GA-hard problem there must be a bound on the evaluation function as well as a manner by
which the underlying problem can be tied to the representation.
In order to identify the complexity of a problem, we must first identify the model of computation.
Turing machines, random access machines, circuits, probabilistic Turing machines, etc. are well-studied models of computation [Balcázar et al., 1988]. Depending upon the model of computation
chosen, the complexity of specific problems may vary. For example, primality testing is not
known to be possible in polynomial time on a deterministic Turing machine, but it is possible in
polynomial time on a probabilistic Turing machine. Fortunately, this has not led to a proliferation
of theories, since the strong Church-Turing thesis [Leeuwen, 1990], which states that one can
translate from any reasonable model to any other with at most a polynomial slowdown, has been
proven to hold for all interesting models of computation, with the exception of quantum Turing
machines. Therefore, despite the many possible models of computation, we may provisionally
proceed with an analysis of complexity assuming a deterministic Turing machine model. Further
refinements might require that we introduce a “genetic Turing machine model”, similar to that
introduced by Pudlak [Pudlak, 1994].
2.1 COMPLEXITY CLASSES
In order to discuss computational complexity and classes of problems, some definitions are
required. Since many problems that GAs are used to solve are optimization problems, PO and
NPO are the relevant classical complexity classes. PO and NPO are the optimization equivalents
of P and NP, which are classes of decision problems. Approximation classes such as APX and
PTAS (see Bovet and Crescenzi [Bovet and Crescenzi, 1994]) may be even more appropriate, since
GAs typically produce an approximation, but these classes are harder to work with.
Definition 1: The class PO is the class of optimization problems that can be solved in polynomial
time with a Deterministic Turing Machine (DTM). [Bovet and Crescenzi, 1994].
An example of such a problem is:
Minimum Partition:
Given: A set A = {a1, …, an} of non-negative integers such that ai ≥ aj for all i > j.
Find: A partition of A into disjoint subsets A1 and A2 which minimizes the maximum of
Σ_{a∈A1} a and Σ_{a∈A2} a.
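To connect this example to the search-problem template above, a chromosome of length n can assign each ai to A1 or A2, with the domain function scoring the split. The bitstring encoding below is one obvious, purely illustrative choice.

def min_partition_cost(a, chromosome):
    """Domain function for Minimum Partition: bit i of the chromosome
    places a[i] in A1 (bit = 1) or A2 (bit = 0); the value to be
    minimized is the larger of the two subset sums."""
    sum_a1 = sum(x for x, bit in zip(a, chromosome) if bit == 1)
    sum_a2 = sum(x for x, bit in zip(a, chromosome) if bit == 0)
    return max(sum_a1, sum_a2)

# Example instance A = {4, 5, 6, 7, 8}; the split {5, 8} vs {4, 6, 7}
# scores 17 (an optimal split, {7, 8} vs {4, 5, 6}, would score 15).
print(min_partition_cost([4, 5, 6, 7, 8], [0, 1, 0, 0, 1]))  # -> 17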
Definition 2: The class NPO is the class of optimization problems that can be solved in
polynomial time with a nondeterministic Turing Machine (NTM).
For example,
Maximum clique (MC):
Given: Graph G
Find: Largest complete sub-graph of G.
The class NPO contains the class PO. This means that problems that can be done in polynomial
time with a DTM can also be done in polynomial time with an NTM.
These two classes are by no means the only interesting or important ones. Nonetheless, they are
sufficient for our purposes, since NPO most likely includes all tractable optimization problems.
That is, these problems are computable in a reasonable amount of time.
Given a complexity class, the classical way to define a “hard” problem is to show that every
problem in the class reduces to the “hard” problem. A reduction R is a mapping from problem A
onto problem B in such a way that one can optimize any problem instance x of A by optimizing
B(R(x)). For example, there is a polynomial time computable transformation from an instance of
Min-partition, which is a list of non-decreasing non-negative integers A, into an instance of MC,
which is a graph G with the interesting property that if one finds the maximum clique in G one can
recover the minimum partition of A. Hardness then, is an upper bound relative to the reduction R,
since one need never do more work to solve any problem in the class than to transform the
instance and solve the instance of the hard problem. If MC were reducible to Min-partition, for
example, this would place an easy (PO) upper bound on a hard (NPO) problem—implying that
MC wasn’t so hard after all. The formal definition of hardness is:
Definition 3: A problem H is Hard for class C with respect to a reduction R if, for any problem A∈C
and any instance x of A, optimizing H(R(x)) optimizes x.
“Hardness” in this sense is a rigorously defined concept that can only correctly be used in
conjunction with a specified class of problems and a specified reduction. This contrasts with the
usually subjective notion of the term in current GA theory.
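The reduction pattern itself can be written as a short schema: transform the instance with R, optimize the transformed instance of the hard problem H, and map the optimum back. The function names in this sketch are hypothetical placeholders; no particular reduction is implied.

def optimize_via_reduction(x, transform, optimize_H, recover):
    """Optimize an instance x of problem A by way of a hard problem H.
    The three callables are placeholders standing in for a concrete,
    polynomial time, optimality-preserving reduction."""
    y = transform(x)          # R(x): an instance of H
    h_opt = optimize_H(y)     # optimize the transformed instance
    return recover(x, h_opt)  # map the optimum of H back to an optimum for x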
2.2 GA-HARDNESS
We now define a formal notion of GA-hardness, using this classical approach. Current instance-based approaches attempt to find problems whose characteristics cause the GA to converge poorly
or not at all (see Marconi and Foster [Marconi and Foster, 1998] for example). However, without
a method for isolating the underlying problem in a GA-specific manner, this sort of research
quickly becomes an analysis of the behavior of representations. This cannot lead to
characterizations of “GA hard” problems, however, for at least two reasons. In the first place, a
practitioner, knowing that a particular instance is hard, may simply change the representation or
the evaluation function. We often use such guidance, for example by applying fitness scaling when the
standard deviation of fitness over the solution space, which is instance specific, is particularly small. Therefore,
without a specific definition to work from and a bound on the evaluation function, complexity
analysis is useless since the model under inspection is a moving target. In the second place, this
focus on problem instances either leaves the definition of “GA hardness” vague, or makes it too
specific. The former approach is the usual one. The latter approach includes criteria such as
specific convergence criteria under a particular representation and fitness function. We need a
formal, but broad definition.
One must be careful, though, that the definition of GA-hardness not become sweeping. A useful
notion will single out problems that are upper bounds on the complexity of some interesting class
of problems, which are difficult to compute with GAs, but which are feasibly solved by non-GA
techniques. For example, equating “GA-hard” with “nonrecursive” problems, or with “needle in
a haystack” problems, would be singularly uninformative. Since our model of feasibility is
polynomial time optimization, and since we are interested in efficient optimizations, this leads to:
Definition 4: Let R be a polynomial time computable, optimality-preserving transformation. A
problem G is GA-hard for class C with respect to R if every problem A in C reduces in polynomial
time to G via R, and G is in PO, and any GA for G requires more than polynomial time to
converge for some instance unless PO=NPO.
This is analogous to saying that primality testing is Deterministic-hard for P, since every problem
in P is reducible to primality testing (every problem in P is reducible to any non-trivial problem)
and there is no known deterministic polynomial time algorithm for primality testing, but there is a
polynomial time probabilistic algorithm for primality testing.
It is well known, according to the celebrated No Free Lunch theorems of Wolpert and Macready
[Wolpert and Macready, 1997] that GAs cannot efficiently solve all problems. It is still unknown
whether NFL theorems hold for restricted complexity classes, such as NPO, however, so the above
definition may be vacuous. This would be equivalent to there being no GA-hard problems, as
would happen when GAs could be applied efficiently to any problem given the right
representation. We also have trivial implications, such as if G is GA-hard for NPO, then
PO=NPO. For example, MC is not GA-hard for any interesting class unless PO=NPO. So, GA-hardness is most interesting for proper subsets of NPO (unless PO=NPO).
Notice that a problem may be GA-hard for a class C because all transformations produce instances
for which there is no efficiently computable representation. A problem may also be GA-hard
because the transformations produce search spaces in which the evaluation functions are
expensive. In fact, one of these situations must hold, since otherwise Ankenbrandt’s result would
imply that the transformed problem would be solvable in polynomial time on a GA, contradicting
its GA-hardness. One can formalize this observation by considering a complexity-bounded
analog of Kolmogorov complexity [Li and Vitanyi, 1990], tailored to genetic algorithms.
3 MINIMUM CHROMOSOME LENGTH
In a GA, all possible chromosomes of a given encoding constitute the search space S^l. As such,
each possible chromosome may be a possible solution to the problem instance at hand. It may be
possible with problem specific knowledge to readily identify some of these chromosomes as not
being potential solutions. For example, in the case of MC, those chromosomes that represent
graphs that are not cliques cannot solve any instance of MC. But over the set of all instances, the
worst case is that every element in a search space is a potential solution. So, in a sense, a single
chromosome is simultaneously the input to the evaluation function, a description of the problem
instance, and a potential solution.
A desideratum for a representation would be to minimize the number of bits in a chromosome that
still uniquely identifies a solution to the problem. In so doing, one minimizes the search space that
must be examined to find an answer. Of course, a good algorithm converges much more quickly
than is required to evaluate the entire search space.
Definition 5: For a problem P, and Dn the set of instances of P of size n, let MCL(P,n) be the least
l for which there is an encoding e: S^l→Dn with a domain dependent evaluation function g, where g
and e are in FP (the class of functions computable in polynomial time).
That is, MCL(P,n) measures the size of the smallest chromosome for which there is a polynomial
time computable representation and evaluation function. Notice that for a fixed problem P,
MCL(P,n) is a function of the input instance size n, and so is comparable to classical complexity
measures. Also, since a smaller representation means a smaller number of solutions to search,
there must be a limit as to how small the representation can become. So MCL(P,n) is well-defined
for every P and n. This is in contrast to the Minimum Description Length Principle (MDLP)
[Rissanen, 1978], which formalizes the notion of Occam’s Razor. More specifically, the
MDLP states that the best theory to explain a set of data is the one which minimizes the sum of (1)
the length (encoded in binary bits) of the theory; (2) the length (in binary bits) of the data when
encoded with the help of the theory.
It is important to stress that the domain dependent evaluation function must be feasible.
Otherwise, it would be possible to skew the analysis by placing exponential time or even
uncomputable functions in the main loop of the GA. By bounding this function with a polynomial,
we ensure that the evaluation does not materially affect the worst-case analysis in terms of
problem classes. By Ankenbrandt’s theorem, this also implies that the complexity of a GA-hard
problem will be a function of the minimum chromosome size, since otherwise the problem would
be polynomial time solvable on a GA.
Given these definitions, we can now provably describe the complexity of a problem instance as
implemented with a GA. However, since complexity is typically based on the rate of growth of a
fundamental unit of measure as a function of the input size, we have one more step in our analysis.
We must now show how the growth of the problem instance size, n in our definition, correlates
with the MCL.
4 MCL GROWTH
The maximum number of possible solutions that a GA must search is simply 2^l, where l is the
number of bits in a chromosome. This is an interesting feature because it means that the
maximum possible number of solutions to examine is governed by the representation. Reducing
the chromosome length by just 1 bit cuts the search space in half. Clearly, the length of a
chromosome is important. However, just as clearly, the chromosome must be uniquely tied to the
problem at hand. Truly, the problem forces the representation to conform to it. We must explore
this area if we are to understand how a hard problem affects a GA.
To explore these issues, consider a chromosome that can be a solution for a GA that solves MC.
Consider the graph G with 5 nodes in Figure 2. We will label the nodes 0 through 4. The maximal
clique in G is the set of nodes {1,2,3}. For an n-node graph, we will use the n-bit representation
that designates subgraph H, where the ith bit of the chromosome is 1 (0) if node i is (is not) in H.
01110 represents the maximum clique in our example. This representation has been used several
times for this problem, for example by Soule and Foster [Soule et al., 1996].
Figure 2. Example instance of MC, with solution {1,2,3}.
Our representation is an MCL for a GA implementation of Maximum Clique, since any smaller
representation would be unable to represent all 2^n subsets of nodes for an n-node graph, and each
subset is the maximum clique for some input instance. In fact, MCL(MC,n)=n is easy to verify.
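A sketch of this n-bit representation and one possible domain function for MC follows; the zero score for non-cliques and the exact edge set (which the text does not enumerate for Figure 2) are illustrative assumptions.

from itertools import combinations

def clique_fitness(edges, chromosome):
    """Domain function for Maximum Clique: the i-th bit of the chromosome
    selects node i for the candidate subgraph H. If H is a clique, the
    fitness is its size; otherwise it scores zero (an illustrative
    penalty scheme, not the only possible one)."""
    nodes = [i for i, bit in enumerate(chromosome) if bit == 1]
    if all((u, v) in edges or (v, u) in edges for u, v in combinations(nodes, 2)):
        return len(nodes)
    return 0

# A 5-node instance in the spirit of Figure 2: nodes {1, 2, 3} form a
# triangle, which is the maximum clique. The edge set is assumed here.
edges = {(0, 1), (1, 2), (1, 3), (2, 3), (3, 4)}
print(clique_fitness(edges, [0, 1, 1, 1, 0]))  # chromosome 01110 -> 3
print(clique_fitness(edges, [1, 1, 1, 1, 0]))  # not a clique    -> 0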
For our example, consider what happens when you add a sixth node to this graph. You must
expand your chromosome to six bits. This means that the MCL grows by one as the size of the
input instance grows by one. This also means that the search space has increased to 2^6. Clearly,
as the size of the input increases linearly, the MCL grows linearly and the search space grows
exponentially. This is one hallmark of a hard problem. In the case of MC or any other NP-hard
problem, there is no known deterministic algorithm to search this exponentially growing
space in polynomial time.
The MCL growth rate can be used to bound the worst case complexity of the problem for a GA. If
a problem is in NPO, then the MCL growth rate will be no more than linear, since an NP
algorithm can search a space which grows no more quickly than exponentially, and linear MCL
growth implies exponential search space growth. Conversely, if MCL grows slowly enough, then
the search space will be small enough to place the problem in PO. In particular:
Theorem 1: For any problem P, if 2^MCL(P,n) ∈ O(n^k) for some k, then P∈PO.
It is not clear whether the converse is true. However, it seems that the converse should hold
unless P≠NP. A problem could be in PO, despite having a rapidly growing MCL and therefore a
rapidly growing search space. This would happen if, for any input instance, there were a small
(polynomial in the size of the input) portion of the space that is assured to contain the solution, and
this region could be identified in polynomial time. However, the only way this could happen
would be if any alternative representation that maps problem instances onto this smaller region
requires more than polynomial time to compute. In a sense, the “interesting” part of the search space must be efficiently hidden
(by the representation) in the actual search space in such a way that no efficient algorithm can
discover it. Such a function is a classical one-way function, and it is well known that one-way
functions exist iff P≠NP (actually, iff P≠UP). This is very close to a proof of:
Conjecture 1: If P∈PO and 2^MCL(P,n) ∉ O(n^k) for any k, then P≠NP.
This is also evidence that if a particular MCL grows linearly, then the underlying problem is most
likely in NPO\PO (unless P=NP). After all, we explicitly disallow unnecessarily inefficient
representations in our definition of MCL. Put another way, even though the encoding e: S^l→Dn
minimizes l, the search space may be larger than the solution space. This happens if some
redundant encoding is forced by the requirement that encoding and evaluation be polynomial time.
In this case, it would clearly be more efficient to avoid algorithms such as GAs that require a
representation. If the solution space for a problem P grows polynomially, while MCL(P,n) grows
super-polynomially, then P is clearly in PO, but a GA would be a poor choice of methods to
solve P. In other words:
Conjecture 2: If P∈PO and 2^MCL(P,n) ∉ O(n^k) for any k, then P is GA-hard for PO (using
optimization-preserving polynomial time reductions).
The converse to this would be very interesting, since it would show that GA-hard problems exist
only when P≠NP.
This raises an obvious question: how does one compute MCL(P,n)? We suspect that MCL would
satisfy the formal properties of a Blum complexity measure, if we removed the efficiency
constraints on representation and evaluation, though we have yet to verify this. If true, then there
would be cases where MCL(P,n) was not well defined. However, even given efficiency
constraints such as ours, it is possible that MCL itself is not efficiently computable for a fixed
problem. It is even possible that MCL is not computable at all. So, MCL may be a tool of more
interest in theoretical analysis than in GA practice, which distinguishes it from its clearly related
cousin, the Minimum Description Length Principle.
5 GA COMPLEXITY CLASSES
This discussion leads us to propose a new complexity class specifically for algorithms, such as
GAs, which use a mapping from genotype to phenotype. It contains those GA problems whose MCL
growth rate is at most linear.
Definition 6: A problem P is in the class NPG if MCL(P,n)∈O(n).
Recall that the worst-case domain dependent evaluation function g, and the worst-case
representation are both polynomial time computable, by definition.
For example, maximum clique (MC) is in NPG, as we saw earlier. Any problem in PO is also in
NPG, since one can use a 1-bit null representation, which one then ignores while solving the
problem directly.
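The null-representation argument can be made concrete with a small sketch: the chromosome is a single ignored bit, and the evaluation function solves the instance directly in polynomial time. The minimum-finding problem below is only a placeholder for an arbitrary problem in PO.

def null_representation_evaluate(instance, chromosome):
    """Evaluation function for a problem in PO under a 1-bit null
    representation: the chromosome carries no information and is
    ignored, while the instance is solved directly in polynomial time.
    Finding the minimum of a list stands in, hypothetically, for any
    PO problem."""
    _ = chromosome              # the single bit is ignored
    return min(instance)        # direct polynomial-time solution

print(null_representation_evaluate([7, 3, 9], chromosome=[0]))  # -> 3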
Since MC cannot even be approximated well in polynomial time, let alone be optimized (unless
P=NP) [Håstad, 1996], it would seem that MC cannot be in any reasonable analog of classes of
problems solved efficiently by GAs. Moreover, MC cannot be GA-hard for any class since it has
no efficient non-GA solution (unless P=NP). One way to prove this would be to prove that MC is
complete for NPG, which would require a meaningful reduction in order to show that every other
problem in NPG reduces to MC. We leave this to another day, but suggest that identifying the
relationship between GA-hard problems for some interesting class such as NPO would be a
fruitful strategy for finding NPG complete problems.
6 CONCLUSIONS
We introduced a formal framework for studying the inherent complexity of problems for GAs,
rather than instances of problems. To make the characterization interesting, we limited our
attention to possible problems that are hard for GAs, but easy for some other technique. One
suggestive road for exploring this idea is the growth rate of the best possible representations for a
problem, our MCL. We showed connections between this growth rate and classical computational
complexity classes. We also showed how to use MCL to define a GA specific computational
complexity class, which might make it possible to use classical complexity theoretic strategies,
such as reductions using one-way functions, to answer questions about GAs.
Interestingly, we did not have to introduce a new model of computation for GAs in order to do this
analysis. This makes it more likely that traditional proof strategies may be applicable, and it also
makes it very likely that this theory is applicable to search algorithms other than GAs.
This paper presents several conjectures, and raises many interesting questions which need to be
answered. Rather than enumerating them here, we encourage the interested researcher to find a
question that interests them, then let us know the answer.
Acknowledgements
We thank Ken DeJong for an enlightening discussion of this material over cake and ice cream, and
John Holland for providing the occasion by having a birthday.
References
Ankenbrandt, C.A. (1991). “An Extension To the Theory of Convergence and a Proof of the Time
Complexity of Genetic Algorithms.” Foundations of Genetic Algorithms, pp. 53-68.
Balcázar, J., Díaz, J., and Gabarró, J. (1988). Structural Complexity Theory I, Springer-Verlag.
Bovet, D., and Crescenzi, P. (1994). Computational Complexity, Prentice Hall.
Goldberg, D.E. (1989b). “Sizing Populations for Serial and Parallel Genetic Algorithms.”
Proceedings of the Third International Conference on Genetic Algorithms, pp. 70-79. San Mateo,
CA: Morgan Kaufmann.
Håstad, J. (1996). “Clique is Indeed Hard.” Manuscript, improving his paper from STOC 1996.
Holland, J.H. (1975). Adaptation in Natural and Artificial Systems. Ann Arbor, MI: University of
Michigan Press.
Li, M., Vitanyi, P. (1990). “Kolmogorov Complexity and its Applications.” Handbook of
Theoretical Computer Science Volume A. Algorithms and Complexity, pp. 189-254. The MIT
Press, Cambridge, Massachusetts.
Leeuwen, J. (1990). Handbook of Theoretical Computer Science Volume A. Algorithms and
Complexity, pp. 119-120.
Liepins, G. and Vose, M. (1990). “Representation Issues in Genetic Optimization.” Journal of
Experimental and Theoretical Artificial Intelligence, 2, pp. 101-115.
Macready, W. and Wolpert, D. (1997). “No Free Lunch Theorems for Optimization.” IEEE Transactions on Evolutionary Computation, 1(1), pp. 67-82.
Marconi, J., Foster, J. (1998) “Finding Cliques in Keller Graphs with Genetic Algorithms.” Proc.
Int. Conf. on Evolutionary Computing.
Pudlak, P. (1994) Proc. Structural Complexity Theory
Rawlins, G.J.E. (1991). Introduction. Foundations of Genetic Algorithms, pp. 1-10.
Soule, T., Foster, J.A., and Dickinson, J. (1996). “Using Genetic Programming to Approximate
Maximum Cliques.” Proc. Genetic Programming, pp. 400-405.
Whitley, L.D. (1991). “Fundamental Principles of Deception in Genetic Search.” Foundations of
Genetic Algorithms, pp. 221-242.