Category - Genetic Algorithms

GA-Hard Problems

Bart Rylander
Initiative for Bioinformatics and Evolutionary Studies (IBEST)
Department of Computer Science
University of Idaho
Moscow, Idaho 83844
[email protected]
360-887-4424

James Foster
Initiative for Bioinformatics and Evolutionary Studies (IBEST)
Department of Computer Science
University of Idaho
Moscow, Idaho 83844
[email protected]
208-885-7062

Abstract

Genetic algorithm (GA) researchers have long understood the importance of efficiency. Knowing when and where a GA is best used seems a requirement for effective application, as well as tantamount to a clear understanding of the underlying theory. Past efforts in this regard largely concentrate on “GA-hardness”, the informal notion of what is difficult for a GA to do. These efforts have produced fruitful results in the analysis of representations. It should not be construed, however, that features of a representation are necessarily features of a problem. Knowing what features cause a representation to be GA-hard may not imply that the underlying problem is GA-hard. It is currently known that not all representations for a given problem exhibit the same degree of epistasis or deception. Further, because previous research has concentrated on representations, it has been difficult to achieve even a definition of what a GA-hard problem is, beyond the notional idea of it being something that is difficult for a GA to do. In this paper we define what it means for a problem to be GA-hard. We provide a method for analyzing the GA-complexity of the underlying problem in a way that can be related to classical complexity. We establish how to perform a reduction of one GA problem to another. Finally, we identify future directions for potential research.

1 INTRODUCTION

The GA is a biologically inspired search method that seeks to converge to a solution using an evolutionary process. It is typically used when a more direct form of programming cannot be found.
By exploiting a method that can adapt, interesting problems may be examined. GAs have successfully been applied to problems from such diverse fields as economics, game theory, genetics, and artificial intelligence, to name a few. At the very inception of the GA, John Holland understood the importance of efficiency. He wrote that though enumerative methods guaranteed a solution, they were a false lead. “The flaw, and it is a fatal one, asserts itself when we begin to ask, ‘How long is eventually?’ To get some feeling for the answer we need only look back ... that very restricted system there were 10^100 structures ... In most cases of real interest, the number of possible structures vastly exceeds this number ... If 10^12 structures could be tried every second (the fastest computers proposed to date could not even add at this rate), it would take a year to test about 3×10^19 structures, or a time vastly exceeding the estimated age of the universe to test 10^100 structures.” [Holland, 1975] Clearly, then, efficiency is crucial. Whether a GA can be applied to a problem and its various instances efficiently is the question that needs to be answered. We need to carefully distinguish between a problem and a problem instance. We will usually refer to the latter simply as an “instance”. A problem is a mapping from problem instances onto solutions. For example, consider the Maximum Clique (MC) problem (see Definition 2), which is to find the largest complete subgraph H of a given graph G. An instance of MC would be a particular graph G, whereas the problem MC is essentially the set of all pairs (G, H) where H is a maximum clique in G. Our central contention is that most current approaches to “GA-hardness” actually explain why a GA is unlikely to converge quickly to a solution on a particular problem instance. Landscape analysis, for example, examines the fitness landscape of a particular problem instance.
Holland’s example above only points out the difficulty of searching an immense space of possible solutions for a particular instance. If, however, we are to understand the a priori applicability of GAs to problems, we need a more general approach. Instance based analysis is also very sensitive to representational issues. Most theoretical research in this area examines features of representations that cause GAs to converge more slowly. For example, Whitley showed that the only challenging problems were deceptive [Whitley, 1991]. However, he also realized that one can avoid deception by changing the representation, since “remapping many of the strings so that the ‘global winner’ is moved closer in Hamming space to the deceptive attractor and those strings that help to maintain the deception will in many cases reduce the level of deception.” [Whitley, 1991] Further, Liepins and Vose [Liepins and Vose, 1990] showed that a simple translation function exists that can remap an entire binary space. While detailed a priori knowledge may be required to perform a remapping in such a way as to reduce deception, the existence of this possibility clearly weakens the case that the underlying problem is hard in the global sense we are pursuing.

2 COMPLEXITY AND GA-HARDNESS

Informally, computational complexity is the study of classes of problems based on the rate of growth of space, time, or other fundamental unit of measure as a function of the size of the input. Such a study must be done with clearly defined algorithms, methods, and goals. Therefore, to perform an analysis of GA-hard problems we begin with some basic assumptions about GAs (taken from [Rawlins, 1991]). A search problem is: given a finite discrete domain D and a function f: D→R, where R is the set of real numbers, find the best, or near best, value(s) in D under f. f is referred to as the domain function. An encoding is a function e: S^l→D, where S is an alphabet of symbols such that ||S|| ≥ 2 and l ≥ log_||S|| ||D||.
The encoding is sometimes referred to as the representation. The composition of functions f and e is g: S^l→R, where g(x) = f(e(x)). A chromosome is a string of length l in S^l. An abstraction of a typical GA is given in Figure 1.

Generate initial population, G(0);
Evaluate G(0);
t = 1;
Repeat
    Generate G(t) using G(t-1);
    Evaluate(decode(G(t)));
    t = t + 1;
Until solution is found or termination

Figure 1. Typical Genetic Algorithm

Figure 1 is the same genetic algorithm that Carol Ankenbrandt [Ankenbrandt, 1991] used to deduce worst-case and average-case convergence time. The many assumptions used in her proofs mostly parallel those which Goldberg used to derive the GA Schema Theorem [Goldberg, 1989b]. Ankenbrandt proved that, with proportional selection, GAs have average and worst case time complexity in O(Evaluate × m lg m / lg r), where m is the population size, r is the fitness ratio, and Evaluate represents the combined domain dependent evaluation and decoding functions. Though these are in fact two separate functions, we will refer to them as Ankenbrandt did, as simply the evaluation function. Notice in particular that the length of the chromosomes and the evaluation function have dramatic effects on the overall GA complexity. This ignores interesting possibilities, such as that the evaluation function might be infeasible, or that the decoding function might be uncomputable. It should be stressed that these formulae describe the complexity of the GA, rather than that of a particular problem. More importantly, they do not describe what the GA-complexity of a specific problem is. They are excellent, however, in providing a baseline, in the sense that they characterize how a GA will react (probabilistically) to any problem that meets the general guidelines outlined in her paper. GAs that do not meet these guidelines because they are in fact slightly different algorithms may well have slightly different complexities depending on the specific differences.
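The loop in Figure 1 can be made concrete. The Python sketch below uses proportional (roulette-wheel) selection, as in Ankenbrandt’s analysis; the population size, mutation rate, one-point crossover, and the OneMax evaluation function at the end are illustrative assumptions, not details taken from the paper.

```python
import random

def genetic_algorithm(evaluate, length, pop_size=20, generations=100, target=None):
    """Minimal sketch of the GA loop in Figure 1: proportional selection,
    one-point crossover, and bit-flip mutation (all illustrative choices)."""
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for t in range(generations):
        fits = [evaluate(c) for c in pop]  # Evaluate(decode(G(t)))
        if target is not None and max(fits) >= target:
            break  # solution found
        total = sum(fits) or 1

        def select():  # proportional (roulette-wheel) selection
            r = random.uniform(0, total)
            acc = 0.0
            for c, f in zip(pop, fits):
                acc += f
                if acc >= r:
                    return c
            return pop[-1]

        nxt = []
        while len(nxt) < pop_size:
            a, b = select(), select()
            cut = random.randrange(1, length)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (random.random() < 0.01) for bit in child]  # mutation
            nxt.append(child)
        pop = nxt  # G(t) generated from G(t-1)
    return max(pop, key=evaluate)

# Usage: maximize the number of 1-bits (the "OneMax" toy evaluation function).
best = genetic_algorithm(sum, length=16, target=16)
```

Note that the evaluation function is a parameter: exactly the component whose cost Ankenbrandt’s O(Evaluate × m lg m / lg r) bound leaves open.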
Nonetheless, they would still be algorithms, not problems. Therefore, to provide a provable bound on a GA-hard problem there must be a bound on the evaluation function, as well as a manner by which the underlying problem can be tied to the representation. In order to identify the complexity of a problem, we must first identify the model of computation. Turing machines, random access machines, circuits, probabilistic Turing machines, etc. are well-studied models of computation [Balcázar et al., 1988]. Depending upon the model of computation chosen, the complexity of specific problems may vary. For example, primality testing is not known to be possible in polynomial time on a deterministic Turing machine, but it is possible in polynomial time on a probabilistic Turing machine. Fortunately, this has not led to a proliferation of theories, since the strong Church-Turing thesis [Leeuwen, 1990], which states that one can translate from any reasonable model to any other with at most a polynomial slowdown, has been proven to hold for all interesting computation models, with the exception of quantum Turing machines. Therefore, despite the many possible models of computation, we may provisionally proceed with an analysis of complexity assuming a deterministic Turing machine model. Further refinements might require that we introduce a “genetic Turing machine model”, similar to that introduced by Pudlak [Pudlak, 1994].

2.1 COMPLEXITY CLASSES

In order to discuss computational complexity and classes of problems, some definitions are required. Since many problems that GAs are used to solve are optimization problems, PO and NPO are the relevant classical complexity classes. PO and NPO are the optimization equivalents of P and NP, which are classes of decision problems.
Approximation classes such as APX and PTAS (see Bovet and Crescenzi [Bovet and Crescenzi, 1994]) may be even more appropriate, since typically GAs produce an approximation, but they are harder to work with.

Definition 1: The class PO is the class of optimization problems that can be solved in polynomial time with a Deterministic Turing Machine (DTM) [Bovet and Crescenzi, 1994]. An example of such a problem is:

Minimum Partition:
Given: A = {a1, …, an} of non-negative integers such that for all i > j, ai ≥ aj.
Find: A partition of A into disjoint subsets A1 and A2 which minimizes max(Σa∈A1 a, Σa∈A2 a).

Definition 2: The class NPO is the class of optimization problems that can be solved in polynomial time with a Nondeterministic Turing Machine (NTM). For example, Maximum Clique (MC):
Given: A graph G.
Find: The largest complete subgraph of G.

The class NPO contains the class PO. This means that problems that can be solved in polynomial time with a DTM can also be solved in polynomial time with an NTM. These two classes are by no means the only interesting or important ones. Nonetheless, they are sufficient for our purposes, since NPO most likely includes all tractable optimization problems. That is, these problems are computable in a reasonable amount of time. Given a complexity class, the classical way to define a “hard” problem is to show that every problem in the class reduces to the “hard” problem. A reduction R is a mapping from problem A onto problem B in such a way that one can optimize any problem instance x of A by optimizing R(x), the corresponding instance of B. For example, there is a polynomial time computable transformation from an instance of Min-Partition, which is a list of non-decreasing non-negative integers A, into an instance of MC, which is a graph G, with the interesting property that if one finds the maximum clique in G one can recover the minimum partition of A.
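To make Definition 1 concrete, the Min-Partition objective can be computed directly from a binary mask over the elements of A. The brute-force optimizer below is only a sketch (it enumerates all 2^n masks, so it is exponential in n); the function names and mask encoding are chosen here for illustration.

```python
from itertools import product

def partition_cost(A, mask):
    """Cost of the partition encoded by mask: bit i sends a_i to A1 (1) or A2 (0).
    The objective of Definition 1 is max(sum(A1), sum(A2))."""
    s1 = sum(a for a, bit in zip(A, mask) if bit)
    s2 = sum(A) - s1
    return max(s1, s2)

def min_partition(A):
    """Brute-force optimum over all 2^n binary masks (exponential; sketch only)."""
    return min(partition_cost(A, mask) for mask in product((0, 1), repeat=len(A)))

# For A = [7, 5, 4, 2] the best split is {7, 2} vs {5, 4}, with cost 9.
assert min_partition([7, 5, 4, 2]) == 9
```

The binary mask is itself a chromosome in the sense of Section 2, which is why this problem maps so naturally onto a GA representation.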
Hardness, then, is an upper bound relative to the reduction R, since one need never do more work to solve any problem in the class than to transform the instance and solve the resulting instance of the hard problem. If MC were reducible to Min-Partition, for example, this would place an easy (PO) upper bound on a hard (NPO) problem, implying that MC wasn’t so hard after all. The formal definition of hardness is:

Definition 3: A problem H is hard for class C with respect to a reduction R if, for any problem A∈C and any instance x of A, optimizing the instance R(x) of H optimizes x.

“Hardness” in this sense is a rigorously defined concept that can only correctly be used in conjunction with a specified class of problems and a specified reduction. This contrasts with the usually subjective notion of the term in current GA theory.

2.2 GA-HARDNESS

We now define a formal notion of GA-hardness, using this classical approach. Current instance-based approaches attempt to find problems whose characteristics cause the GA to converge poorly or not at all (see Marconi and Foster [Marconi and Foster, 1998] for example). However, without a method for isolating the underlying problem in a GA-specific manner, this sort of research quickly becomes an analysis of the behavior of representations. This cannot lead to characterizations of “GA-hard” problems, however, for at least two reasons. In the first place, a practitioner, knowing that a particular instance is hard, may simply change the representation or the evaluation function. We often use such guidance, for example by applying scaling when the standard deviation of fitness over the solution space, which is instance specific, is particularly small. Therefore, without a specific definition to work from and a bound on the evaluation function, complexity analysis is useless, since the model under inspection is a moving target. In the second place, this focus on problem instances either leaves the definition of “GA-hardness” vague, or makes it too specific. The former approach is the usual one.
The latter approach includes criteria such as specific convergence criteria under a particular representation and fitness function. We need a formal, but broad, definition. One must be careful, though, that the definition of GA-hardness not become sweeping. A useful notion will single out problems that are upper bounds on the complexity of some interesting class of problems, which are difficult to compute with GAs, but which are feasibly solved by non-GA techniques. For example, equating “GA-hard” with “nonrecursive” problems, or with “needle in a haystack” problems, would be singularly uninformative. Since our model of feasibility is polynomial time optimization, and since we are interested in efficient optimizations, this leads to:

Definition 4: Let R be a polynomial time computable, optimality-preserving transformation. A problem G is GA-hard for class C with respect to R if every problem A in C reduces in polynomial time to G via R, G is in PO, and any GA for G requires more than polynomial time to converge for some instance unless PO=NPO.

This is analogous to saying that primality testing is Deterministic-hard for P, since every problem in P is reducible to primality testing (every problem in P is reducible to any non-trivial problem), there is no known deterministic polynomial time algorithm for primality testing (unless the polynomial hierarchy collapses), but there is a polynomial time probabilistic algorithm for primality testing. It is well known, from the celebrated No Free Lunch theorems of Wolpert and Macready [Wolpert and Macready, 1997], that GAs cannot efficiently solve all problems. It is still unknown whether NFL theorems hold for restricted complexity classes, such as NPO, however, so the above definition may be vacuous. This would be equivalent to there being no GA-hard problems, as would happen if GAs could be applied efficiently to any problem given the right representation.
We also have trivial implications, such as: if G is GA-hard for NPO, then PO=NPO. For example, MC is not GA-hard for any interesting class unless PO=NPO. So, GA-hardness is most interesting for proper subsets of NPO (unless PO=NPO). Notice that a problem may be GA-hard for a class C because all transformations produce instances for which there is no efficiently computable representation. A problem may also be GA-hard because the transformations produce search spaces in which the evaluation functions are expensive. In fact, one of these situations must hold, since otherwise Ankenbrandt’s result would imply that the transformed problem would be solvable in polynomial time on a GA, contradicting its GA-hardness. One can formalize this observation by considering a complexity-bounded analog of Kolmogorov complexity [Li and Vitanyi, 1990], tailored to genetic algorithms.

3 MINIMUM CHROMOSOME LENGTH

In a GA, all possible chromosomes of a given encoding constitute the search space S^l. As such, each possible chromosome may be a possible solution to the problem instance at hand. It may be possible with problem specific knowledge to readily identify some of these chromosomes as not being potential solutions. For example, in the case of MC, those chromosomes that represent subgraphs that are not cliques cannot solve any instance of MC. But over the set of all instances, the worst case is that every element in a search space is a potential solution. So, in a sense, a single chromosome is simultaneously the input to the evaluation function, a description of the problem instance, and a potential solution. A desideratum for a representation would be to minimize the number of bits in a chromosome that still uniquely identifies a solution to the problem. In so doing, one minimizes the search space that must be examined to find an answer. Of course, a good algorithm converges much more quickly than is required to evaluate the entire search space.
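For MC, the screening of non-clique chromosomes can be made concrete. The sketch below uses the bit-per-node encoding (the same representation the paper adopts in Section 4); the example graph, the penalty-to-zero fitness, and the function names are illustrative assumptions.

```python
def is_clique(chromosome, edges):
    """True iff the nodes selected by the chromosome form a complete subgraph.
    Encoding: bit i of the chromosome is 1 iff node i is in the subgraph H."""
    nodes = [i for i, bit in enumerate(chromosome) if bit == "1"]
    return all((u, v) in edges or (v, u) in edges
               for k, u in enumerate(nodes) for v in nodes[k + 1:])

def clique_fitness(chromosome, edges):
    """Illustrative fitness: clique size if the selection is a clique, else 0,
    so non-clique chromosomes are screened out rather than rewarded."""
    return chromosome.count("1") if is_clique(chromosome, edges) else 0

# A small assumed example graph on nodes 0..4 whose maximum clique is {1, 2, 3}.
edges = {(0, 1), (1, 2), (1, 3), (2, 3), (3, 4)}
assert clique_fitness("01110", edges) == 3  # {1, 2, 3} is a clique of size 3
assert clique_fitness("01111", edges) == 0  # {1, 2, 3, 4} is not a clique
```

Note that every 5-bit string is still a legal chromosome here, illustrating the worst case described above: over all instances, every element of S^l is a potential solution.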
Definition 5: For a problem P, and D_n the set of instances of P of size n, let MCL(P,n) be the least l for which there is an encoding e: S^l→D_n with a domain dependent evaluation function g, where g and e are in FP (the class of functions computable in polynomial time).

That is, MCL(P,n) measures the size of the smallest chromosome for which there is a polynomial time computable representation and evaluation function. Notice that for a fixed problem P, MCL(P,n) is a function of the input instance size n, and so is comparable to classical complexity measures. Also, since a smaller representation means a smaller number of solutions to search, there is a limit to how small the representation can become, so MCL(P,n) is well-defined for every P and n. This is in contrast to the Minimum Description Length Principle (MDLP) [Rissanen, 1978], which formalizes the notion of Occam’s Razor. More specifically, the MDLP states that the best theory to explain a set of data is the one which minimizes the sum of (1) the length (in binary bits) of the theory, and (2) the length (in binary bits) of the data when encoded with the help of the theory. It is important to stress that the domain dependent evaluation function must be feasible. Otherwise, it would be possible to skew the analysis by having an exponential time or even uncomputable function in the main loop of the GA. By bounding this function with a polynomial we ensure that the evaluation does not materially increase the worst-case analysis in terms of problem classes. By Ankenbrandt’s theorem, this also implies that the complexity of a GA-hard problem will be a function of the minimum chromosome size, since otherwise the problem would be polynomial time solvable on a GA. Given these definitions, we can now provably describe the complexity of a problem instance as implemented with a GA.
However, since complexity is typically based on the rate of growth of a fundamental unit of measure as a function of the input size, we have one more step in our analysis. We must now show how the growth of the problem instance size, n in our definition, correlates with the MCL.

4 MCL GROWTH

The maximum number of possible solutions that a GA must search is simply 2^l, where l is the number of bits in a chromosome. This is an interesting feature because it means that the maximum possible number of solutions to examine is governed by the representation. Reducing the chromosome length by just one bit cuts the search space in half. Clearly, the length of a chromosome is important. However, just as clearly, the chromosome must be uniquely tied to the problem at hand. Truly, the problem forces the representation to conform to it. We must explore this area if we are to understand how a hard problem affects a GA. To explore these issues, consider a chromosome that can be a solution for a GA that solves MC. Consider the graph G with 5 nodes in Figure 2. We will label the nodes 0 through 4. The maximum clique in G is the set of nodes {1,2,3}. For an n-node graph, we will use the n-bit representation that designates subgraph H, where the ith bit of the chromosome is 1 (0) if node i is (is not) in H. Thus 01110 represents the maximum clique in our example. This representation has been used several times for this problem, for example by Soule, Foster, and Dickinson [Soule et al., 1996].

Figure 2. Example instance of MC, with solution {1,2,3}.

Our representation is an MCL for a GA implementation of Maximum Clique, since any smaller representation would be unable to represent all 2^n subsets of nodes for an n-node graph, and each subset is a maximum clique for some input instance. In fact, MCL(MC,n)=n is easy to verify. For our example, consider what happens when you add a sixth node to this graph. You must expand your chromosome to six bits.
This means that the MCL grows by one as the size of the input instance grows by one. This also means that the search space has increased to 2^6. Clearly, as the size of the input increases linearly, the MCL grows linearly and the search space grows exponentially. This is the hallmark of a hard problem. In the case of MC or any other NP-hard problem, there is no known deterministic algorithm to search this exponentially growing space in polynomial time. The MCL growth rate can be used to bound the worst case complexity of the problem for a GA. If a problem is in NPO, then the MCL growth rate will be no more than linear, since an NP algorithm can search a space which grows no more quickly than exponentially, and linear MCL growth implies exponential search space growth. Conversely, if MCL grows slowly enough, then the search space will be small enough to place the problem in PO. In particular:

Theorem 1: For any problem P, if 2^MCL(P,n) ∈ O(n^k) for some k, then P∈PO.

It is not clear whether the converse is true. However, it seems that the converse should hold unless P≠NP. A problem could be in PO despite having a rapidly growing MCL, and therefore a rapidly growing search space. This would happen if, for any input instance, there were a small (polynomial in the size of the input) portion of the space that is assured to contain the solution, and this region could be identified in polynomial time. However, the only way this could happen would be if any alternative representation, one which maps problem instances onto this small region, required more than polynomial time to compute. In a sense, the “interesting” part of the search space must be efficiently hidden (by the representation) in the actual search space in such a way that no efficient algorithm can discover it. Such a function is a classical one-way function, and it is well known that one-way functions exist iff P≠NP (actually, iff P≠UP). This is very close to a proof of:

Conjecture 1: If P∈PO, and 2^MCL(P,n) ∉ O(n^k) for any k, then P≠NP.

This is also evidence that if a particular MCL grows linearly, then the underlying problem is most likely in NPO\PO (unless P=NP). After all, we explicitly disallow unnecessarily inefficient representations in our definition of MCL. Put another way, even though the encoding e: S^l→D_n minimizes l, the search space may be larger than the solution space. This happens if some redundant encoding is forced by the requirement that encoding and evaluation be polynomial time. In this case, it would clearly be more efficient to avoid algorithms, such as GAs, that require a representation. If the solution space for a problem P grows polynomially, while MCL(P,n) grows super-polynomially, then P is clearly in PO, but a GA would be a poor choice of methods to solve P. In other words:

Conjecture 2: If P∈PO, and 2^MCL(P,n) ∉ O(n^k) for any k, then P is GA-hard for PO (using optimization-preserving polynomial time reductions).

The converse to this would be very interesting, since it would show that GA-hard problems exist only when P≠NP. This raises an obvious question: how does one compute MCL(P,n)? We suspect that MCL would satisfy the formal properties of a Blum complexity measure if we removed the efficiency constraints on representation and evaluation, though we have yet to verify this. If true, then there would be cases where MCL(P,n) was not well defined. However, even given efficiency constraints such as ours, it is possible that MCL itself is not efficiently computable for a fixed problem. It is even possible that MCL is not computable at all. So, MCL may be a tool of more interest in theoretical analysis than in GA practice, which distinguishes it from its clearly related cousin, the Minimum Description Length Principle.

5 GA COMPLEXITY CLASSES

This discussion leads us to propose a new complexity class specifically for algorithms, such as GAs, which use a mapping from genotype to phenotype.
This class represents the class of GA problems whose MCL growth rate is linear.

Definition 6: A problem P is in the class NPG if MCL(P,n)∈O(n).

Recall that the worst-case domain dependent evaluation function g and the worst-case representation are both polynomial time computable, by definition. For example, Maximum Clique (MC) is in NPG, as we saw earlier. Any problem in PO is also in NPG, since one can use a 1-bit null representation, which one then ignores while solving the problem directly. Since MC cannot even be approximated well in polynomial time, let alone be optimized (unless P=NP) [Håstad, 1996], it would seem that MC cannot be in any reasonable analog of classes of problems solved efficiently by GAs. Moreover, MC cannot be GA-hard for any class, since it has no efficient non-GA solution (unless P=NP). One way to prove this would be to prove that MC is complete for NPG, which would require a meaningful reduction in order to show that every other problem in NPG reduces to MC. We leave this to another day, but suggest that identifying the relationship between NPG and the GA-hard problems for some interesting class such as NPO would be a fruitful strategy for finding NPG-complete problems.

6 CONCLUSIONS

We introduced a formal framework for studying the inherent complexity of problems for GAs, rather than instances of problems. To make the characterization interesting, we limited our attention to possible problems that are hard for GAs, but easy for some other technique. One suggestive road for exploring this idea is the growth rate of the best possible representations for a problem, our MCL. We showed connections between this growth rate and classical computational complexity classes. We also showed how to use MCL to define a GA specific computational complexity class, which might make it possible to use classical complexity theoretic strategies, such as reductions using one-way functions, to answer questions about GAs.
Interestingly, we did not have to introduce a new model of computation for GAs in order to do this analysis. This makes it more likely that traditional proof strategies may be applicable, and it also makes it very likely that this theory is applicable to search algorithms other than GAs. This paper presents several conjectures, and raises many interesting questions which need to be answered. Rather than enumerating them here, we encourage the interested researcher to find a question that interests them, then let us know the answer.

Acknowledgements

We thank Ken DeJong for an enlightening discussion of this material over cake and ice cream, and John Holland for providing the occasion by having a birthday.

References

Ankenbrandt, C.A. (1991). “An Extension to the Theory of Convergence and a Proof of the Time Complexity of Genetic Algorithms.” Foundations of Genetic Algorithms, pp. 53-68.

Balcázar, J., Díaz, J., and Gabarró, J. (1988). Structural Complexity Theory I. Springer-Verlag.

Bovet, D., and Crescenzi, P. (1994). Computational Complexity. Prentice Hall.

Goldberg, D.E. (1989b). “Sizing Populations for Serial and Parallel Genetic Algorithms.” Proceedings of the Third International Conference on Genetic Algorithms, pp. 70-79. San Mateo, CA: Morgan Kaufmann.

Håstad, J. (1996). “Clique is Indeed Hard.” Manuscript, improving on his STOC 1996 paper.

Holland, J.H. (1975). Adaptation in Natural and Artificial Systems. Ann Arbor, MI: University of Michigan Press.

Leeuwen, J. van (1990). Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity, pp. 119-120.

Li, M., and Vitanyi, P. (1990). “Kolmogorov Complexity and its Applications.” Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity, pp. 189-254. Cambridge, MA: The MIT Press.

Liepins, G., and Vose, M. (1990). “Representation Issues in Genetic Optimization.” Journal of Experimental and Theoretical Artificial Intelligence 2, pp. 101-115.

Marconi, J., and Foster, J. (1998). “Finding Cliques in Keller Graphs with Genetic Algorithms.” Proc. Int. Conf. on Evolutionary Computing.

Pudlak, P. (1994). Proc. Structural Complexity Theory.

Rawlins, G.J.E. (1991). “Introduction.” Foundations of Genetic Algorithms, pp. 1-10.

Rissanen, J. (1978). “Modeling by Shortest Data Description.” Automatica 14, pp. 465-471.

Soule, T., Foster, J.A., and Dickinson, J. (1996). “Using Genetic Programming to Approximate Maximum Cliques.” Proc. Genetic Programming, pp. 400-405.

Whitley, L.D. (1991). “Fundamental Principles of Deception in Genetic Search.” Foundations of Genetic Algorithms, pp. 221-242.

Wolpert, D.H., and Macready, W.G. (1997). “No Free Lunch Theorems for Optimization.” IEEE Transactions on Evolutionary Computation 1(1), pp. 67-82.