12. GENETIC ALGORITHMS FOR SOLUTION OF NONLINEAR OPTIMIZATION PROBLEMS

12.1 BACKGROUND

There are many types of water resources problems that are intractable with respect to classical optimization approaches. For example, consider a groundwater optimization problem where it is desired to determine the minimum pumping cost necessary to produce a desired sequence of outflows through time. If:

- the aquifer is large, there will be many computational nodes (using standard aquifer modeling methods, such as finite difference or finite element techniques)
- the problem is time-varying, then the modeling approach must step through time, thereby increasing the problem size even more
- the aquifer is unconfined, the governing hydrodynamic equations will be nonlinear, as will be their finite element or finite difference representation

If such a problem were to be modeled using the approaches discussed in the previous section, the resulting model could literally have millions of simultaneous nonlinear constraints. It would not be possible to solve such a problem, even with the powerful computer hardware and software that have become so readily available.

Problems that are so intractable--because of their dimensionality and/or nonlinearity--are quite common in water resources engineering (e.g., groundwater optimization such as the above, optimal operation of multiple reservoir systems in large river basins, multiple-constituent water quality management, etc.). This has led to the development and application of non-traditional optimization methods that are robust but not based on classical mathematical approaches, such as LP or gradient methods. Genetic algorithms (GA) represent one such set of robust methods.

12.2 GENETIC ALGORITHMS AND GENETIC PROGRAMMING

Genetic programming (GP) is a relatively new branch of operations research.
The reader is referred to the following on-line materials for a more exhaustive treatment of the subject:

- An Introduction to Genetic Algorithms with Java Applets (http://cs.felk.cvut.cz/~xobitko/ga/)
- The official GA FAQ (http://www.cs.cmu.edu/Groups/AI/html/faqs/ai/genetic/top.html)
- The GA Archives (http://www.aic.nrl.navy.mil/galist/)
- The Hitch-Hiker's Guide to Evolutionary Computation (http://www.cs.purdue.edu/coast/archive/clife/FAW/www/)
- The Genetic Programming Tutorial Notebook (http://www.mysite.com/jjf/gp/Tutorial/tutorial.html)

GP has been used in water resources engineering only in recent years. It is robust and computationally efficient for many types of problems, especially those that are highly nonlinear. It was inspired by Darwin's Theory of Evolution and the basic genetic operations of sexual reproduction. As a result, much of the terminology used in GP/GA is derived from these origins in biology.

12.2.1 Biological Background

The physical characteristics of an individual are determined by its genetic make-up. The set of physical characteristics of an organism is called the organism's "phenotype". The genetic make-up of an organism is called its "genotype". Genetic material is encoded in genes, which are arrayed together to form chromosomes. The notion of genes combining to form chromosomes, which collectively represent an organism's genotype, which in turn sets the organism's physical characteristics--its phenotype--is illustrated in a very simplistic fashion in Figure 12.1.

[Figure 12.1: A Simple Representation of the Relationship between an Organism's Genotype and Phenotype. A chromosome (a collection of genes, e.g., a gene for eye color and a gene for hair color) constitutes the genotype (genetic make-up), which determines the phenotype (physical characteristics, e.g., blue eyes, brown hair).]

In nature (according to Darwin's Theory of Evolution), the environment acts upon an individual's physical characteristics and determines:

- the individual's suitability for survival
- the individual's likelihood of reproductive success

In general, those individuals that are most suited for survival in the environment in which they live will produce the greatest number of offspring. Thus:

- they will pass more of their genetic material to subsequent generations than other, less fit individuals
- their offspring will be better suited for survival than the offspring of other, less fit individuals

This process of more fit individuals passing their genetic material to a greater number of offspring than less fit individuals is known as "survival of the fittest". GP is a branch of operations research whereby these biological processes are used as a set of principles for constructing optimization algorithms; these algorithms generate "populations" of decisions or management policies that become more and more fit with each succeeding generation. The following sections illustrate how this is done.

12.2.2 Relationship of GP to Optimization

The process of biological evolution is one wherein, with each succeeding generation, individuals are produced that are, on average, better fit for the environment in which they live. As such, it is a type of optimization process which, in a sense, creates "better" individuals with each iteration. Presumably, then, given enough iterations (i.e., with enough generations of a population living in an environment of interest) and a way of measuring the quality or desirability of an organism, an individual would eventually be produced having physical characteristics that, in total, would be in the neighborhood of an "optimal solution".
This connection between genetics and evolution on the one hand and optimization on the other begins to become more obvious when considering the relationship in the terminology they use. For example, rough synonyms for terms in GP and other, more familiar traditional terms in operations research are given in Table 12.1.

Table 12.1: Terminology Used in GA and Corresponding Synonyms from Systems Analysis

Terminology Used in GA -- Synonymous Concepts from Systems Analysis
gene -- the value of a decision variable
chromosome = collection of genes = "organism" = "individual" -- an array of decision variables
population = a set of chromosomes or individuals -- a set of solutions (a collection of n-tuples, each of which specifies a different set of values of decision variables)
fitness -- objective function value

12.2.3 A Basic Genetic Algorithm

The basic design of genetic algorithms is actually quite simple and easy to code into a computer program. A GA normally consists of the following steps:

Step 1: Population Generation: A population of n chromosomes (i.e., individuals) is generated by randomly selecting values for the genes in the chromosomes. (I.e., randomly assign values to the decision variables for each of a large number of alternatives.)

Step 2: Fitness Evaluation: Evaluate the "fitness" of each chromosome in the population. (I.e., calculate the value of the objective function for each alternative.)

Step 3: Test for Completion: Test to see if an end condition has been achieved (e.g., test to see if a maximum number of generations has been reached, etc.). If so, stop. If not, continue with the next step.

Step 4: Create a New Population: Apply the processes of selection, crossover, mutation, and replacement to build a new population.
Step 4a: Selection: Select two parent chromosomes from the present population according to their fitness: the greater the fitness of an individual, the greater is the chance that the individual will be selected to be a parent and produce offspring. (I.e., select two alternatives from the current collection of alternatives, and base that selection upon the objective function values of the current alternatives.)

Step 4b: Crossover: With a pre-selected probability, select genes from one parent or the other to form a new individual (i.e., to form an offspring). (I.e., use some of the decision variable values from one of the alternatives, and some from the other, to formulate a new alternative.)

Step 4c: Mutation: With a pre-selected probability, cause a mutation to happen at any given gene in the new individual (i.e., make a small change in the value of a randomly selected decision variable). (I.e., make small, random changes in the values of some of the decision variables of the new alternative.)

Step 4d: Replacement: Repeat Steps 4a through 4c until n new chromosomes have been constructed. Replace the old population of chromosomes with the new ones. (I.e., repeat the processes outlined in Steps 4a through 4c until a complete set of new alternatives has been formulated. Replace the old set of alternatives with this new one.)

Step 5: Repetition: Repeat the process with the new population, starting at Step 2.

12.2.4 Illustration of Key Steps

The following sections illustrate the above concepts in ways that might be more familiar to water resources engineers.

Encoding of a Chromosome: Information about the genetic make-up of an individual is encoded into a chromosome. There are several different ways of encoding such information, depending on the type of problem of interest.
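The five steps above can be sketched as a short program. The following is a minimal, illustrative value-encoded GA in Python (it is not the MacsGA code referenced later in this chapter); the function name, parameter defaults, and mutation step size are placeholders chosen for illustration only.

```python
import random

def run_ga(fitness, n_genes, bounds, pop_size=50, generations=100,
           p_crossover=0.5, p_mutation=0.05):
    """Minimal value-encoded GA following Steps 1-5 above.

    Assumes fitness values are positive, since roulette-wheel selection
    (Step 4a) weights parents by raw fitness.
    """
    lo, hi = bounds
    # Step 1: randomly generate a population of value-encoded chromosomes.
    pop = [[random.uniform(lo, hi) for _ in range(n_genes)]
           for _ in range(pop_size)]
    for _ in range(generations):  # Steps 3 and 5: fixed number of generations.
        scores = [fitness(c) for c in pop]  # Step 2: evaluate fitness.
        new_pop = []
        while len(new_pop) < pop_size:  # Step 4d: build a full new population.
            # Step 4a: roulette-wheel selection of two parents.
            a, b = random.choices(pop, weights=scores, k=2)
            # Step 4b: with probability p_crossover, take genes from either parent.
            if random.random() < p_crossover:
                child = [x if random.random() < 0.5 else y for x, y in zip(a, b)]
            else:
                child = a[:]
            # Step 4c: small, infrequent mutations, clipped to the gene bounds.
            child = [min(max(g + random.uniform(-0.05, 0.05) * (hi - lo), lo), hi)
                     if random.random() < p_mutation else g
                     for g in child]
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)  # best individual in the final generation
```

For instance, run_ga(lambda c: 1000 - sum((g - 1) ** 2 for g in c), n_genes=2, bounds=(-10, 10)) should typically return a chromosome in the neighborhood of (1, 1).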
The major encoding methods are binary encoding (where genes take on values of either 0 or 1), permutation encoding (where genes have integer values), value encoding (where genes take on real-valued numbers), and tree encoding (where genes are actually "objects", such as commands in a programming language; tree encoding is used in GA to "evolve" computer programs). These are illustrated in Figure 12.2. The encoding method of greatest utility for water resources problems is value encoding.

[Figure 12.2: Alternative Chromosome Encoding Methods. Binary encoding: every chromosome is a string of bits, i.e., either 0 or 1 (e.g., 1 0 0 1 0 1 1 0 1 0 0 ...). Permutation encoding: every chromosome is a string of numbers, each of which is a number in a sequence (e.g., 6 1 4 9 3 0 5 2 7 8 ...). Value encoding: every chromosome is a sequence of values (e.g., 3.145 6.259 1.476 2.847 ...). Tree encoding: every chromosome is a tree of objects, such as functions or commands in a programming language.]

Fitness Evaluation: Evaluation of the fitness of an individual simply corresponds to determining a scalar-valued expression by examining the genes in the individual's chromosome. This is the same as calculating the value of the objective function, given values for the decision variables of a problem, and is illustrated for a water resources problem in Figure 12.3.

[Figure 12.3: Evaluation of "Fitness" for a Water Resources Problem. A "chromosome" is a set of decision variable values, e.g., reservoir operating rule parameter values, pumping rates on wells, and others. "Fitness" is evaluated for an individual by putting the values of the "genes" into a simulation model to see, for example, how the water resources system will behave and what total costs and benefits will result; e.g., "fitness" = objective function value = benefits - costs.]

Crossover: Crossover is the process whereby, with a pre-specified probability, genes from one parent or the other are used to form a new individual (the "offspring"). Crossover is a random process involving the identification of a "crossover point" on the chromosome. It works by first selecting (at random) one or more points on the chromosome where crossover will occur. All genes in the offspring chromosome up to that point will be taken from one parent, while all genes after that point will be taken from the other. Typically, multiple crossover points are selected, with genes being taken first from one parent until a crossover point is encountered, and then from the other parent until the next crossover point is found (as illustrated in Figure 12.4). For water resources problems where value encoding is used, crossover consists of simply copying genes from first one parent and then the other, alternating between parents as crossover points are found. This is illustrated in Figure 12.5.

[Figure 12.4: The Process of Crossover to Produce a New Offspring. Genes are selected at random from Parent A and Parent B and copied to the offspring.]

[Figure 12.5: Value Encoding Crossover.
Parent A:  (3.712) (4.681) (0.973) (7.818) (6.579)
Parent B:  (8.040) (6.721) (5.619) (0.011) (2.038)
Offspring: (3.712) (4.681) (0.973) (0.011) (2.038)
The crossover point (randomly selected) falls after the third gene.]

Mutation: The process of mutation is used to continually introduce new values of genes into the population (i.e., to continually modify the values of decision variables in alternatives). This ensures that the current best solution will not stagnate, but will continue to improve with each new generation. Mutation is a random, but controlled, process. After a new offspring is created, each gene in the offspring is examined. Most genes are left unchanged, but the values of some genes will be modified. This happens at random and only with a pre-specified frequency (illustrated in Figure 12.6).

[Figure 12.6: The Process of Mutation. Most genes are copied to the offspring unchanged; a few, selected at random, are mutated.]

For GA problems involving value encoding, the mutation process must modify the value of mutated genes by only a small amount (see Figure 12.7). Many simple algorithms are available for ensuring that the mutations are not so large (or numerous) as to cause the resulting individual to be "deformed". Mutations should represent slight perturbations, not catastrophic ones. For example, the rate at which genes mutate is typically selected to be quite low (say, on the order of less than one percent). If a value encoding gene is to be mutated, the direction of the change might be selected at random (essentially from a coin flip), but the magnitude of the mutation is typically kept low (say, a random percent of some maximum amount for a given gene). Prespecification of the values of the parameters that control mutation rates and amounts is part of the art of successful genetic programming.
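The value-encoding crossover and mutation operations described above can be sketched as follows; the default mutation probability and maximum step size are illustrative placeholders, not values prescribed in the text.

```python
import random

def crossover(parent_a, parent_b):
    """Single-point value-encoding crossover (as in Figure 12.5): genes up to a
    randomly selected crossover point come from parent A, the rest from parent B."""
    point = random.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]

def mutate(chromosome, p_mutation=0.01, max_step=0.25):
    """Value-encoding mutation (as in Figure 12.7): with a low pre-specified
    probability, perturb a gene by a small amount in a coin-flip direction."""
    result = []
    for gene in chromosome:
        if random.random() < p_mutation:
            direction = random.choice((-1.0, 1.0))          # coin flip for direction
            gene += direction * random.random() * max_step  # small magnitude
        result.append(gene)
    return result
```

With the parents of Figure 12.5 and a crossover point falling after the third gene, crossover would produce the offspring (3.712, 4.681, 0.973, 0.011, 2.038).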
[Figure 12.7: Value Encoding Mutation.
Chromosome Before Mutation          Chromosome After Mutation
1.29  (no mutation)                 1.29
5.68  (no mutation)                 5.68
2.86  (mutation)                    2.73
4.11  (mutation)                    4.22
5.55  (no mutation)                 5.55
...                                 ...]

Selection of Parent Chromosomes: Various methods have been proposed to select the chromosomes of one population that will become the parents of the chromosomes of the following population. All are based on some reference to the fitness of individual chromosomes. A common method of parent selection is called "roulette wheel" selection, which picks individuals from the present population at random, weighted by fitness. Let the probability of individual i being selected to become a parent be prob(Pi), define the fitness of individual i to be fi, and assume there are n individuals in the population. The probability of individual i being selected, then, is:

prob(Pi) = individual fitness / sum of fitness of all individuals = fi / sum(fj, j = 1 to n)    ...[12.1]

Selection with Elitism: When creating a new population of n individuals, it is often advantageous to retain a few of the best individuals from the previous population. Doing so is called "elitism". It works by retaining the best m individuals from the old population and replacing only the remaining n - m of them with new offspring. This prevents the loss of the best solutions that have been found.

12.2.5 Parameters of GA

The most important parameters in a genetic algorithm are:

The population size, n: The population size controls the quantity of genetic material that is being examined at any one time in search of an optimal solution. The greater the population size, the greater will be the amount of genetic material under consideration and, hence, the greater will be the likelihood that better solutions will be discovered in the next generation. However, as the population size increases, the amount of computer resources and time required to evaluate population fitness also increases.
The crossover probability: The crossover probability governs how often crossover will take place in combining the genes of both parents to produce an offspring. Crossover that is too frequent might separate genes that are close together on the chromosome of one of the parents, but which act in combination in a favorable manner. Crossover that is not frequent enough runs the risk of producing an offspring that is too similar to one or the other of the parents.

The mutation probability: The mutation probability governs how often genes will be mutated. Mutations prevent the algorithm from converging to a local optimum, but mutations that are too frequent might cause the algorithm to wander and thereby slow the rate of convergence to a global optimum.

12.3 EXAMPLE GA PROBLEM

Consider the following simple nonlinear optimization problem:

Max Z = f(x,y) = 1000 - [(x - 1)^2 + (y - 1)^2]    ...[12.2]
s.t.:
-10 ≤ x ≤ 10    ...[12.3]
-10 ≤ y ≤ 10    ...[12.4]

This problem was solved with MacsGA (refer to Appendix 2) using the GA parameter values shown in Table 12.2. Figure 12.8 shows a plot of the maximum, mean, and minimum fitness (objective function) values of the population versus generation. Note that for a population size of 100, the initial maximum fitness is very high. Also note that even the mean population fitness rapidly converges to the optimal solution (i.e., Z = 1000), but that the minimum fitness per generation tends to wander as a result of the random selection and mutation processes built into the GA code.
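As a check on the example problem, the objective function of Eq. 12.2 can be expressed and evaluated directly: the maximum of Z = 1000 occurs at (x, y) = (1, 1), and the worst feasible value, attained at the corners of the bounds in Eqs. 12.3 and 12.4, is 1000 - (121 + 121) = 758.

```python
def fitness(x, y):
    """Objective function of Eq. 12.2: Z = 1000 - [(x - 1)^2 + (y - 1)^2]."""
    return 1000.0 - ((x - 1.0) ** 2 + (y - 1.0) ** 2)

print(fitness(1, 1))      # 1000.0, the optimum
print(fitness(-10, -10))  # 758.0, the worst value within the bounds
```

These two values bracket the fitness range plotted in Figure 12.8.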
Table 12.2: Genetic Algorithm Parameters Used in Solving Example Problem

Parameter                      Value
Probability of Crossover       0.5
Probability of Mutation        0.05
Population Size                100
Number of Generations          100
Number of Elite Individuals    5

[Figure 12.8: GA Convergence in Solution of a Simple Quadratic Problem. (a) Minimum, mean, and maximum fitness for 100 generations; (b) maximum fitness for 30 generations.]

12.4 PROBLEMS

1. Evaluate the sensitivity of the rate of convergence of the problem presented in Section 12.3 to different GA parameter values (i.e., probability of crossover, mutation probability, etc.). Plot and discuss your results.

2. Solve Problem 2 from Section 11 using a genetic algorithm.