Lecture 10: Learning - Genetic algorithms
Intro to AI
Genetic Algorithm
Ruth Bergman
Fall 2002

Imitating Nature
Aspects of the evolution of organisms:
• Organisms that are ill-suited to an environment have little chance to reproduce (natural selection)
• Conversely, the best-fitting organisms have a better chance to survive and reproduce
Reproduction:
• Offspring are similar to their parents
• Random mutations occur, and they can produce better (or worse) fitting individuals
“On the Origin of Species by Means of Natural Selection”, C. Darwin (1859)
Encoding:
• An organism is fully represented by its DNA string, that is, a string over a finite alphabet (4 symbols)
• Each element of this string is called a gene

Genetic Algorithm (GA)
• Developed by John Holland in the early 70’s
• Optimization and machine learning techniques inspired by the process of natural evolution and evolutionary genetics
– Solutions are encoded as chromosomes
– Search proceeds through maintenance of a population of solutions
– Reproduction favors “better” chromosomes
– New chromosomes are generated during reproduction through processes of mutation, crossover, etc.

GA Framework
[Figure: GA cycle with the labels search space, population, fitness evaluation, selection, crossover, mutation, and reproduction; example population A = 0 1 0 0 0, B = 1 0 1 1 0, C = 1 1 0 1 0, D = 0 1 0 1 1]

GA Procedure
• Start with a population of N individuals
1. Apply the fitness function to all the individuals
2. Select the pairs of individuals for reproduction (repetition allowed)
3. Each pair generates two children (reproduction with crossover)
4. Apply a random mutation to the children. The children become the next generation
5. Repeat steps 1 to 4 until some termination criterion applies

Encoding Scheme
• An individual (an organism) is intended to be a possible solution to the problem you want to solve
• An individual is represented by a binary string.
Such a string is intended to be the complete description of the individual
• Example: Suppose you have to find a number between 0 and 255 whose binary representation contains the same number of 1s and 0s. An individual is a string of 8 bits, e.g. h = 0 1 1 1 1 1 1 0 = 126

Fitness Function
• A fitness function is a function that says how good a solution is, i.e. how well an individual fits the environment
• Example: $f(h) = 8 - |n_1 - n_0|$, where $n_1$ and $n_0$ are the numbers of 1s and 0s in h. Note that the fitness function takes its minimum value (0) when $n_1 = 8$ or $n_0 = 8$, and its maximum value (8) when $n_1 = n_0 = 4$

The Initial Population
0 1 1 1 1 1 1 0
1 1 1 1 1 1 1 0
0 0 1 0 0 1 0 0
0 0 0 0 0 0 0 1

Optimization
• Avoiding local optima, cf. the hill-climbing method
[Figure panels: Hill-climbing Method, GA Search Method]

Selection
• Roulette wheel selection
– Compute each individual’s contribution to the global fitness as $P(h_i) = f(h_i) / \sum_{j=1}^{N} f(h_j)$
– The choice of the pairs for reproduction consists of randomly choosing the individuals (with replacement) with the distribution given by P

Individual  Encoding         Fitness  P
A           0 1 1 1 1 1 1 0  4        .33
B           1 1 1 1 1 1 1 0  2        .17
C           0 0 1 0 0 1 0 0  4        .33
D           0 0 0 0 0 0 0 1  2        .17

Roulette wheel: A 33%, B 17%, C 33%, D 17%

Crossover
– Randomly choose a crossover point “c”, i.e. a number between 1 and n
– Return two children: one composed of the first c bits of the first parent and the last n-c bits of the second parent, the other composed of the first c bits of the second parent and the last n-c bits of the first parent

Parents (c = 6 marked by |)   Children
0 1 1 1 1 1 | 1 0             0 1 1 1 1 1 | 0 0
0 0 1 0 0 1 | 0 0             0 0 1 0 0 1 | 1 0

1 1 1 1 1 1 | 1 0             1 1 1 1 1 1 | 0 1
0 0 0 0 0 0 | 0 1             0 0 0 0 0 0 | 1 0

Mutation
• Mutation on individuals: some of the children’s bits are changed (with a small, independent probability)
0 0 1 1 1 1 1 0 → 0 0 1 1 0 1 1 0
0 1 1 0 0 1 0 0   $f = 8 - |3 - 5| = 6$
0 0 1 1 0 1 1 0   $f = 8 - |4 - 4| = 8$ (maximum found)
0 0 1 0 0 1 1 0   $f = 8 - |3 - 5| = 6$
1 0 1 1 1 1 0 0   $f = 8 - |5 - 3| = 6$

Stopping Criteria
• Convergence:
– A population is said to converge when all the genes have converged, i.e.
when the value of every bit is the same in at least 95% of the individuals in the population
• Since convergence is not guaranteed, we must consider other stopping criteria:
– Number of generations
– Almost constant fitness of the best-fitting individual
– Almost constant average fitness of the population

Parameter Settings
• Population size
– How many chromosomes are in the population
  • Too few chromosomes: only a small part of the search space is explored
  • Too many chromosomes: the GA slows down
– Recommendation: 20-30, 50-100
• Probability of crossover
– How often crossover is performed
– Recommendation: 80%-95%
• Probability of mutation
– How often parts of a chromosome are mutated
– Recommendation: 0.5%-1%

Genetic Programming
• One of the central challenges of CS is to get a computer to do what needs to be done, without telling it how to do it
– Automatic programming (or program synthesis)
• GP is a branch of genetic algorithms
• Main difference between GP and GA
– Representation of the solution (computer program)
  • GA: a string of numbers (fixed-length character strings)
  • GP: a computer program (Lisp or Scheme)
– Represents hierarchical computer programs of dynamically varying sizes and shapes
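
The representation difference between GA and GP can be made concrete with a small Python sketch. It assumes that a program is stored as nested tuples mirroring Lisp S-expressions; the function set `FUNCTIONS` and the helper names `evaluate` and `size` are hypothetical choices for illustration and do not come from the lecture.

```python
import operator

# Hypothetical function set for this sketch; the lecture does not name one.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, env):
    """Evaluate a program tree, looking up variable values in env."""
    if isinstance(tree, tuple):      # internal node: (op, left, right)
        op, left, right = tree
        return FUNCTIONS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):        # leaf: variable name
        return env[tree]
    return tree                      # leaf: numeric constant


def size(tree):
    """Count the nodes of a program tree."""
    if isinstance(tree, tuple):
        return 1 + size(tree[1]) + size(tree[2])
    return 1


# (+ x 1) and (* (+ x 1) (- x 2)) in Lisp notation
small = ("+", "x", 1)
large = ("*", ("+", "x", 1), ("-", "x", 2))
```

Unlike the fixed 8-bit strings of the GA example, `small` and `large` are both valid individuals even though they differ in size and shape.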
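
The GA procedure from the earlier slides (fitness evaluation, roulette wheel selection, one-point crossover, mutation) can be sketched in Python for the 8-bit example. This is a minimal sketch under assumptions: all function names, the default parameter values, the fixed random seed, and the stopping rule (maximum fitness reached or a generation limit) are illustrative choices, not part of the lecture.

```python
import random

N_BITS = 8

# Initial population A, B, C, D from the lecture example.
INITIAL_POPULATION = [
    [0, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
]


def fitness(h):
    """f(h) = 8 - |n1 - n0| for a list of 8 bits."""
    n1 = sum(h)
    n0 = len(h) - n1
    return N_BITS - abs(n1 - n0)


def selection_probabilities(population):
    """Each individual's share of the total fitness (roulette wheel slices)."""
    scores = [fitness(h) for h in population]
    total = sum(scores)
    if total == 0:  # assumption: fall back to a uniform wheel
        return [1 / len(population)] * len(population)
    return [s / total for s in scores]


def select(population, rng):
    """Draw one parent with replacement, proportionally to fitness."""
    weights = selection_probabilities(population)
    return rng.choices(population, weights=weights, k=1)[0]


def crossover(p1, p2, c):
    """One-point crossover: swap the tails after the first c bits."""
    return p1[:c] + p2[c:], p2[:c] + p1[c:]


def mutate(h, rate, rng):
    """Flip each bit independently with probability rate."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in h]


def run_ga(population, generations=50, mutation_rate=0.01, seed=0):
    """Run the loop until the maximum fitness is found or generations run out."""
    rng = random.Random(seed)
    for _ in range(generations):
        best = max(population, key=fitness)
        if fitness(best) == N_BITS:
            return best
        next_gen = []
        while len(next_gen) < len(population):
            p1, p2 = select(population, rng), select(population, rng)
            # c between 1 and n as in the lecture; c = n copies the parents.
            c = rng.randint(1, N_BITS)
            for child in crossover(p1, p2, c):
                next_gen.append(mutate(child, mutation_rate, rng))
        population = next_gen[:len(population)]
    return max(population, key=fitness)
```

Calling `run_ga(INITIAL_POPULATION)` returns the best individual found; `crossover(INITIAL_POPULATION[0], INITIAL_POPULATION[2], 6)` reproduces the first pair of children shown on the crossover slide. The crossover probability from the parameter settings is left out here for brevity, so every selected pair is crossed.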