Genetic Algorithms
In part from: https://www.cs.wmich.edu/~elise/courses/cs6800/GeneticAlgorithms.ppt

Introduction
• After scientists became disillusioned with classical and neo-classical attempts at modeling intelligence, they looked in other directions.
• Two prominent fields arose: connectionism (neural networks, parallel processing) and evolutionary computing.
• Genetic algorithms are a form of evolutionary computing.
• They belong to the category of untrained, reinforcement-based problem-solving algorithms.

What is a GA?
• A genetic algorithm (or GA) is a search technique used in computing to find true or approximate solutions to optimization and search problems.
• Genetic algorithms are categorized as global search heuristics.
• Genetic algorithms are a particular class of evolutionary algorithms that use techniques inspired by evolutionary biology such as inheritance, mutation, selection, and crossover (also called recombination).

What is a GA?
• Genetic algorithms are implemented as a computer simulation in which a population of abstract representations (called chromosomes, or the genotype or genome) of candidate solutions (called individuals, creatures, or phenotypes) to an optimization problem evolves toward better solutions.
• In other words: an iterative process during which an optimization problem is "encouraged" to produce better solutions at each iteration.
• Traditionally, solutions are represented as binary strings of 0s and 1s, but other encodings are also possible.

What is a GA?
• The evolution usually starts from a population of randomly generated individuals and happens in generations.
• In each generation, the fitness (= evaluation function) of every individual (= solution) in the population is evaluated; multiple individuals are selected from the current population (based on their fitness) and modified (recombined and possibly mutated) to form a new population.

What is a GA?
• The new population is then used in the next iteration of the algorithm.
• Commonly, the algorithm terminates when either a maximum number of generations has been produced, or a satisfactory fitness level has been reached for the population.
• If the algorithm has terminated due to a maximum number of generations, a satisfactory solution may or may not have been reached.

GA terminology summary
• Individual - any possible solution (e.g. the best strategy π for a robot to perform a given task, or a neural network architecture with a given number of hidden layers and nodes, ...)
• Population - the group of all individuals (e.g. alternative strategies, alternative architectures, ...)
• Search Space - all possible solutions to the problem
• Chromosome/genotype/genome - the "blueprint" for an individual (an abstract representation of a solution, e.g. a graph representing a plan π, a set of weights, ...). NOTE: in biology these are different things.
• Trait - a possible aspect (feature) of an individual (e.g. the number of states in a plan, or the number of hidden nodes/layers in a neural network)

In Biology
• All living organisms consist of cells, and each cell contains the same set of one or more chromosomes—strings of DNA—that serve as a "blueprint" for the organism.
• A chromosome can be conceptually divided into genes—each of which encodes a particular protein. Very roughly, one can think of a gene as encoding a trait, such as eye color.
• The different possible "settings" for a trait (e.g., eye color = blue, brown, hazel) are called alleles. Each gene is located at a particular locus (position) on the chromosome.
• The complete collection of genetic material (all chromosomes taken together) is called the organism's genome.
• The term genotype refers to the particular set of genes contained in a genome.

Chromosome, Genes and Genomes

Genotype and Phenotype
• Genotype vs. genome:
– the particular set of genes in a genome, i.e. an organism's full hereditary information (genome refers to the whole genetic makeup of an individual or group.
Genotype specifies a particular allele makeup - what kind of genes the individual has compared to others, e.g. the gene for blue eyes, even though the gene for dark eyes may also be in the genome)
• Phenotype:
– the physical characteristics of the genotype (morphology, development, behaviour). Humans can have the same genotype (e.g. twins) but different phenotypes.

Genotype and Phenotype

With reference to our previous ML terminology
• First, we do not represent objects that we need to classify, but rather solutions to a problem that we want to solve.
• The representation of a solution is called a chromosome, and its elements are genes.
• Analogy: chromosome ↔ feature vector, genes ↔ features, traits ↔ feature values.
• But here features (genes) can be related to each other - more often, representations are not "flat" vectors of features (as in supervised classification); rather, they are structured (e.g. a graph or a set of rules).

GA Requirements: representation
• A typical genetic algorithm requires two things to be defined:
– a genetic representation of the solution domain, and
– a fitness function to evaluate the solution domain.
• A common representation of the solution is an array of bits (mapping to a structure, e.g. a plan, an architecture, ...). Arrays of other types and structures can be used in essentially the same way.
• The main property that makes these binary "genetic" representations convenient is that their parts are easily aligned due to their fixed size, which facilitates the simple crossover operation (mixing chromosomes).
• Variable-length representations may also be used, but implementation is more complex in this case.

Representation examples
Chromosomes could be:
– Bit strings (0101 ... 1100)
– Real numbers (43.2 -33.1 ... 0.0 89.2)
– Lists of rules (R1 R2 R3 ... R22 R23)
– A program's elements (genetic programming)
– ... any data structure ...
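The representation examples above can be written down directly as plain Python values. This is an illustrative sketch (the variable names are ours, not from the slides):

```python
import random

# Some of the chromosome encodings listed above, as plain Python values.
bit_string = [random.randint(0, 1) for _ in range(8)]   # e.g. 0101...1100
real_vector = [43.2, -33.1, 0.0, 89.2]                  # real-number genes
rule_list = ["R1", "R2", "R3", "R22", "R23"]            # a list of rules

# All of them are just sequences, which is exactly what crossover and
# mutation operators need: parts that can be cut and recombined.
print(len(bit_string), all(b in (0, 1) for b in bit_string))  # 8 True
```

The common trait is that each chromosome is a sequence of genes, so fixed-size encodings line up position by position for crossover.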
Representation example 1 (chromosomes = rules)
• R: If (input1 = Low OR High) AND (input2 = High) THEN Output = Low
• Rules of this type can be encoded in 9 bits, where the first 6 represent the antecedent (alternative conditions for inputs 1 and 2) and the last 3 bits the value(s) of the consequent.
• E.g. the first 3 bits represent the conditions for input 1, e.g.: 000 = no constraints, 001 = low, 010 = high, 011 = low or high, ...

More examples of chromosome representation types
• Chromosomes can represent different architectures of neural networks (e.g. different numbers of layers, different numbers of nodes per layer)
• Chromosomes can represent alternative "body" structures for creatures
• Chromosomes can be alternative strategies for a prey to escape a predator (or for a predator to capture a prey)
• In all these cases a chromosome encodes a graph structure
• A graph structure can be represented with a binary adjacency matrix (columns and rows are nodes; a 1 in cell ij indicates a connection between nodes i and j)

Evolving swimming creatures

Chromosome ≠ feature vector
• In the previous example the "architecture" of the creatures is a set of joints (nodes) and elastic edges: the elements of the representation are structurally connected!
• Note also that a solution is made of two aspects in the swimming-creatures example: the body of the creature (architecture), and the strategy to swim (which also has to be mapped into an abstract "chromosome" representation).
• Multiple problems (e.g. the best body for the best swimming strategy) can be solved jointly or separately.

GA Requirements: fitness function
• The fitness function is defined over the genetic representation and measures the quality of the represented solution.
• The fitness function is always problem dependent.
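Returning to Representation example 1 above, the 9-bit rule encoding can be sketched as follows. The 3-bit condition codes come from the slide; the dictionary keys and the function name are our own illustrative choices:

```python
# 3-bit condition codes from the slide:
# 000 = no constraints, 001 = low, 010 = high, 011 = low or high.
CODES = {"any": "000", "low": "001", "high": "010", "low or high": "011"}

def encode_rule(input1, input2, output):
    """Pack a rule into a 9-bit chromosome: 3 bits for the condition on
    input1, 3 bits for input2, and 3 bits for the consequent."""
    return CODES[input1] + CODES[input2] + CODES[output]

# R: If (input1 = Low OR High) AND (input2 = High) THEN Output = Low
print(encode_rule("low or high", "high", "low"))  # 011010001
```

Because every rule occupies exactly 9 aligned bits, a list of such rules can be concatenated into one fixed-length chromosome, which keeps crossover simple.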
• Usually the fitness function is defined with relation to the objective task: for example, for flying creatures, the time they are able to fly before they fall down.
• If we evolve a neural network architecture, the fitness will be the classification precision of that particular architecture and its convergence time.
• If we evolve a driving agent, the fitness can be the number of times it correctly stops when recognizing an obstacle, or the time to reach a target location, etc.

A fitness function produces a ranking of individuals

Basics of GA: algorithm steps
• The most common type of genetic algorithm works like this:
– A population Pt is created at time t with a group of individuals created randomly.
– The individuals in the population are then evaluated with a fitness function f.
– The evaluation function gives the individuals a score based on how well they perform at the given task.
– A subset of individuals P't is then selected based on their fitness; the higher the fitness, the higher the chance of being selected (the sampling probability is fitness-dependent).
– These individuals' chromosomes are paired and then "reproduce" to create one or more "offspring", after which the offspring are "mutated" randomly. So we get a new population Pt+1.
– This continues until a suitable solution has been found or a certain number of generations has passed, depending on the needs of the programmer.

Let's take a problem: motion control
• Motion control applications can be found in almost every sector of industry, from factory automation and robotics to high-tech computer hard-disk drives.
• They are used to regulate mechanical motions in terms of position, velocity, and acceleration, and/or to coordinate the motions of multiple axes or machine parts.
• How do we select the most appropriate control law and its parameters?
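The algorithm steps listed above (create Pt, evaluate with f, select P't, recombine and mutate into Pt+1, repeat) can be sketched as one generic loop. This is a minimal sketch on a toy problem of our own choosing (maximize the number of 1-bits in a chromosome); the selection, crossover and mutation operators are detailed in the following slides:

```python
import random

def run_ga(pop_size=20, length=10, generations=50, pc=0.7, pm=0.02):
    # Step 1: create population Pt randomly.
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    fitness = lambda ind: sum(ind)  # toy fitness: count of 1-bits

    for _ in range(generations):
        # Steps 2-4: evaluate every individual and select parents P't with
        # fitness-dependent probability (+1 avoids an all-zero weight vector).
        parents = random.choices(pop, weights=[fitness(i) + 1 for i in pop],
                                 k=pop_size)
        # Step 5: pair parents, recombine with probability pc, mutate,
        # and collect the new population Pt+1.
        nxt = []
        for p1, p2 in zip(parents[::2], parents[1::2]):
            if random.random() < pc:
                pt = random.randrange(1, length)
                c1, c2 = p1[:pt] + p2[pt:], p2[:pt] + p1[pt:]
            else:
                c1, c2 = p1[:], p2[:]
            nxt += [[1 - b if random.random() < pm else b for b in c]
                    for c in (c1, c2)]
        pop = nxt
    return max(pop, key=fitness)

best = run_ga()
print(sum(best), "ones out of 10")
```

Termination here is a fixed number of generations; the other termination conditions from the slides (fitness threshold, plateau, budget) would simply replace the `for` loop's condition.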
Encoding the parameters (1)
• This implies creating a representation for each parameter setting (= a solution) and a stepsize (the increment/decrement of parameter values).
1. Map the model parameters to a binary string (chromosome): each value is a setting of a parameter (e.g. velocity, acceleration, position of a mechanical arm, joint rotation, ...)

Encoding parameters (2)
• For each parameter, a minimum value, a stepsize and an encoding length have to be prespecified.

Generating a population
• Initialization
• Initially, many individual solutions are randomly generated to form an initial population P.
• The population size depends on the nature of the problem, but typically contains several hundreds or thousands of possible solutions.
• Traditionally, the population is generated randomly, covering (samples of) the entire range of possible solutions (the search space).
• Occasionally, the solutions may be "seeded" in areas where optimal solutions are likely to be found (if some initial knowledge is available).

Initial population P
x1 = 0.625, x2 = 2.5
x1 = -0.0625, x2 = 1.3
x1 = -0.4375, x2 = 1.7
...
x1 = 0.375, x2 = 1.6

Rating the individuals of a population P
• A fitness function is computed (as we said, fitness depends on the application domain) for every individual of P.
• For example, let's say that the (un)fitness function is: O(x1, x2) = x1² + x2²
• f(x) = 1/O(x)

Creating subsequent generations Pt+1
• Selection
• During each successive generation, a proportion P' of the existing population P: p1, p2, ..., pn is selected to breed a new generation.
• Individual solutions are selected through a fitness-based process, where "fitter" solutions (as measured by the fitness function) are typically more likely to be selected.

Creating successive generations (2)
• Certain selection methods rate the fitness of each solution and preferentially select the best solutions. Other methods rate only a random sample of the population, as this process may be very time-consuming.
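The rating step, using the example (un)fitness O(x1, x2) = x1² + x2² and f = 1/O from the slides above, can be sketched on the sample initial population:

```python
# (Un)fitness O and fitness f = 1/O from the running example,
# applied to the sample initial population listed above.
def O(x1, x2):
    return x1**2 + x2**2      # objective: smaller is better

def f(x1, x2):
    return 1.0 / O(x1, x2)    # fitness: larger is better

population = [(0.625, 2.5), (-0.0625, 1.3), (-0.4375, 1.7), (0.375, 1.6)]
for x1, x2 in population:
    print(f"x1={x1}, x2={x2}  O={O(x1, x2):.4f}  f={f(x1, x2):.4f}")

best = max(population, key=lambda p: f(*p))
print("fittest:", best)  # (-0.0625, 1.3) has the smallest O
```

Note how the fitness values induce a ranking: the individual closest to the origin gets the largest f and will dominate the selection step.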
• Most selection functions are stochastic and designed so that a small proportion of less fit solutions is also selected. This helps keep the diversity of the population large, preventing premature convergence on poor solutions.
• Popular and well-studied selection methods include roulette wheel selection and tournament selection.

Creating successive generations (3)
• In roulette wheel selection, individuals are given a probability of being selected that is directly proportional to their fitness.
• Two individuals are then chosen randomly based on these probabilities and produce offspring.

Example (roulette wheel)
• Based on the objective function value, each individual is assigned a probability of making it into the next generation: P(pj) = f(xj) / Σi f(xi), with f(x) = 1/O(x). NOTE: individuals with a higher fitness are more likely to be selected in the new generation. Also note that an individual can be sampled more than once.

Modify selected members of the population (4)
• Reproduction
• The next step is to generate a second-generation population of solutions from P' through genetic operators: crossover (also called recombination), and/or mutation.
• For each new solution to be produced, a pair of "parent" solutions is selected for breeding from the pool P' selected previously.
• By producing a "child" solution using the above methods of crossover and mutation, a new solution is created which typically shares many of the characteristics of its "parents".

Crossover
Here the "random" point is 3, i.e. the parents exchange their chromosomes after bit 3, generating two new individuals.

Mutation

Example
With probability (1 - Pc): no crossover
With probability Pc: crossover, with random crossover point = 4
With probability Pm: mutation

Iterate (5)
• These processes ultimately result in a next-generation population of chromosomes that is different from the initial generation.
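Roulette wheel selection, as described in the slides above (selection probability proportional to fitness, sampling with replacement), might look like this minimal sketch on the running example:

```python
import random

def roulette_select(population, fitness, k):
    """Select k individuals with probability proportional to fitness.
    random.choices samples WITH replacement, so the same individual
    can be picked more than once, as the slide notes."""
    weights = [fitness(ind) for ind in population]
    return random.choices(population, weights=weights, k=k)

pop = [(0.625, 2.5), (-0.0625, 1.3), (-0.4375, 1.7), (0.375, 1.6)]
fit = lambda p: 1.0 / (p[0]**2 + p[1]**2)   # f = 1/O from the running example
selected = roulette_select(pop, fit, k=4)
print(selected)  # fitter individuals appear more often on average
```

`random.choices` normalizes the weights internally, which is exactly the formula P(pj) = f(xj) / Σi f(xi).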
• Generally, the average fitness of the population will have increased by this procedure, since only the best organisms from the first generation are selected for breeding, along with a small proportion of less fit solutions, for the reasons already mentioned above.

Crossover (more on)
• The most common type is single-point crossover. In single-point crossover, you choose a locus at which you swap the remaining alleles from one parent to the other (e.g. you swap the eye-color values of the eye-color gene). This is best understood visually.
• The point at which the chromosome is broken depends on the randomly selected crossover point (but it should be at the beginning (locus) of a gene, i.e. the bit string is made of substrings with a "meaning" (the genes), and we must start at the beginning of a gene).
• This particular method is called single-point crossover because only one crossover point exists.
• Crossover does not always occur, however. Sometimes, based on a probability (Pc), no crossover occurs and the parents are copied directly to the new population.

Single point crossover

Mutation (more on)
• After selection and crossover, you now have a new population full of individuals.
• Some are directly copied, and others are produced by crossover.
• In order to ensure that the individuals are not all exactly the same, you allow for a small chance of mutation.
• You loop through all the alleles of all the individuals, and if an allele is selected for mutation, you can either change it by a small amount or replace it with a new value. The probability of mutation is usually between 1 and 2%.
• Mutation is fairly simple: you just change the selected alleles based on what you feel is necessary and move on. Mutation is, however, vital to ensuring genetic diversity within the population.
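The single-point crossover and mutation operators just described can be sketched as follows (a minimal sketch; the Pc and Pm handling follows the example slide, and the function names are ours):

```python
import random

def single_point_crossover(p1, p2, pc=0.7):
    """With probability (1 - pc) the parents are copied unchanged;
    otherwise swap the tails after a random crossover point."""
    if random.random() >= pc:
        return p1[:], p2[:]
    point = random.randrange(1, len(p1))   # never cut at the very ends
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def mutate(chrom, pm=0.02):
    """Flip each bit independently with a small probability (1-2%)."""
    return [1 - b if random.random() < pm else b for b in chrom]

c1, c2 = single_point_crossover([1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1], pc=1.0)
print(c1, c2)   # the tails after the random point are exchanged
```

Note that crossover only reshuffles existing alleles between the two parents; it is mutation that can introduce allele values not present in either parent.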
• In other terms, it avoids the situation where at some point all the members of a generation become too similar to each other (as in a "local" maximum).

Mutation

Termination condition (6)
• This generational process is repeated until a termination condition has been reached.
• Common terminating conditions are:
– A solution is found that satisfies minimum criteria
– A fixed number of generations is reached
– The allocated budget (computation time/money) is reached
– The highest-ranking solution's fitness is reaching or has reached a plateau such that successive iterations no longer produce better results
– Manual inspection
– Any combination of the above

GA Pseudo-code (summary)
Choose initial population
Evaluate the fitness of each individual in the population
Repeat
  Select best-ranking individuals to reproduce
  Breed new generation through crossover and mutation (genetic operations) and give birth to offspring
  Evaluate the individual fitnesses of the offspring
Until <terminating condition>

An example of application: Evolving Neural Networks
• Evolving the architecture of a neural network is slightly more complicated, and there have been several ways of doing it.
• For small nets, a simple matrix represents which neuron connects to which, and this matrix is, in turn, converted into the necessary 'genes', and various combinations of these are evolved.

Other Areas
• Genetic algorithms can be applied to virtually any problem that has a large search space.
• Genetic algorithms have been used to filter out 'good' and 'bad' riffs for jazz improvisation.
• The military uses GAs to evolve equations to differentiate between different radar returns.
• Stock companies use GA-powered programs to predict the stock market.

Checkerboard example
– We are given an n by n checkerboard in which every field can have a different colour from a set of four colors.
– The goal is to achieve a checkerboard colouring in which no neighbours (not diagonal) have the same color.

[Figure: a 10×10 checkerboard grid]

Checkerboard example (cont'd)
– Chromosomes represent the way the checkerboard is colored.
– Chromosomes are not represented by bit strings but by bit matrices.
– The entries of the bit matrix can have one of the four values 0, 1, 2 or 3, depending on the color.
– Crossing-over involves matrix manipulation instead of pointwise operations.
– Crossing-over can combine the parental matrices in a horizontal, vertical, triangular or square way.
– Mutation remains entry-wise: changing an entry into one of the other numbers.

Checkerboard example (cont'd)
• This problem can be seen as a grid graph in which each row or column of n nodes has (n-1) edges, so the maximum fitness f(x) is defined as: f(x) = 2 · (n-1) · n

Checkerboard example (cont'd)
• Fitness curves for different crossover rules:

[Figure: fitness curves (fitness, roughly 130-180, vs. generations) for lower-triangular, square, horizontal-cutting and vertical-cutting crossing over]

Evolving body shapes
• Embedded graphs are used to represent body shape and controller.
• Different environments and/or fitness functions are used to evolve various behaviours, e.g.:
– Swimming
– Walking
– Jumping
– Following

We end with something you saw at the beginning...