* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download s - Universidad Politécnica de Madrid
Survey
Document related concepts
Dual inheritance theory wikipedia , lookup
Behavioural genetics wikipedia , lookup
History of genetic engineering wikipedia , lookup
Genetic engineering wikipedia , lookup
Heritability of IQ wikipedia , lookup
Genetic testing wikipedia , lookup
Public health genomics wikipedia , lookup
Group selection wikipedia , lookup
Polymorphism (biology) wikipedia , lookup
Human genetic variation wikipedia , lookup
Genome (book) wikipedia , lookup
Koinophilia wikipedia , lookup
Gene expression programming wikipedia , lookup
Genetic drift wikipedia , lookup
Transcript
Genetic Algorithms Data Mining and Distributed Computing Group Facultad de Informática 1 DATSI, Universidad Politécnica de Madrid Overview 2 A class of probabilistic optimization algorithms Inspired by the biological evolution process Uses concepts of “Natural Selection” and “Genetic Inheritance” (Darwin 1859) Originally developed by John Holland (1975) Particularly well suited for hard problems where little is known about the underlying search space Widely-used in business, science and engineering DATSI, Universidad Politécnica de Madrid Natural Evolution Darwin: The Origin of Species – – Search of optimal forms (environment) Based on: 3 assortment : recombination of genetic material randomness selection : survival of the fittest DATSI, Universidad Politécnica de Madrid Biological Concepts (Cell) • A set of many small “factories” working together • Center: cell nucleus • The nucleus contains the genetic information 4 DATSI, Universidad Politécnica de Madrid Biological Concepts (Chromosome) • Chromosomes store genetic information • Each chromosome is build of DNA • Chromosomes in humans form pairs (23 pairs) • The chromosome is divided in parts: genes • Genes code for properties • Possible values of the genes: allele • Position of the gene in the chromosome: locus 5 DATSI, Universidad Politécnica de Madrid Biological Concepts (Genetics) • The entire combination of genes: genotype • A genotype is expressed as a phenotype • Alleles can be either dominant or recessive • Dominant alleles will always express from the genotype to the phenotype • Recessive alleles can survive in the population for many generations, without being expressed 6 DATSI, Universidad Politécnica de Madrid Biological Concepts (Reproduction) 7 Mitosis: copying the same genetic information to new offspring: there is no exchange of information Normal way of growing of multicell structures, like organs. DATSI, Universidad Politécnica de Madrid Biological Concepts (Reproduction) • Meiosis: the basis of sexual reproduction • After meiotic division 2 gametes appear in the process • In reproduction two gametes conjugate to a zygote wich will become the new individual • Genetic information is shared between the parents in order to create new offspring 8 DATSI, Universidad Politécnica de Madrid Biological Concepts (Reproduction) • During reproduction “errors” occur • Due to these “errors” genetic variation exists • Most important “errors” are: • Recombination (cross-over) • Mutation 9 DATSI, Universidad Politécnica de Madrid Biological Concepts (Natural Selection) • The origin of species: “Preservation of favourable variations and rejection of unfavourable variations.” • Survival of the fittest • Mathematical expresses as fitness: success in life 10 DATSI, Universidad Politécnica de Madrid Genetic Algorithms (GAs) John Holland (~1973) – – GAs use the principle of natural selection to solve complicated optimization problems. An alternative of brute-force search on a complex solution space. 11 Instead of complete exploration Estocastic-driven search DATSI, Universidad Politécnica de Madrid Genetic Algorithms (GAs) Search techniques Calculus-based techniques Direct methods Finonacci Guided random search techniques Indirect methods Newton Evolutionary algorithms Evolutionary strategies 12 Dynamic programming Genetic algorithms Parallel Centralized Simulated annealing Enumerative techniques Distributed Sequential Steady-state Generational © W. Williams, ‘99 DATSI, Universidad Politécnica de Madrid Genetic Algorithms: Definitions Concepts: – – – – Individual: A possible solution of the problem. Gene: A solution component (a.k.a. an attribute) [eye color] Allele: A gene value [eye color=green] Population: group of potential solutions + 10011101011011 -19 4 108 46 0.433 -33.345 0.0013 13 A x B 3 DATSI, Universidad Politécnica de Madrid Genetic Algorithms: Definitions Concepts: – – Genotype: Genetic codification (seq. of genes) Phenotype: Actual individual (with its capabilities) For GAs, phenotype represents the fitness value of the solution: Asumption: there exists a relationship between genetic information and individual fitness. 14 DATSI, Universidad Politécnica de Madrid Fitness Landscapes Fitness: Function that evaluates individual capabilities. Graphical representation N+1 dimensions fitness 15 © L.J. Pit, ‘96 DATSI, Universidad Politécnica de Madrid Natural Evolution and Operators How new individuals are created? – Crossover (recombination) – mixing existing genetic material, assortment Inheritage of usefull characteristics Mutation: rare (very rare in biological perspective) Randomness They simulate the evolution process 16 DATSI, Universidad Politécnica de Madrid Natural Evolution and Operators Crossover #9 00010100010001 10011101011011 00010100011011 Variants: 17 2, 3, n-points crossover Uniform crossover Other... DATSI, Universidad Politécnica de Madrid Natural Evolution and Operators Mutation #5 00010100011011 00011100011011 Variants: 18 Mutation-by-swap Biased mutation Cataclysmic mutation DATSI, Universidad Politécnica de Madrid Natural Evolution and Operators Selection: 1. 2. Evaluation of individuals (fitness) Parent selection (for each crossover operation). Selective pressure: relationship between capacities (fitness) of the individual and its possibilities to generate new individuals (participate in the reproduction). 19 DATSI, Universidad Politécnica de Madrid Natural Evolution and Operators Selection (alternatives): – – – Tournament selection Rank-based: Continuous function Roulette wheel Variants: – 20 Previous population selection (worst individual elimination). DATSI, Universidad Politécnica de Madrid Canonical Genetic Algorithm P0 Initial population end condition? Fitness-function selection P’i Selected population crossover/mutation Pi+1 21 Next population Includes the chances of participation in reproduction DATSI, Universidad Politécnica de Madrid GA Variants (Replacement) Elitism: – – The best(s) individuals in Pi stay in population Pi. Provides monotonic fitness increment Steady-state: – – 22 Only one individual is created per generation. New one replaces worst individual. DATSI, Universidad Politécnica de Madrid Replacement Overview Selection Crossover/mutation Pi Current population Size=N Offi Offspring population P’i Next population Canonical 23 Steady-sate Elitism Offspring size N 1 >, = ó < N Current Next 0 N-1 << N Replacement All Worst Worst DATSI, Universidad Politécnica de Madrid The Metaphor Genetic Algorithm 24 Nature Optimization problem Environment Feasible solutions Individuals living in that environment Solutions quality (fitness function) Individual’s degree of adaptation to its surrounding environment A set of feasible solutions A population of organisms (species) Stochastic operators Selection, recombination and mutation in nature’s evolutionary process Iteratively applying a set of stochastic operators on a set of feasible solutions Evolution of populations to suit their environment DATSI, Universidad Politécnica de Madrid Aspects to consider: Population Size What’s the optimal size of the population? – – 25 Too small: premature convergence (weak exploration) Too large: waste of resources DATSI, Universidad Politécnica de Madrid Aspects to consider: Problem representation A key aspect: chromosomes representation – – Initially, a binary string Nowadays, other possibilities: – 26 Bit strings Real numbers Permutations of element Lists of rules Program elements ... any data structure ... (0101 ... 1100) (43.2 -33.1 ... 0.0 89.2) (E11 E3 E7 ... E1 E15) (R1 R2 R3 ... R22 R23) (genetic programming) Great influence in the problem resolution DATSI, Universidad Politécnica de Madrid GA elements to define 27 Chromosome representation Creation of initial population Fitness function Genetic operators Parameters (population size, probabilities of the genetic operators, etc.) Typical configuration for small problems: Population size: 50 – 100 Children per generation: = population size Crossovers: > 85% Mutations: < 5% Generations: 20 – 20,000 DATSI, Universidad Politécnica de Madrid Example (The MAXONE problem) • Suppose we want to maximize the number of ones in a string of m binary digits Is it a trivial problem? • It may seem so because we know the answer in advance 28 • However, we can think of it as maximizing the number of correct answers, each encoded by 1, to m yes/no difficult questions` DATSI, Universidad Politécnica de Madrid MAXONE problem 29 An individual is encoded (naturally) as a string of m binary digits The fitness f of a candidate solution to the MAXONE problem is the number of ones in its genetic code We start with a population of n random strings. Suppose that m = 10 and n = 6 DATSI, Universidad Politécnica de Madrid MAXONE problem (initialization) • We toss a fair coin 60 times and get the following initial population: 30 s1 = 1111010101 f (s1) = 7 s2 = 0111000101 f (s2) = 5 s3 = 1110110101 f (s3) = 7 s4 = 0100010011 f (s4) = 4 s5 = 1110111101 f (s5) = 8 s6 = 0100110000 f (s6) = 3 DATSI, Universidad Politécnica de Madrid MAXONE problem (selection) • Next we apply fitness proportionate selection with the roulette wheel method: Individual i will have a f (i ) f (i) • We repeat the probability to be chosen extraction as many times as the number of Area is 1 2 Proportional individuals we need to n to fitness value have the same parent 3 population size (6 in our case) 4 i 31 DATSI, Universidad Politécnica de Madrid MAXONE problem (selection) • Suppose that, after performing selection, we get the following population: 32 s1` = 1111010101 (s1) s2` = 1110110101 (s3) s3` = 1110111101 (s5) s4` = 0111000101 (s2) s5` = 0100010011 (s4) s6` = 1110111101 (s5) DATSI, Universidad Politécnica de Madrid MAXONE problem (crossover) • Next we mate strings for crossover. For each couple we decide according to crossover probability (for instance 0.6) whether to actually perform crossover or not 33 • Suppose that we decide to actually perform crossover only for couples (s1`, s2`) and (s5`, s6`). For each couple, we randomly extract a crossover point, for instance 2 for the first and 5 for the second DATSI, Universidad Politécnica de Madrid MAXONE problem (crossover) Before crossover: s1` = 1111010101 s2` = 1110110101 s5` = 0100010011 s6` = 1110111101 After crossover: s1`` = 1110110101 s2`` = 1111010101 34 s5`` = 0100011101 s6`` = 1110110011 DATSI, Universidad Politécnica de Madrid MAXONE problem (mutation) • The final step is to apply random mutation: for each bit that we are to copy to the new population we allow a small probability of error (for instance 0.1) • Before applying mutation: 35 s1`` = 1110110101 s4`` = 0111000101 s2`` = 1111010101 s5`` = 0100011101 s3`` = 1110111101 s6`` = 1110110011 DATSI, Universidad Politécnica de Madrid MAXONE problem (mutation) After applying mutation: 36 s1``` = 1110100101 f (s1``` ) = 6 s2``` = 1111110100 f (s2``` ) = 7 s3``` = 1110101111 f (s3``` ) = 8 s4``` = 0111000101 f (s4``` ) = 5 s5``` = 0100011101 f (s5``` ) = 5 s6``` = 1110110001 f (s6``` ) = 6 DATSI, Universidad Politécnica de Madrid MAXONE problem • In one generation, the total population fitness changed from 34 to 37, thus improved by ~9%. • At this point, we go through the same process all over again, until a stopping criterion is met 37 DATSI, Universidad Politécnica de Madrid Local Optimizations (Lamark) Fitness calculation performs an improved solution in the neibourghood: – – Lamarkian Evolution: the inheritance of acquired characteristics. Methods: 38 Hill-climbing Simulated annealing DATSI, Universidad Politécnica de Madrid Local Optimizations (Baldwin) Other alternatives: – – Evolve fitness function But keep genetic representation As a result: – 39 New fitness landscape © L.J. Pit, ‘96 DATSI, Universidad Politécnica de Madrid GAs: Why do they work? • Schema Theorem • Building Blocks Hypothesis 40 DATSI, Universidad Politécnica de Madrid Schema Notation • {0,1,#} is the symbol alphabet, where # is a special wild card symbol • A schema is a template consisting of a string composed of these three symbols • Example: the schema [1#00#] matches the strings: [10000], [10001], [11000] and [11001] 10 0 10000 10001 11000 11001 41 DATSI, Universidad Politécnica de Madrid Schema 42 “A schema is a similarity template describing a subset of strings with similarities at certain positions” John Holland Crossover causes low order schema, to get increased representation in the next generation. Mutation causes long schema to be destroyed! Thus short schema have a better chance of surviving to the next generation DATSI, Universidad Politécnica de Madrid Schema Theorem and Building Block 43 GAs explore the search space by short, loworder schemata which, subsequently, are used for information exchange during crossover Objective: Building blocks (set of alleles with representative participation on fitness/solution quality). DATSI, Universidad Politécnica de Madrid Building Blocks Hypothesis 44 “A genetic algorithm seeks near-optimal performance through the juxtaposition of short, low-order, high-performance schemata, called the building blocks” Some building blocks (short, low-order schemata) can mislead GA and cause its convergence to suboptimal points DATSI, Universidad Politécnica de Madrid Differences betwwen GA and Conventional Search Algorithms 45 GA works on a coding of the parameters set, not the parameter themselves GA searches from a population of points, not a single point GA uses only a payoff function, and no domain knowledge GA uses probabilistic transition rules, not deterministic ones GA can provide a number of potential solutions to a given problem. The final choice is left to the user. DATSI, Universidad Politécnica de Madrid Conclusions “Genetic Algorithms are good at taking large, potentially huge search spaces and navigating them, looking for optimal combinations of things, solutions you might not otherwise find in a lifetime.” Salvatore Mangano Computer Design, May 1995 46