* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Document
Inbreeding avoidance wikipedia , lookup
Viral phylodynamics wikipedia , lookup
Biology and consumer behaviour wikipedia , lookup
Behavioural genetics wikipedia , lookup
Designer baby wikipedia , lookup
Public health genomics wikipedia , lookup
Genetic testing wikipedia , lookup
History of genetic engineering wikipedia , lookup
Genetic engineering wikipedia , lookup
Heritability of IQ wikipedia , lookup
Polymorphism (biology) wikipedia , lookup
Genome (book) wikipedia , lookup
Group selection wikipedia , lookup
Dual inheritance theory wikipedia , lookup
Human genetic variation wikipedia , lookup
Genetic drift wikipedia , lookup
Gene expression programming wikipedia , lookup
Koinophilia wikipedia , lookup
CpSc 810: Machine Learning Genetic Algorithm Copy Right Notice Most slides in this presentation are adopted from slides of text book and various sources. The Copyright belong to the original authors. Thanks! 2 Overview of Genetic Algorithm (GA) GA is a learning method motivated by analogy to biological evolution. GAs search the hypothesis space by generating successor hypotheses which repeatedly mutate and recombine parts of the best currently known hypotheses. In Genetic Programming (GP), entire computer programs are evolved to certain fitness criteria. 3 Biological Evolution Lamarck and others: Species “transmute” over time Darwin and Wallace: Consistent, heritable variation among individuals in population Natural selection of the fittest Mendel and genetics: A mechanism for inheriting traits Genotype->phenotype mapping 4 Genetic Algorithms The algorithm operates by iteratively updating a pool of hypotheses, called the population. On each iteration, all members of the population are evaluated according to the fitness function. A new population is then generated by probabilistically selecting the most fit individuals from current population 5 some of these selected individuals are carried forward into the next generation population intact. Others are used as the basis for creating new offspring individuals by applying genetic operations. GA GA(Fitness, Threshold, p, r, m) Initialize Population: generate p hypotheses at random->P. Evaluate: for each p, compute fitness(h) While Maxh Fitness(h) < Threshold do Select: probabilistically select (1-r)p members of P. Call this new generation Pnew Pr( hi ) Fitness(hi ) |P| Fitness(hi ) j 1 Crossover: probabilistically select (r*p)/2 pairs of hypotheses from P. For each pair, <h1, h2> produce two offspring by applying the crossover operator. Add all offspring to Pnew. Mutate: Choose m% of PNew with uniform probability. For each, invert one randomly selected bit in its representation. Update: P <- Pnew Evaluate: for each p in P, compute fitness(p) 6 Return the hypothesis from P that has the highest fitness. Representing Hypotheses In GAs, hypotheses are often represented by bit strings so that they can be easily manipulated by genetic operators such as mutation and crossover. Examples: Represent 7 (Outlook = Overcast v Rain) ^ (Wind = Strong) By Outlook Wind 011 10 Represent IF Wind = Strong THEN PlayTennis = yes By Outlook Wind PlayTennis 111 10 10 Genetic Operators Crossover Techniques: produce two new offspring from two parent strings by copying selecting bits form each parents. The choice of which parent contributes the bit for position i is determined by an additional string called crossover mask. Single-point Crossover: Mask example: 11111000000 Two-point Crossover. Mask example: 00111110000 Uniform Crossover. Mask example: 10011010011 8 Genetic Operators Mutation Techniques: produces small random changes to the bit string Point Mutation 9 Select Most Fit Hypothesis A simple measure for modeling the probability that a hypothesis will be selected is given by the fitness proportionate selection (or roulette wheel selection): Pr(hi)= Fitness(hi)/j=1p Fitness(hj) This simple measure can lead to crowding Tournament Selection Pick h1, h2 at random with uniform probability With probability p, select the more fit Rank Selection Sort all hypotheses by fitness Probability of selection is proportional to rank 10 In classification tasks, the Fitness function typically has a component that scores the classification accuracy over a set of provided training examples. Other criteria can be added (e.g., complexity or generality of the rule) GABIL (DeJong et al. 1993) Lean disjunctive set of propositional rules, competitive with C4.5 Fitness: Fitness(h)=(correct(h))2 Representation: IF a1=T Λ a2=F THEN c=T; IF a2=T THEN c=F Represented by a1 a2 c a1 a2 c 10 01 1 11 10 0 Genetic operators: Standard mutation operators Extended two point crossover 11 Want variable length rule sets Want only well-formed bitstring hypotheses Crossover with Variable-Length Bit-strings Start with h1 h2 a1 a2 c a 1 a2 c 10 01 1 11 10 0 01 11 0 10 01 0 1. choose crossover points for h1, e.g., after bit 1, 8 2. now restrict points in h2 to those that produce bitstrings with well-defined semantics, e.g., <1,3>, <1,8>, <6, 8> Let d1 and d2 denote the distance from the leftmost and rightmost of two crossover points in h1 to the rule boundary immediately to it left. Then, the crossover points in h2 must have the same d1 and d2 values. If we choose <1,3> result is h3 12 h2 a1 a2 c 11 10 0 a1 a2 c a1 a2 c a1 a2 c 00 01 1 11 11 0 10 01 0 GABIL Extentions Add new genetic operators, also applied probabilistically 1.AddAlternative: generalize constraint on ai by changing a 0 to 1. 2. DropCondition: generalize constraint on by change every 0 to 1. And add new filed to bitstring to determine whether to allow these a1 a2 c a1 a2 c AA DC 01 11 0 10 01 0 1 0 So now the learning strategy also evolves 13 GABIL Results Performance of GABIL comparable to symbolic rule/tree learning methods C4.5, ID5R, AQ14 Average performance on a set of 12 synthetic problems: GABIL with out AA and DC operator: 92.1% GABIL with AA and DC operators: 95.2% Symbolic learning methods ranged from 91.2 to 96.6%. 14 Hypothesis Space Search GA search can move very abruptly (as compared to Backpropagation, for example), replacing a parent hypothesis by an offspring that may be radically different from the parent. The problem of Crowding: when one individual is more fit than others, this individual and closely related ones will take up a large fraction of the population. Solutions: 15 Use tournament or rank selection instead of roulette selection. Fitness sharing: the measured fitness of an individual is reduced by the presence of other similar individuals in the population. restrict ion on the kinds of individuals allowed to recombine to form offspring. The Schema Theorem [Holland, 75] Definition: A schema is any string composed of 0s, 1s and *s where * means ‘don’t care’. Example: schema 0*10 represents strings 0010 and 0110. Characterize population by number of instance representing each possible schema 16 Consider Just Selection f(t)= average fitness of population at time t m(s, t) = number of instances of schema s in population at time t. u^(s, t) = average fitness of instances of s at time t Probability of selecting h in one selection step Pr( h ) f (h ) n f ( hi ) f (h) nf (t ) i 1 Probability of selecting an instance of s in one step f (h) uˆ( s, t ) Pr( h s) m( s, t ) nf (t ) hs pt nf (t ) Expected number of instances of s after n selections 17 uˆ ( s, t ) E[m( s, t _ 1)] m( s, t ) f (t ) Schema Theorem The full schema theorem provides a lower bound on the expected frequency of schema s, as follow: uˆ( s, t ) d ( s) o( s ) E[m( s, t _ 1)] m( s, t )1 pc ( 1 p ) m f (t ) l 1 pc= probability of single point crossover operator pm=probability of mutation operator l = length of single bit strings o(s) = number of defined (non “*”) bits in s d(s) = distance between leftmost, rightmost defined bits in s 18 The Schema Theorem: More fit schemas will tend to grow in influence, especially schemas containing a small number of defined bits (i.e., containing a large number of *s), and especially when these defined bits are near one another within the bit string. Genetic Programming Genetic programming is a form of evolutionary computation in which the individuals in the evolving population are computer programs rather than bit strings Population of programs represented by trees 19 Crossover 20 Models of Evolution and Learning I: Lamarckian Evolution [Late 19th C] Proposition: Experiences of a single organism directly affect the genetic makeup of their offsprings. Assessment: This proposition is wrong: the genetic makeup of an individual is unaffected by the lifetime experience of one’s biological parents. However: Lamarckian processes can sometimes improve the effectiveness of computerized genetic algorithms. 21 Models of Evolution and Learning II: Baldwin Effect Assume Individual learning has no direct influence on individual DNA But ability to learn reduces need to “hard wire” traits in DNA Then Ability of individuals to learn will support more diverse gene pool Because learning allows individuals with various “hard wired” traits to be successful More diverse gene pool will support faster evolution of gene pool 22 Individual learning (indirectly) increases rate of evolution Models of Evolution and Learning II: Baldwin Effect Plausible example: New predator appears in environment Individuals who can learn (to avoid it) will be selected Increase in learning individuals will support more diverse gene pool Resulting in faster evolution Possibly resulting in new non-learned traits such as instinctive fear of predator 23 Computer Experiments on Baldwin Effect Evolve simple neural networks Some network weights fixed during lifetime, other trainable Genetic makeup determines which are fixed and their weight values Results With no individual learning, population failed to improve over time When individual learning allowed Early generations: population contained many individuals with many trainable weights Later generations: higher fitness, while number of trainable weights decreased 24 Summary: Evolutionary programming Conduct randomized, parallel, hill-climbing search through H Approach learning as optimization problem (optimize fitness) Nice feature: evaluation of Fitness can be very indirect Consider learning rule set for multistep decision making No issue of assigning credit/blame to individual steps 25 Parallelizing Genetic Algorithms GAs are naturally suited to parallel implementation. Different approaches were tried: Coarse Grain: subdivides the population into distinct groups of individuals (demes) and conducts a GA search in each deme. Transfer between demes occurs (though infrequently) by a migration process in which individuals from one deme are copied or transferred to other demes Fine Grain: One processor is assigned per individual in the population and recombination takes place among neighboring individuals. 26