* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download ceng 562 machine learning
Survey
Document related concepts
Transcript
CENG 562 MACHINE LEARNING GENETIC ALGORITHMS Presented by Abdurrahman Çarkacıoğlu OUTLINE • • • • What is GA? Biological Basis Terminology Operations – – – – Fitness Function Reproduction Crossover Mutation • Genetic Programming • Traditional Methods vs. GA. • Applications and Research Topics • Summary & Conclusion WHAT IS GA ?. • Search algorithms based on – mechanics of natural selection – natural genetics • Robust, flexible, efficient in vast complex spaces. • Use the idea of randomness search but not directionless. BIOLOGICAL BASIS OF GA • Chromosomes – Heredity aspects of the individual • Genes – Encodes a specific feature of the individual. – Actual value allele. • Mating – Both parent pass their chromosomes onto their offsprings. • The weaker ones tend to die • Stronger ones survive and produce new offsprings. • Rarely mutation occurs GA’s TERMINOLOGY • String Structures Chromosomes (Hypothesis) • Elements of the String Gene's (Allele) • Hypothesis/String Structures – Composed of features, or detectors. REPRESENTATION of HYPOTHESIS • Bit Strings (most of the time) • Examples: Outlook={Sunny,Overcast,Rain} 001 encodes Outlook=Rain 011 encodes Outlook=Overcast or Rain 111 encodes Outlook=Sunny or Overcast or Rain Rule Representation Example • Rule : Whatever the outlook, if wind=strong then PlayTennis. Given: Wind {Strong,Weak} PlayTennis {Yes,No} Solution: Outlook Wind PlayTennis 111 10 10 BASIC GA OPERATIONS Inner workings of GA is simple. Based on simple string copying substracting concetation nothing more,nothing less. FITNESS FUNCTION Returns a single numerical fitness value. Problem dependent For a function optimization search, it is simply the value of the function. • For learning classification rules, it is classification accuracy over a set of training examples REPRODUCTION Allows individual strings to be copied for possible inclusion in the next generation. Copying chance is based on the fitness value. String Ftn 01001 5 10000 12 01110 9 Perc. 19% 46% 35% 46% of the time 10000 is selected. CROSSOVER • Produce 2 new offspring from 2 parent string. Initial Strings Offsprings MUTATION • Applied after crossover. • Randomly alters each gene with a small probability, typically 0,001. Point Mutation Example Initial String After Mutation 11101101 11111101 Pseudo Code for GA • choose initial population at random • evaluate each individual's fitness • repeat select individuals to reproduce mate pairs at random apply crossover operator apply mutation operator evaluate each individual's fitness • until terminating condition RECENT WORKS Using real numbers instead of bit strings Use of knowledge-augmented operators for GAs. Strings may be variable lengths. genetic GENETIC PROGRAMMING • Branch of genetic algorithms • Difference is the representation of the solution • Creates computer programs in the Lisp or Scheme computer languages as the solution. Pseudo Code for GP 1)Create an initial population of random composition of the functions and terminals of the problem. 2) Execute each program, and assign it a fitness value according to the how well it solves a problem. 3)Create new population of computer programs. • )Copy the best existing programs. • )Create new programs by mutation • )Create new computer programs by crossover 4) Announce the best computer program when predefined conditions are satisfied. GA vs. GP GP is much more powerfull than GA. The output of the GA is a quantity, while output of GP is a another computer program. Computer programs that program themselves. Where there is no ideal solution, GP works best (ex. A program that drives a car). Traditional Methods vs. GAs – .GA a coded form of the function values, rather than with the actual values themselves. – .GA use a set, or population not just a single point on the problem space. – .GA use only payoff information to guide themselves. – .Probabilistic, not deterministic. – .Inherently parallel. A Brief Survey of GA Technology • • • • • • • • • .Criminal suspect recognition .Music Composition .Constructing and Training Neural Networks .Scheduling Problems .Games Playing .Structural Optimization .Function Optimization .Database Query Optimization Aircraft Design GABIL by DeJong (1993) Uses GA to learn boolean concepts, represented by disjunctive set of propositional rules. Roughly comparable performance among C4.5, ID5R and AQ14 systems. Over 12 synthetic problems Average generalization accuracy for GABIL Others 92.1% 91.6% to 96.6% Summary & Conclusions conduct a randomized, parallel, hill-climbing search for hypothesis that optimize a predefined fitness function. operate on populations of strings/hypothesis/chromosomes. string represents some underlying parameter set. reproduction, crossover and mutation are used to create new populations.