* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Parallel Machine Scheduling with Sequence
Hybrid (biology) wikipedia , lookup
Heritability of IQ wikipedia , lookup
Genetic testing wikipedia , lookup
Genetic drift wikipedia , lookup
Biology and sexual orientation wikipedia , lookup
Designer baby wikipedia , lookup
Public health genomics wikipedia , lookup
Polymorphism (biology) wikipedia , lookup
Genomic imprinting wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Segmental Duplication on the Human Y Chromosome wikipedia , lookup
Population genetics wikipedia , lookup
Epigenetics of human development wikipedia , lookup
Polycomb Group Proteins and Cancer wikipedia , lookup
Medical genetics wikipedia , lookup
Microevolution wikipedia , lookup
Gene expression programming wikipedia , lookup
Skewed X-inactivation wikipedia , lookup
Genome (book) wikipedia , lookup
Y chromosome wikipedia , lookup
Utilizing Condor to Support Genetic Algorithm Design Research Mary Beth Kurz, PhD Assistant Professor Department of Industrial Engineering Clemson University Genetic algorithms are metaheuristics for optimization evaluation Feasible solutions Solution space Objective space Infeasible solutions Chromosome 1 decoding Chromosome 2 Chromosome N Chromosome space 2 My research focuses on what chromosomes should look like and asks whether the “solution representation” impacts the quality of a genetic algorithm Let’s make this more concrete: the TSP Feasible Solutions: tours that visit all cities exactly once evaluation Solution space Each feasible solution’s total travel time is that solution’s objective Objective space Infeasible Solutions: anything else Chromosome 1 decoding Chromosome 2 Chromosome N Chromosome space 3 The Traveling Salesman Problem asks how to route the salesman through his cities so he returns home as quickly as possible How do I represent the tours? • Directly – the city list? • Indirectly – the list of roads taken? My Hypothesis: Solution representation affects GA design significantly Feasible solutions Solution space No chromosomes map to these solutions Optimal objective value Objective space Infeasible solutions Chromosome 1 Chromosome 2 This is the optimal solution!!! Fix or forbid these chromosomes? Chromosome space 4 Kurz Genetic algorithms are motivated by an analogy to “real” genetics A chromosome is composed of genes, generally randomly selected initially Chromosome 1 Chromosome 2 Chromosome N Genetic Operators Randomness comes here Population(t) Selection picks Crossover some creates Mutation changes a chromosomes new as chromosomes by small number of genes potential parents taking genes in infrom 2 the entire population crossover parents 5 Chromosome 1 Chromosome 2 Chromosome N Population(t+1) Kurz This research is empirical and requires immense computational time Genetic Algorithms are inherently random Is it possible that some representation consistently finds better 6 solutions for a specific problem? Most GA research currently uses 50 replications on numerous data files 180 problem types, 10 files each, 3 representations = 5400 files Simplest representation – 1800 files would take about 45 hours in my Lab (a few years ago) 50 replications of 5400 files → at least 241 days of running time! This is simply not feasible Kurz Grid computing is saving me 7 Spring 2007 325,000 hrs Summer – Fall 2007 212,000 hrs Spring 2008 124,000 hrs Total: about 660,000 hrs Kurz Since last spring, I’ve had to relearn how to do research! How do I compile all this data? VBA and Excel! 120 100 80 What can I actually analyze? 60 Not pictures like this 40 What statistics do I need to use? 20 0 1 10 19 28 37 46 55 64 73 82 91 100 109 118 127 136 145 154 163 172 181 190 199 208 217 Reduce the data to correlations Needed to learn non-parametric statistics Needed to use SPSS for the analysis Used VBA to create the input files Reran to get different output data Summer – Fall 2007 8 212,000 hrs Kurz I don’t know about random numbers I started using rand() in C! I use up to 600,000,000 random numbers in each run I have 270,000 runs (5400 * 50) Trying to use Mersenne Twister Period is 219937 – 1 which is plenty big How do I make sure I have independent sets of random numbers? Use the same initial seed, then “burn” through (n-1) numbers until we get to the nth set Would take over 4000 days to burn through 269,999 sets for the last run Again … not feasible! Tried to initialize using run number 9 Spring 2008 124,000 hrs Kurz I still love Condor But I don’t know about random numbers Thought about saving the random numbers in an input file of 600,000,000 numbers each Stopped generating the first file after it got to 3 GB This would mean 3*270,000 GB of random number files! Looking at dynamic streams from Mersenne Twister Just heard about SPRNG from Todd on Tuesday Gearing up for another set of runs … all I need is this set of runs to get a paper out! 10 Kurz