Genetic Algorithms
  A class of evolutionary algorithms
  Efficiently solves optimization tasks
  Potential applications in many fields
Challenges
  Large execution time

International Institute of Information Technology, Hyderabad, India

User Specifies …
  A representation for chromosome
  Crossover operator
  GA parameters
  Mutation operator
  Termination criteria
  A method for fitness evaluation
Flowchart: Create Initial Population, Select Parents, Create New Population, Evaluate Fitness, Terminate? (No: loop again; Yes: Exit)

High degree of parallelism
  Fitness evaluation
  Crossover
  Mutation
Most obvious: chromosome-level parallelism
  Same operations on each chromosome
  Use a thread per chromosome

Thread-per-chromosome model
  Good enough for small to moderately sized multi-core systems
  Doesn't map well to massively multithreaded GPUs
Solution: identify and exploit gene-level parallelism

Population Matrix in Memory / Thread Blocks in a grid
  A column of threads reads a chromosome gene by gene and cooperates to perform operations
  Results in coalesced reads and faster processing

On CPU: Parse GA Parameters, Construct Initial Population
On GPU: Generate Random Numbers, Evaluation Kernel, Statistics Update Kernel, Selection Kernel, Crossover Kernel, Mutation Kernel
GPU Global Memory: Random Numbers, Old Population, New Population, Fitness Scores, Statistics

[Pipeline diagram repeated, with added labels: Population, Scores]
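
The stages named in the flowchart and in the pipeline diagram can be read as one generational loop. The following is a minimal serial sketch in Python, not the GPU implementation from the slides. The function name `run_ga`, its parameters, the fixed generation count used as the termination criterion, and the choice of uniform selection with one-point crossover and flip mutation on a bit string are assumptions made for illustration.

```python
import random


def run_ga(fitness, length, pop_size=20, generations=50,
           p_mutation=0.01, seed=0):
    """Serial reference loop for a bit-string GA.

    Hypothetical helper: assumes pop_size is even and length >= 2.
    Each commented stage corresponds to one stage of the pipeline.
    """
    rng = random.Random(seed)
    # Construct initial population: pop_size random bit strings.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for generation in range(generations + 1):
        # Evaluation stage: one fitness score per chromosome.
        scores = [fitness(c) for c in population]
        # Statistics stage: track the best chromosome seen so far.
        top = max(range(pop_size), key=scores.__getitem__)
        if scores[top] > best_score:
            best, best_score = population[top][:], scores[top]
        # Termination criterion (assumed): fixed number of generations.
        if generation == generations:
            break
        new_population = []
        for _ in range(pop_size // 2):
            # Selection stage: two parents, chosen uniformly at random.
            p1 = population[rng.randrange(pop_size)]
            p2 = population[rng.randrange(pop_size)]
            # Crossover stage: one-point crossover yields two children.
            cut = rng.randrange(1, length)
            for child in (p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]):
                # Mutation stage: flip each gene with probability p_mutation.
                for g in range(length):
                    if rng.random() < p_mutation:
                        child[g] ^= 1
                new_population.append(child)
        population = new_population
    return best, best_score
```

For example, `run_ga(sum, 16)` searches for the 16-bit string with the most ones; `sum` stands in for a user-supplied fitness function. On the GPU, the inner loops over chromosomes and genes are the parts that the thread layouts described above replace.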

Partially parallel method
  User specifies a serial code fragment for fitness evaluation.
  Threads are arranged in a 1D grid.
  Each thread executes the user's code on one chromosome, providing chromosome-level parallelism.
  Benefit: abstraction level

Fully parallel method
  A CUDA-familiar user can effectively use the 2D thread layout.
  Use gene-level parallelism for fitness evaluation.
  Benefit: efficiency

Task: given weights, costs and knapsack capacity
Aim: maximize the cost
Representation: 1D binary string
  0/1: absence/presence of an item
  W and C are the total weight and cost of the given representation
Best solution: the one with max C given W < Wmax
Fully parallel method: use a group of threads to compute total cost and weight in logarithmic time

[Pipeline diagram repeated, with added labels: Scores, Statistics]

Selection and termination most often use population statistics.
  We use a standard parallel reduce algorithm to calculate max, min and average scores.
  We use the highly optimized public library CUDPP to sort and rank chromosomes.

[Pipeline diagram repeated, with added labels: Statistics, Parents]

Selection Kernel
  Uses N/2 threads
  Each thread
selects two parents for producing offspring.
  Uniform selection: selects parents in a uniform random manner
  Roulette wheel selection: fitness-based approach; the higher the fitness, the better the chance of selection

Roulette Wheel
[Figure: roulette wheel. Image courtesy: xyz]
  Sort the fitness scores.
  Compute a roulette wheel array by doing a prefix-sum scan of the scores and normalizing it.
  Generate a random number in 0-1.
  Perform a binary search in the roulette wheel array for the nearest smaller number to the randomly selected number.
  Return the index of the result in the array.

[Pipeline diagram repeated, with added labels: Old Population, New Population]

Crossover
[Figure: crossover diagram. Labels: Parent1, Parent2, Population, GPU Global Memory; thread idy pairs 1-2, 3-4, 5-6, 7-8; thread idx 1-L]

[Pipeline diagram repeated, with added label: New Population]

[Figure: grid of genes in the Population, indexed by Thread Id x and Thread Id y]

Flip
Mutator
  Each thread handles one gene and mutates it with the probability of mutation.
[Figure: example for thread 1,4: Flip Coin gives a Coin State (T or F) for Gene X; a second build of the slide marks every gene in the Population grid with its coin state]

[Pipeline diagram repeated, with added label: Random No.s]

Extensive use of random numbers
No primitive for on-the-fly generation of a single random number
Solution:
  We use a CUDPP routine to generate a large pool of random numbers on the GPU (faster).
  Generate a pool of random numbers and copy it to the GPU.
  If better-quality random numbers are needed, this can be replaced by a CPU-based routine.

Test device: a quarter of an Nvidia Tesla S1030 GPU
Test problem: solve a 0/1 knapsack problem
Test parameters:
  Representation: a 1D binary string
  Crossover: one-point crossover
  Mutation: flip mutation
  Selection: uniform and roulette wheel

Ave. run-time for 100 iterations (Uniform Selection)
Ave.
Run-time for 100 iterations (Roulette Wheel Selection)
Growth in run-time for increase in N x L (N: population size, L: chromosome length)

Our approach is modeled after GAlib and maintains structures for GA, Genome and Statistics.
It is built with enough abstraction from the user program that the user does not need to know the CUDA architecture or CUDA programming.
This can be extended to build a GPU-accelerated GA library.
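
As an illustration of the roulette wheel steps listed in the selection slides (prefix-sum scan, normalization, binary search), here is a minimal serial sketch in Python. It is not the CUDPP-based kernel from the slides. The function names are hypothetical, and the use of an exclusive scan is an assumption: the slides do not say which scan variant is used, but an exclusive scan makes "the nearest smaller number" map directly to the selected index. The sort step is left out because the selection probabilities do not depend on the order of the scores.

```python
import bisect
import itertools
import random


def build_wheel(scores):
    """Normalized exclusive prefix sums of the fitness scores.

    wheel[i] = (scores[0] + ... + scores[i-1]) / total, so chromosome i
    owns the interval [wheel[i], wheel[i+1]) of width scores[i] / total.
    """
    total = float(sum(scores))
    if total <= 0 or min(scores) < 0:
        raise ValueError("scores must be non-negative with a positive sum")
    prefix = [0.0] + list(itertools.accumulate(scores))[:-1]
    return [p / total for p in prefix]


def spin(wheel, r):
    """Binary search for the last wheel entry that is <= r, with r in [0, 1)."""
    return bisect.bisect_right(wheel, r) - 1


def select_parents(scores, n_pairs, rng):
    """Pick n_pairs parent index pairs (the slides use one thread per pair)."""
    wheel = build_wheel(scores)
    return [(spin(wheel, rng.random()), spin(wheel, rng.random()))
            for _ in range(n_pairs)]
```

With scores `[1, 3, 4, 2]` the wheel is `[0.0, 0.1, 0.4, 0.8]`, so a random number of 0.5 falls in the third interval and selects index 2. A call such as `select_parents(scores, len(scores) // 2, random.Random(0))` mirrors the N/2 selections made per generation.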