Genetic Algorithms
 A class of evolutionary algorithms
 Efficiently solve optimization tasks
 Potential Applications in many fields

Challenges
 Large execution time
International Institute of Information Technology, Hyderabad, India
User Specifies …
 A representation for chromosome
 Crossover Operator
 Mutation Operator
 GA Parameters
 Termination Criteria
 A method for fitness evaluation

Flowchart
 Create Initial Population
 Evaluate Fitness
 Terminate?
 Yes : Exit
 No : Select Parents, Create New Population, Evaluate Fitness again
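
The flowchart above can be sketched on the CPU as a short loop. This is only an illustrative sketch, not the implementation from the slides: the names `run_ga` and `tournament`, the tournament selection rule, and the default parameter values are all assumptions made for the example.

```python
import random


def tournament(pop, scores, rng):
    """Pick the fitter of two randomly chosen chromosomes (assumed selection rule)."""
    a, b = rng.randrange(len(pop)), rng.randrange(len(pop))
    return pop[a] if scores[a] >= scores[b] else pop[b]


def run_ga(fitness, length, pop_size=20, p_mut=0.02, max_gens=100,
           target=None, seed=0):
    """Run the create / evaluate / terminate? / select / new-population loop."""
    rng = random.Random(seed)
    # Create Initial Population: random binary chromosomes.
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(max_gens):
        scores = [fitness(c) for c in pop]            # Evaluate Fitness
        if target is not None and max(scores) >= target:
            break                                     # Terminate? Yes
        new_pop = []
        while len(new_pop) < pop_size:                # Terminate? No
            p1 = tournament(pop, scores, rng)         # Select Parents
            p2 = tournament(pop, scores, rng)
            cut = rng.randrange(1, length)            # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Flip mutation: each gene flips with probability p_mut.
            child = [g ^ 1 if rng.random() < p_mut else g for g in child]
            new_pop.append(child)
        pop = new_pop                                 # Create New Population
    scores = [fitness(c) for c in pop]
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]
```

For example, `run_ga(sum, length=16, target=16)` searches for the all-ones string, using the number of ones as a toy fitness function.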

High degree of parallelism
 Fitness evaluation
 Crossover
 Mutation

Most obvious :
 chromosome level parallelism
 Same Operations on each chromosome
 Use a thread per chromosome

Thread-per-chromosome model
 Good enough for small to moderate sized multi-core systems
 Doesn’t map well to massively multithreaded GPUs

Solution :
 identify and exploit gene-level parallelism
Population Matrix in Memory


Thread Blocks in a grid
A column of threads read a chromosome gene-by-gene and
cooperate to perform operations
Results in coalesced read and faster processing
On CPU
 Parse GA Parameters
 Construct Initial Population

On GPU
 Generate Random Numbers
 Evaluation Kernel
 Statistics Update Kernel
 Selection Kernel
 Crossover Kernel
 Mutation Kernel

GPU Global Memory
 Random Numbers
 Old Population
 New Population
 Fitness Scores
 Statistics
Evaluation Kernel (pipeline diagram; data involved: Population, Scores)
Partially-parallel Method
 User specifies a serial code fragment for fitness evaluation
 Threads are arranged in a 1D grid
 Each thread executes the user’s code on one chromosome
 Provides chromosome-level parallelism
 Benefit : Abstraction

Fully Parallel Method
 A user familiar with CUDA can effectively use a 2D thread layout
 Uses gene-level parallelism for fitness evaluation
 Benefit : Efficiency

Task :
 Given weights, costs & knapsack capacity
 Aim : maximize the cost

Representation
 1D binary string
 0/1 : Absence/Presence of an item
 W and C are the total weight and cost of a given representation
 Best Solution : the one with max C given W < Wmax

Fully Parallel Method
 Use a group of threads to compute total cost and weight in logarithmic time
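
As an illustration of how the total weight and cost can be obtained in a logarithmic number of steps, here is a sequential sketch of a pairwise (tree) reduction. The function names and the zero score for overweight selections are assumptions for the example; the slides do not say how infeasible chromosomes are scored.

```python
def tree_reduce_sum(values):
    """Pairwise sum: L values need about log2(L) passes.

    On a GPU, each iteration of the inner loop would be one thread,
    so every pass runs in parallel.
    """
    buf = list(values)
    n = len(buf)
    stride = 1
    while stride < n:
        for i in range(0, n - stride, 2 * stride):
            buf[i] += buf[i + stride]
        stride *= 2
    return buf[0] if buf else 0


def knapsack_fitness(chromosome, weights, costs, w_max):
    """Fitness of a 0/1 string: total cost C if total weight W < Wmax."""
    # One multiply per gene: absent items (gene 0) contribute nothing.
    sel_w = [g * w for g, w in zip(chromosome, weights)]
    sel_c = [g * c for g, c in zip(chromosome, costs)]
    total_w = tree_reduce_sum(sel_w)
    total_c = tree_reduce_sum(sel_c)
    # Assumption: an overweight selection scores 0.
    return total_c if total_w < w_max else 0
```

The strict comparison follows the slide's condition W < Wmax.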
Statistics Update Kernel (pipeline diagram; data involved: Scores, Statistics)


 Selection and Termination most often use Population Statistics
 We use a standard parallel reduce algorithm to calculate Max, Min and Average Scores
 We use the highly optimized public library CUDPP to sort and rank chromosomes
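
A small CPU sketch of the statistics step follows. It is not the CUDPP-based code from the slides: `combine`, `population_stats` and `rank_chromosomes` are hypothetical names, and Python's `sorted` stands in for the GPU sort. The point is that min, max and sum can share one associative combine operator, which is what allows a parallel reduce to evaluate them in any grouping.

```python
from functools import reduce


def combine(a, b):
    """Associative merge of two partial results (min, max, sum, count)."""
    return (min(a[0], b[0]), max(a[1], b[1]), a[2] + b[2], a[3] + b[3])


def population_stats(scores):
    """Return (min, max, average) of the fitness scores."""
    partials = [(s, s, s, 1) for s in scores]
    lo, hi, total, count = reduce(combine, partials)
    return lo, hi, total / count


def rank_chromosomes(scores):
    """Chromosome indices ordered from best to worst score."""
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
```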
Selection Kernel (pipeline diagram; data involved: Statistics, Parents)

Selection Kernel
 Uses N/2 threads
 Each thread selects two parents for producing offspring

Uniform Selection :
 Selects parents in a uniform random manner

Roulette Wheel Selection :
 Fitness-based approach: the higher the fitness, the better the chance of selection

Roulette Wheel
 Sort fitness scores
 Compute a roulette wheel array by doing a prefix-sum scan of the scores and normalizing it
 Generate a random number in 0-1
 Perform a binary search in the roulette wheel array for the nearest smaller number to the randomly selected number
 Return the index of the result in the array
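
The roulette wheel steps above can be sketched with Python's standard library. This is a sequential illustration under two assumptions: the scores are positive, and the wheel is an exclusive prefix sum (each entry is the sum of the scores before it), so that the nearest smaller entry identifies the selected slot. The names `build_wheel` and `spin` are made up for the example.

```python
from bisect import bisect_right
from itertools import accumulate


def build_wheel(sorted_scores):
    """Normalized exclusive prefix sum of the sorted fitness scores."""
    total = sum(sorted_scores)
    running = [0] + list(accumulate(sorted_scores))[:-1]
    return [r / total for r in running]


def spin(wheel, r):
    """Index of the last wheel entry that is <= r, for r in [0, 1)."""
    return bisect_right(wheel, r) - 1
```

With scores `[1, 3, 6]` the wheel is `[0.0, 0.1, 0.4]`, so the last chromosome owns the interval from 0.4 up to 1 and is picked about 60% of the time.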
Crossover Kernel (pipeline diagram; data involved: Old Population, New Population)
Crossover (figure: Parent1, Parent2 and Population in GPU Global Memory, with thread indices idx 1-L and idy 1-8)
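
A per-gene view of crossover can be sketched as follows, assuming the one-point crossover listed in the test parameters. The function name and the way the cut point is passed in are assumptions; each list element is computed independently, which mirrors a thread-per-gene layout.

```python
def one_point_crossover(parent1, parent2, cut):
    """Swap the tails of two equal-length parents at position `cut`."""
    length = len(parent1)
    # Gene x of each child depends only on x and the cut point,
    # so every gene could be handled by its own thread.
    child1 = [parent1[x] if x < cut else parent2[x] for x in range(length)]
    child2 = [parent2[x] if x < cut else parent1[x] for x in range(length)]
    return child1, child2
```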
Mutation Kernel (pipeline diagram; data involved: New Population)
Flip Mutator
 Each thread handles one gene and mutates it with the probability of mutation
 (figure: Population grid indexed by Thread Id x and Thread Id y; each thread, e.g. Thread 1,4, flips a coin and the coin state, T or F, decides whether its gene is mutated)
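
The flip mutator can be illustrated with a short sequential sketch for binary genes. The function name and the use of Python's `random.Random` as the coin are assumptions; in the slides each gene is handled by its own GPU thread.

```python
import random


def flip_mutate(population, p_mut, rng):
    """Flip each binary gene independently with probability p_mut."""
    return [
        [gene ^ 1 if rng.random() < p_mut else gene for gene in chromosome]
        for chromosome in population
    ]
```

For example, `flip_mutate(pop, 0.01, random.Random(0))` returns a new population in which roughly one gene in a hundred is flipped.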
Generate Random Numbers (pipeline diagram; data involved: Random Numbers)

 Extensive use of random numbers
 No primitive for on-the-fly generation of a single random number

Solution :
 Generate a pool of random numbers and copy it to the GPU
 We use a CUDPP routine to generate a large pool of random numbers on the GPU (faster)
 If better quality random numbers are needed, this can be replaced by a CPU-based routine
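
The pool idea can be sketched as below. This is a hypothetical illustration only: `make_pool`, `pool_value`, and the offset-plus-thread-id indexing are assumptions, since the slides do not describe how threads index into the pool.

```python
import random


def make_pool(size, seed=0):
    """Pre-generate `size` uniform random numbers in [0, 1)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(size)]


def pool_value(pool, thread_id, offset=0):
    """Random number for one thread; `offset` shifts the window per kernel call.

    Assumption: indices wrap around when they run past the end of the pool.
    """
    return pool[(offset + thread_id) % len(pool)]
```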

Test Device :
 A quarter of Nvidia Tesla S1030 GPU

Test Problem :
 Solve a 0/1 knapsack problem

Test Parameters :
 Representation : A 1D Binary String
 Crossover : One-point crossover
 Mutation : Flip Mutation
 Selection : Uniform and Roulette Wheel
Avg. run-time for 100 iterations (Uniform Selection)
Avg. run-time for 100 iterations (Roulette Wheel Selection)
Growth in run-time for increase in NxL
N : Population Size, L : Chromosome Length



 Our approach is modeled after GAlib and maintains structures for GA, Genome and Statistics
 It is built with enough abstraction from the user program that the user does not need to know the CUDA architecture or CUDA programming
 This can be extended to build a GPU-accelerated GA library