Genetic Algorithms
Jacob Noyes
[email protected]
Department of Computer Science
University of Wisconsin – Platteville
4/19/2013
Abstract
Genetic algorithms are specialized search heuristics which use the fundamental principles of evolution
through natural selection to find the best solution to a problem. The process populates a group of
random potential solutions to a given problem, selects a specific subset of optimized solutions from the
initial population, combines the traits of the selected solutions, injects mutations, and ends with a new
generation of potential solutions. This process is then repeated until a certain threshold is met. This
paper goes over the basic processes which make up a genetic algorithm, their many variations, how to
use them, and how they relate to principles found in the natural world.
Introduction
Genetic algorithms are a robust, wide-ranging, and simple-to-understand answer to large, complex
search spaces. Many search algorithms may match or beat genetic algorithms in efficiency when it is
plausible to inspect every possible outcome, but genetic algorithms take over when the possible
combinations of parameters exceed the testing capacity. Genetic algorithms can then be used to find
the best fit for the situation using a system designed to mimic the process of evolution through natural
selection.
Genetic algorithms are designed from the ground up with evolution in mind. The process is broken
down into four stages: initialization, selection, crossover, and mutation. Initialization begins the
process by generating a population of random parent guesses. Selection determines how fit each parent
guess is in relation to the search criteria. Crossover takes parent guesses based on how fit they are and
creates offspring guesses which share traits from each parent guess. Mutation injects random
differences into a small percentage of the offspring to avoid homogeneity. Initialization only happens
for the first generation. Selection, crossover, and mutation then repeat until either a certain number of
generations have occurred or a specified fitness level has been obtained.
Example 1: Y optimization
y = -x^2 + 255x; where x is in the range 0 ≤ x ≤ 255.
An easily understood example of a problem that can be answered through genetic algorithms is
determining the highest integer value of y in Example 1. This problem can easily be solved
through the use of calculus, plotting the line, or even simple guess-and-check, but it will help illustrate
how genetic algorithms work. A unique way to encode the problem must be created before this
process can begin [2].
Encoding
A changeable representation of a guess's traits is decided during encoding. This representation, called
its “string”, can be displayed as a series of bits. A problem with a single parameter may use its binary
representation as its string. Example 1 will use the binary representation of x as its string. The range
of this problem is 0 to 255 so each guess will have an 8-bit string between 00000000 and 11111111.
Example 2: Multiple parameter problem
y = 2x^2 + z^2 + w; where x, z, and w are in the range 0 ≤ x/z/w ≤ 7.
Problems with multiple parameters may use a concatenation of all binary representations of traits as
their strings. Example 2 contains three 3-bit traits. The three traits can be concatenated to create one 9-bit string. This creates a single, manageable entity that can be used in the crossover stage [2].
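As an illustration of this encoding scheme, the following minimal Python sketch converts the parameters of Examples 1 and 2 into bit strings. The function names are illustrative and not part of any standard library.

```python
def encode_example1(x: int) -> str:
    """Encode x (0-255) from Example 1 as an 8-bit binary string."""
    return format(x, "08b")


def encode_example2(x: int, z: int, w: int) -> str:
    """Concatenate the three 3-bit traits (each 0-7) of Example 2 into one 9-bit string."""
    return format(x, "03b") + format(z, "03b") + format(w, "03b")


def decode_example1(string: str) -> int:
    """Recover the integer value of x from its 8-bit string."""
    return int(string, 2)


print(encode_example1(125))          # '01111101'
print(encode_example2(5, 2, 7))      # '101010111'
print(decode_example1("01111101"))   # 125
```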
Initialization
Now that an encoding procedure has been decided on, the initialization process can take place.
Initialization is the process of generating an initial population of parents that will seed the second
generation. It is performed only at the beginning of the algorithm and not repeated. A good initial
selection of “genes”, or their binary representation, is needed to inject variety at the start of the process;
therefore, strings are generated at random.
Table 1: Example 1 data
Column 1   Column 2    Column 3   Column 4    Column 5   Column 6
x          Binary x    Fitness    Gene pool   Locus      Offspring
188        10111100    12596      01111101    3          01100110
48         00110000     9936      10000110    1          10011101
75         01001011    13500      01101000    1          11101000
104        01101000    15704      01001011    6          00000110
249        11111001     1494                             01001011
10         00001010     2450                             01101000
134        10000110    16214                             01001001
125        01111101    16250                             01111111
Continuing example 1, we will use a population of eight random integers in the range of 0 ≤ x ≤ 255.
A random number generator was used to create the initial numbers, shown in Column 1. Column 2
shows the binary representation of each number. The binary representation is what will be used in the
process of the genetic algorithm [2].
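A minimal Python sketch of this initialization step might look like the following; the population size of eight matches the example, and the names are illustrative.

```python
import random


def initialize_population(size: int = 8) -> list[str]:
    """Generate `size` random 8-bit strings, one per initial guess of x in 0-255."""
    return [format(random.randint(0, 255), "08b") for _ in range(size)]


population = initialize_population()
print(population)  # e.g. ['10111100', '00110000', ...] -- Columns 1 and 2 of Table 1 in one step
```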
Selection
The next step in the process is selection. Selection deals with assigning a fitness score to each
potential solution. The level of fitness is determined by how well each solution fulfills the intended
outcome of the search. There has to be a benchmark to determine the strength of the solutions. The
level of fitness decides which solutions will go on to have offspring and which ones will “die off”.
This determines which genes get passed on to new generations.
Example 1 will use each parent's y-value as its fitness score. So to figure out how fit an individual is
we need only insert the x-values from Column 1 into the equation in Example 1 and solve for each y-value. The fitness factor has been calculated using this method and is represented in Column 3. As can
be seen, there is a disparity between some of the values. Higher fitness factors will be picked more
often when reproducing, thus passing on the stronger solutions [2].
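A sketch of this fitness calculation for Example 1 (the function name is illustrative):

```python
def fitness(string: str) -> int:
    """Fitness for Example 1: decode the 8-bit string to x and return y = -x^2 + 255x."""
    x = int(string, 2)
    return -x * x + 255 * x


# Reproducing the first row of Table 1: x = 188 (binary 10111100) has a fitness of 12596.
print(fitness("10111100"))  # 12596
```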
There are many different ways to pick which solutions will procreate, and they all involve the fitness
factor. These methods can be classified into the following categories: proportionate selection, ranking
selection, tournament selection, gender-specific selection, genetic relatedness based selection, and
elitism. Each method has its advantages and disadvantages.
Proportionate Selection Methods
Proportionate selection utilizes each individual's fitness in relation to the fitness of other individuals to
determine which individuals will be selected for reproduction. Within proportionate selection there are
the following methods: roulette wheel selection, deterministic sampling, stochastic remainder sampling
with replacement, stochastic remainder sampling without replacement, and stochastic universal
selection.
Roulette wheel selection pits each individual's fitness against the cumulative fitness of the whole
population. First the total fitness factor of the population, Pf, is found by adding up every fitness factor
for the given generation. Then a probability of selection, psel, is calculated for each individual by
dividing its fitness level by the total fitness factor. Each psel is then inserted into an array. A random
number is generated between 0 and the total fitness factor. Starting at the beginning of the array each
element is added up until this sum is greater than the random number. The last element of the array to
be added to the sum is then selected for breeding.
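A minimal Python sketch of roulette wheel selection as just described (the helper names are illustrative):

```python
import random


def roulette_wheel_select(population: list[str], fitnesses: list[int]) -> str:
    """Select one individual with probability proportional to its fitness."""
    total_fitness = sum(fitnesses)                           # Pf
    probabilities = [f / total_fitness for f in fitnesses]   # psel for each individual
    spin = random.random()                                   # random number between 0 and 1
    running_sum = 0.0
    for individual, p_sel in zip(population, probabilities):
        running_sum += p_sel
        if running_sum >= spin:                              # stop once the sum passes the spin
            return individual
    return population[-1]                                    # guard against floating-point round-off
```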
This approach has the advantage of still allowing lower fitness level candidates to reproduce, just at a
lower rate, which helps keep a population's genetic diversity. Other methods may have higher
pressures which simply annihilate lower-performing individuals. Methods that have such a high selective
pressure that they completely remove diversity too quickly can have problems getting stuck at a local
optimum. A local optimum is a point at which small changes will not improve the fitness of
individuals, but there may be distant genes that lead to higher fitness levels. This is different from the
overall optimum, which is the highest fitness level achievable. The problem comes in when there are
several local optimums. Removing the diversity leaves the generations stuck in one specific local
optimum, only passing on very similar genes.
Deterministic sampling reworks the mating population. The average fitness is determined by summing
all of the fitness levels together and dividing by the total number of individuals in the population. Then
each individual's fitness level is divided by the average fitness of the group. The whole number that is
produced is the number of spots in the mating pool that the individual will occupy. If there are any
spots left in the mating pool after each individual's spots have been computed, then the decimal values
of each individual's computation are sorted and the rest of the spots are filled starting with the highest
value.
After the mating population has been determined, random numbers are generated which point to the
index of each selected individual. Since this is a comparison between the individual's fitness level and
the average fitness level, it gives each individual a proportionate number of spots in the gene pool and
lets fit members procreate more often.
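A possible Python sketch of deterministic sampling, assuming the mating pool is the same size as the population (an assumption not stated explicitly above):

```python
import random


def deterministic_sampling(population: list[str], fitnesses: list[int]) -> list[str]:
    """Build a mating pool using whole-number fitness-to-average-fitness quotas."""
    average_fitness = sum(fitnesses) / len(fitnesses)
    ratios = [f / average_fitness for f in fitnesses]          # fitness-to-average-fitness
    mating_pool = []
    for individual, ratio in zip(population, ratios):
        mating_pool.extend([individual] * int(ratio))          # whole-number share of spots
    # Fill any remaining spots from the largest fractional remainders.
    by_remainder = sorted(zip(population, ratios), key=lambda pair: pair[1] % 1, reverse=True)
    for individual, _ in by_remainder:
        if len(mating_pool) >= len(population):
            break
        mating_pool.append(individual)
    return mating_pool
```

Individuals would then be drawn from the finished pool at random, for example with random.choice(mating_pool).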
Stochastic remainder sampling with replacement is a combination of deterministic sampling and
roulette wheel selection. It begins by using the deterministic sampling method to fill spots in the
potential gene pool with whole number fitness-to-average-fitness members. It, however, treats the left
over spots differently. It uses the remainder of each individual's fitness-to-average-fitness ratio to fill
an array using the methods of roulette wheel selection. The left over spots in the gene pool are then
filled from the remainder array using the same method applied in roulette wheel selection.
Stochastic remainder sampling without replacement uses the same initial method to fill spots in the
gene pool that deterministic sampling and stochastic remainder sampling with replacement use, but has
yet a different way of filling left over spots. It takes the remainders from the fitness-to-average-fitness
ratio and steps through each one performing a weighted-coin toss to determine if it will be selected.
Each remainder is turned into a percentage by multiplying it by 100. A random number is generated
and if the number is lower than the remainder's percentage, its genes are selected for the next open spot.
The iteration continues until all spots are filled [6].
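A sketch of stochastic remainder sampling without replacement, under the same pool-size assumption as the deterministic sampling sketch above:

```python
import random


def stochastic_remainder_no_replacement(population: list[str], fitnesses: list[int]) -> list[str]:
    """Fill whole-number quotas first, then weighted coin tosses on the remainders."""
    average_fitness = sum(fitnesses) / len(fitnesses)
    ratios = [f / average_fitness for f in fitnesses]
    mating_pool = []
    for individual, ratio in zip(population, ratios):
        mating_pool.extend([individual] * int(ratio))            # whole-number spots first
    while len(mating_pool) < len(population):
        for individual, ratio in zip(population, ratios):
            if len(mating_pool) >= len(population):
                break
            remainder_percent = (ratio % 1) * 100                # remainder as a percentage
            if random.uniform(0, 100) < remainder_percent:       # weighted coin toss
                mating_pool.append(individual)
    return mating_pool
```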
Ranking Selection
Ranking selection methods use an approach which gives individuals a chance to breed based on order
with no regard to proportion. They are generally easier to implement and understand, but come at a
cost of possibly being less accurate or less efficient. They tend to phase out diversity too quickly. Due
to their problems, they are generally not used other than for instructional purposes. Ranking selection
contains both linear ranking selection and truncate selection [6].
Linear ranking selection starts by sorting all candidates in order from highest to lowest fitness. A rank
is then given to each candidate. Predefined selection probabilities are determined based on the number
of individuals in the population without respect to individual fitness levels. The probabilities are then
assigned to each candidate based on rank, with the highest rank getting the highest probability all the
way down to the lowest rank getting the lowest probability. Then the individuals are selected for
breeding based on their assigned probabilities [6].
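A sketch of linear ranking selection; the linearly decreasing weights are one reasonable choice of predefined probabilities, since the description above only requires that they depend on population size and rank:

```python
import random


def linear_ranking_select(population: list[str], fitnesses: list[int]) -> str:
    """Select one individual using rank-based probabilities rather than raw fitness."""
    ranked = [ind for _, ind in sorted(zip(fitnesses, population), reverse=True)]
    n = len(ranked)
    weights = list(range(n, 0, -1))                     # rank 1 gets weight n, last rank gets 1
    probabilities = [w / sum(weights) for w in weights]
    return random.choices(ranked, weights=probabilities, k=1)[0]
```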
Truncate selection uses a similar ranking system. Candidates are sorted in order from highest to lowest
fitness. From here a predefined percentage of candidates will be chosen for reproduction. A common
example would be choosing the top half of a group and reproducing each one with two other
individuals. This will fill out a new generation, but will lead to a rapid decline in genetic diversity as
half of the gene pool is dying off every generation [6].
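A sketch of truncate selection; with a fraction of 0.5 and the Table 1 fitness scores it returns the four parents shown in Column 4:

```python
def truncate_select(population: list[str], fitnesses: list[int], fraction: float = 0.5) -> list[str]:
    """Keep the top `fraction` of the population, ranked by fitness, as the mating pool."""
    ranked = [ind for _, ind in sorted(zip(fitnesses, population), reverse=True)]
    cutoff = max(1, int(len(ranked) * fraction))
    return ranked[:cutoff]
```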
Tournament Selection
Tournament selection is characterized by pitting individuals against each other in sometimes randomly
selected brackets to determine which ones get to reproduce. There are no probabilities assigned. It is
all determined simply by whether the individual is selected and whether it is more fit than its opponents.
One of the benefits of tournament selection is that it only needs local information. It only needs to
know the fitness of the selected individuals. It doesn't need to find a total sum or average fitness factor
of all possible candidates. This makes it a good candidate for any problem where it might be
implausible or inefficient to calculate totals. The categories of tournament selection include: binary
tournament selection, larger tournament selection, Boltzmann tournament selection, and correlative
tournament selection [6].
Binary tournament selection is the simplest version of tournament selection. Two candidates are
randomly selected out of the pool using a random number generator. Between the two candidates,
whichever one has the highest fitness level is chosen to reproduce. This method is fast, efficient, and
easy to implement [6].
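A sketch of binary tournament selection:

```python
import random


def binary_tournament_select(population: list[str], fitnesses: list[int]) -> str:
    """Pick two candidates at random and return the fitter of the two."""
    i, j = random.sample(range(len(population)), 2)
    return population[i] if fitnesses[i] >= fitnesses[j] else population[j]
```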
Larger tournament selection uses the same methodology as binary tournament selection except more
than two individuals compete against each other. There is no difference other than the number of
competitors [6].
Boltzmann tournament selection allows the selective pressure to be changed. It uses a variable, which
can be changed over time, to increase or decrease the effect that the fitness factor has on the selection.
When the variable, called its “temperature”, decreases it forces the selection method to pick more fit
candidates, and when it increases it allows more of a random selection process. This adds a great deal
of range to the application of the method.
Correlative tournament selection is an extension of the regular binary and larger tournament selection
methods. It is not so much a different selection method as a gene pool pairing mechanism.
Once the pool of parents has been computed, this process pairs the parents based on how closely they
are related. This fleshes out the possible advantages that the two parents may already have in common
[6].
Gender-specific selection
Gender-specific selection copies the foundation of evolution through sexual selection rather than
natural selection. Sexual selection occurs in nature when the ability to mate for one sex is determined
by the desires of the other sex. Gender-specific selection deals more with differences in two selected
parents rather than each parent's ability to survive. It allows for two different approaches for selection
to be utilized at the same time. Gender-specific selection methods include: genetic algorithm with
chromosome differentiation, restricted mating, and correlative family-based selection [6].
Genetic algorithms with chromosome differentiation (GACD) have a wholly different approach to
selection. One reason sexual selection is such a powerful tool is that it forces some genetic variation
within populations. Populations without two sets of genes going into their offspring tend to become
homogeneous very quickly. So GACDs differentiate between a “male” class and a “female” class [1]
[6].
To use this method, during encoding an extra two bits are attached to each individual's string called the
class bits. These bits can have the value of either 00, representing a female, or 01, representing a male.
In this way every candidate can be classified as either a male or female. An example of a string used in
GACD might be 0111001010, where the initial 01 is the group of class bits and the following
11001010 is the group of data bits which make up the standard trait representation [1].
The males, which make up half of the population to begin with, are generated randomly as usual. The
females are then created by maximizing the Hamming distance between themselves and their corresponding males. The Hamming distance is the sum of the differences between each bit of the given male and the female. For example, given the male string of 0111001010, the female with the highest Hamming distance would be
0000110101. In this way, GACD begins its initialization by creating two opposing groups of males
and females which have nothing in common [1].
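A sketch of this GACD initialization step; as in the example above, the female that maximizes the Hamming distance over the data bits is the bit-wise complement of the male's data bits, prefixed with the female class bits 00:

```python
import random


def random_male(data_bits: int = 8) -> str:
    """A male string: class bits '01' followed by random data bits."""
    return "01" + "".join(random.choice("01") for _ in range(data_bits))


def opposing_female(male: str) -> str:
    """The female maximizing the Hamming distance over the data bits: class bits '00'
    followed by the complement of the male's data bits."""
    complemented = "".join("1" if bit == "0" else "0" for bit in male[2:])
    return "00" + complemented


print(opposing_female("0111001010"))  # '0000110101', matching the example above
```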
The pool of potential parents can then either be selected by applying two different selection mechanics
to the two classes or just by lumping them together and using one selection mechanic. These can be
decided on a case by case basis. Once the pool of parents is created, the males and females are mated
twice and removed from the pool until the pool either runs out of males or females or both. If the pool
runs out of either males or females, but not both, then the leftover males or females are mated with the
most fit individual of the opposite sex [1] [6].
The data string of each child is produced through the normal processes in crossover. The sex of the
child, however, is selected by mimicking the natural process of sex selection. One of the class bits
from each parent is taken and added to the child to create its new class bits. The female parent can only
contribute a zero because both of her class bits are zero. The male parent then either contributes its
first class bit to create a new female or its second class bit to create a new male with class bits of 01.
The class bits are both selected at random. Since each couple produces two offspring and the selected
class bits are replaced when used, a couple may produce two males, two females or one male and one
female [6].
Restricted mating uses the concept of the classification of species in its selection technique. In the
natural world, different species are defined as animals that cannot or do not mate with each other. An
example would be the household dog. Dogs may vary in appearance and temperament quite a bit, but
any two dogs should be able to produce viable offspring. Because of this, all dogs are a part of the same
genus and species, Canis familiaris [7].
In the same way, restricted mating classifies each individual into a specific class. The class is
determined by a certain predefined trait, or subset of the binary representation of the individual. The
different classes are then only allowed to mate within their own “species”. For example, if the first
three bits of a 12-bit string are designated the species trait, then a given individual can only mate with
another individual whose first three bits match their own. A candidate with a string of 100110011101
may mate with 100000101100, but not 000000101100. In this way, the algorithm is able to keep
several separate variations evolving at the same time without worrying about them mixing into one
local optimum.
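A sketch of the restricted mating compatibility check from the example above (the function name is illustrative):

```python
def same_species(a: str, b: str, species_bits: int = 3) -> bool:
    """Two individuals may mate only if their designated species bits match."""
    return a[:species_bits] == b[:species_bits]


print(same_species("100110011101", "100000101100"))  # True:  both start with 100
print(same_species("100110011101", "000000101100"))  # False: 100 versus 000
```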
Correlative family-based selection is used to maintain population diversity. Two parents are mated
together twice producing two offspring. The two parents and the two offspring are considered the
family. Firstly, among the family, the individual with the highest fitness level is chosen to go on to the
next generation. This makes it so that if one of the parents has a better fitness level, then they are
passed on instead of an inferior offspring. Secondly, the Hamming distance is calculated between each member of the family and the other three members. The individual with the greatest Hamming distance is
also passed on to the next generation. This allows both the most fit member and the member with the
greatest diversity compared to the average to be passed on [6].
Genetic Relatedness Based Selection
Genetic relatedness based selection methods are built around the idea of each individual remembering how it relates to other individuals. An individual does this by recognizing the closest ancestor it has in common with each other individual, and it then uses this information to interact with its closest living relatives. Genetic relatedness based selections include: fitness uniform selection scheme and reserve selection.
Fitness uniform selection scheme (FUSS) is based on the principle of searching areas of the search space that haven't been checked as much as others. FUSS is not concerned with reproducing the
candidates with the highest fitness values; it is concerned with reproducing candidates that have little
exploration done around their search space [3] [4].
The fitness values of each candidate are still calculated and then they are lumped into fitness levels.
This organization puts similar individuals into their own groups based on fitness level. Then a random
number is generated between the lowest and the highest fitness value. From here, the candidate with
the closest fitness value to the random number is chosen to reproduce. This process tends to favor candidates whose fitness values lie farthest from those of the other candidates: when candidates are bunched closely together, the gaps between them are small, so the range of random numbers that would select any one of them is also small. Each chosen candidate then gets put into the gene pool and reproduces
[3] [4].
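A minimal sketch of the FUSS mechanism described above:

```python
import random


def fuss_select(population: list[str], fitnesses: list[int]) -> str:
    """Draw a target fitness uniformly between the lowest and highest fitness in the
    population, then select the candidate whose fitness is closest to that target."""
    target = random.uniform(min(fitnesses), max(fitnesses))
    closest = min(range(len(population)), key=lambda i: abs(fitnesses[i] - target))
    return population[closest]
```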
This method helps to avoid getting stuck in a local optimum because the individuals have a tendency to
diverge instead of converge. It constantly works to increase diversity and reach new areas of the search
space.
Reserve selection also aims to avoid getting stuck in local optimums. The reserve selection
method has two categories: reserved and non-reserved. Non-reserved candidates tend to have better
fitness scores and are subjected to the normal selection methods. Reserved candidates are generally
low fit individuals that are carried over to new generations, but can also be selected with replacement to
create offspring. Purposely keeping low fit individuals in the gene pool may seem counter-intuitive,
but the purpose is to keep diversity in the pool to avoid getting stuck in local optimums. While a
truncate selection removes low performers from the pool and can get stuck in a local optimum, a reserve selection will always have a way to get out [6].
Elitism
Elitism is not a total selection method in itself. It is an extension that can be attached to any other
selection method. The purpose of elitism is to make sure that the most fit candidate does not die off
through sheer bad luck in the selection process. Even if the individual with the highest fitness has the highest
chance of reproduction, it still may get unlucky and not make it into the gene pool. To bypass this,
when elitism is used the candidate with the highest fitness level is automatically carried over to the next
generation.
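A sketch of elitism as an add-on to any other selection method; the choice to overwrite the weakest offspring is an assumption made here, since the description above only states that the best candidate is carried over:

```python
def apply_elitism(old_population: list[str], old_fitnesses: list[int],
                  new_population: list[str], new_fitnesses: list[int]) -> list[str]:
    """Carry the fittest member of the previous generation into the next one unchanged."""
    best = old_population[old_fitnesses.index(max(old_fitnesses))]
    if best not in new_population:
        weakest = new_fitnesses.index(min(new_fitnesses))   # assumption: the elite individual
        new_population[weakest] = best                      # replaces the weakest offspring
    return new_population
```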
Automatically carrying the top candidate over to the next generation assures that progress will never be
lost by an unlucky generation. This is an advantage because it makes sure ground is never lost in the
search. This can also be a disadvantage, though, because it can make it much harder or even impossible for the process to escape a local optimum [2] [6].
Crossover
Crossover is the process where the combination of genes actually takes place. In nature, crossover
happens when it is determined which parent a child receives each gene from. In the genetic algorithm
usage, it is where one or more points, called loci, in the genetic string are determined and then used to
combine genes. The locus is randomly generated with a value between one and the length of the
genetic bit string minus one. The locus divides the string into two separate strings. The string to the
right is replaced by the string in the equivalent position from the other parent. This creates the genetic
string that makes up the offspring of the two [2].
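A sketch of single-point crossover; the usage line reproduces the first pairing worked through below (locus 3, parents 01111101 and 10000110):

```python
import random


def single_point_crossover(parent_a: str, parent_b: str, locus: int | None = None) -> tuple[str, str]:
    """Cut both parent strings at the same locus and swap the right-hand portions."""
    if locus is None:
        locus = random.randint(1, len(parent_a) - 1)   # between 1 and string length minus 1
    child_a = parent_a[:locus] + parent_b[locus:]
    child_b = parent_b[:locus] + parent_a[locus:]
    return child_a, child_b


print(single_point_crossover("01111101", "10000110", locus=3))  # ('01100110', '10011101')
```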
Using the equation in Example 1 and the data in Columns 1-3, a top half truncate selection will be run
to find the reproducing parents. The list of reproducing parents is given in Column 4. Given that a
truncate selection was used, each individual selected will reproduce twice, each time with a different partner. The first parent will reproduce with the second parent, the second parent will also reproduce
with the third parent, the third parent will also reproduce with the fourth parent, and the fourth parent will also reproduce with the first parent.
Generating a random number between one and seven (the number of bits in the string minus one) returns three. This means that for the first crossover to be made, the locus is at position three. Each parent's string is separated immediately following the third bit and crossed over with the opposite portion of the other parent's string. In this example, the first parent's bit string is divided into 011/11101 and
then crossed over with the second parent's divided bit string of 100/00110 to form the first offspring of
01100110 and the second offspring of 10011101. This process is then repeated with the different
combinations of parents to produce all of the offspring to fill the next population in Column 6 using the
randomly generated numbers for each crossover in Column 5.
Mutation
Mutation is the next step in the process. In the natural world, evolution wouldn't work without
mutation. Mutation has to happen to bring the initial diversity into a population. Without mutation the
first life form on Earth would not have evolved into anything else. Mutations sometimes arise from replication errors. If a mutation gives an advantage to the organism in some aspect, it may
have a greater chance of reproducing and passing on its new mutated genes.
Genetic algorithms purposely inject mutations into genes to artificially replicate the natural process.
Each bit that is created during crossover is then checked against a certain probability to see if it should
be mutated. Mutation probabilities are in general purposely very low. A common mutation rate may
be around 1/1000. This means every bit is checked with a random number generated against the
mutation rate to determine if the bit should be flipped. If the mutation check is successful, then the bit
is XORed with one.
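A sketch of bit-flip mutation using the 1/1000 rate mentioned above:

```python
import random

MUTATION_RATE = 1 / 1000  # a common, purposely low mutation rate


def mutate(string: str, rate: float = MUTATION_RATE) -> str:
    """Check every bit against the mutation rate and flip (XOR with one) those that hit."""
    bits = []
    for bit in string:
        if random.random() < rate:
            bits.append("1" if bit == "0" else "0")  # equivalent to XORing the bit with 1
        else:
            bits.append(bit)
    return "".join(bits)
```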
The mutation process, therefore, injects diversity into the population. The regular injection helps to
avoid retaining a homogeneous population. A homogenized population ruins the efficiency of the
algorithm so mutation is a necessary part.
After mutation has completed, the algorithm is left with a fully functioning new generation of
individuals. If the algorithm is fitness threshold based, the solutions are checked to see if any of them
have a fitness level over the given threshold. If one does, the algorithm will end, having found the
answer. If the algorithm is generation based, it will check to see if it has reached the specified number
of generations. If not, it will add one to the generation counter and start back at the selection phase for
the new generation [2].
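Putting the stages together, one possible end-to-end loop for Example 1 might look like the following self-contained sketch. It uses top-half truncate selection, single-point crossover, and bit-flip mutation, and it terminates on either a fitness threshold or a generation limit; the specific parameter values are illustrative.

```python
import random


def fitness(string: str) -> int:
    x = int(string, 2)
    return -x * x + 255 * x                              # y = -x^2 + 255x from Example 1


def run_ga(pop_size: int = 8, max_generations: int = 50,
           fitness_threshold: int = 16256,               # best integer y (x = 127 or 128)
           mutation_rate: float = 1 / 1000) -> str:
    """Initialize once, then repeat selection, crossover, and mutation until done."""
    population = [format(random.randint(0, 255), "08b") for _ in range(pop_size)]
    for _ in range(max_generations):
        scores = [fitness(individual) for individual in population]
        if max(scores) >= fitness_threshold:             # fitness-threshold termination
            break
        # Top-half truncate selection, as in the worked example.
        parents = [ind for _, ind in sorted(zip(scores, population), reverse=True)][:pop_size // 2]
        offspring = []
        for i, parent_a in enumerate(parents):           # pair each parent with the next one
            parent_b = parents[(i + 1) % len(parents)]
            locus = random.randint(1, 7)                 # single-point crossover locus
            offspring.append(parent_a[:locus] + parent_b[locus:])
            offspring.append(parent_b[:locus] + parent_a[locus:])
        # Bit-flip mutation: a bit is XORed with one when the mutation check succeeds.
        population = ["".join(bit if random.random() >= mutation_rate
                              else ("1" if bit == "0" else "0") for bit in individual)
                      for individual in offspring]
    scores = [fitness(individual) for individual in population]
    return population[scores.index(max(scores))]


print(run_ga())  # e.g. '01111111' (x = 127) or '10000000' (x = 128)
```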
Conclusion
Genetic algorithms have a wide variety of uses due to how flexible they are. Their ability to be
tweaked in different ways to allow for different situations is what makes them so powerful. At their
core they use the fundamental principles of the force that brought life to what it is today from single
cell organisms. This flexibility has seen genetic algorithms put to use in such wide-ranging
implementations as image processing, optimal water network layouts, facial recognition, robotics, and trajectories for spacecraft [2]. Genetic algorithms are an effective solution to large-scale optimization
problems.
References
[1] Bandyopadhyay, S., Pal, S. K., & Maulik, U. (1998). Incorporating chromosome differentiation in genetic algorithms. Information Sciences. Retrieved from http://www.isical.ac.in/~sanghami/bandyopa_ieeetgrs.pdf.gz
[2] Coley, D. (2003). An Introduction to Genetic Algorithms for Scientists and Engineers. Singapore: World Scientific Publishing Co. Pte. Ltd.
[3] Hutter, M. (2001). Fitness Uniform Selection to Preserve Genetic Diversity. Retrieved from http://arxiv.org/pdf/cs.AI/0103015.pdf
[4] Hutter, M. (2004). Tournament versus fitness uniform selection. Retrieved from http://arxiv.org/pdf/cs.LG/0403038.pdf
[5] Pragasen, P., Nolan, R., & Towhidul, H. (1997). Application of genetic algorithms to motor parameter determination for transient torque calculations. IEEE Transactions on Industry Applications, 33(5), 1275. Retrieved from http://spectrum.library.concordia.ca/6537/1/Pillay_application_genetic.pdf
[6] Sivaraj, R. (2011). A Review of Selection Methods in Genetic Algorithm. International Journal of Engineering Science and Technology, 5(3). Retrieved from http://www.ijest.info/docs/IJEST11-03-05-190.pdf
[7] Swaminathan, N. (n.d.). Why are different breeds of dogs all considered the same species? Retrieved from http://www.scientificamerican.com/article.cfm?id=different-dog-breeds-same-species