Using a Genetic Algorithm for Approximate String Matching on Genetic Code
Carrie Mantsch
December 5, 2003

Outline
• Problem Statement
• Current Techniques
• GA Motivation
• My Algorithm
• Results
• Extension Possibilities

Problem Statement
The problem is to search and align strands of DNA using a genetic algorithm.

Current Techniques
• Approximate string matching
  – Usually meant for smaller strings
  – Many are set up for k mismatches
• 2 DNA strands of size 90 and 85
  – Allowing for 5 gaps in the second strand gives almost 44 million possible alignments

Current Techniques (cont.)
• Needleman-Wunsch
  – Gap penalty -1
  – Match bonus +1
  – Mismatch 0
• Not practical if the sequence starts in the middle
  – Counts the gaps at the beginning and end as penalties.

Current Techniques (cont.)
• BLAST (Basic Local Alignment Search Tool) and FASTA
  – Use domain specific knowledge
• http://www.ncbi.nlm.nih.gov/BLAST
• http://fasta.bioch.virginia.edu

GA Motivation
• Alien DNA
• Junk DNA
• Extendable to similar text searches without domain specific knowledge

My Algorithm
• The population
  – Bit strings of 0’s and 1’s
  – 0’s are spaces, 1’s mean a letter is placed there
  – The number of 1’s stays constant as the number of letters in the smaller search string

My Algorithm (cont.)
• Breeding
  – Rank based selection
• Crossover
  – The common place markers are kept the same
  – The rest of the place markers are split evenly between the two children

My Algorithm (cont.)
• Mutation
  – If the amount of gaps is less than one tenth of the small string size add a gap
  – Otherwise delete a gap

Results
• The target match

Results (cont.)
• Ran for 50 generations
• Different random numbers for the same number of generations give best fitness values between about 32 and 67 (optimal fitness - 90)

Extension Possibilities
• Better representation of population
• Be able to alter fitness evaluation to be more specific to different problems
• Ability to add domain specific knowledge
• Parallel searching

Questions?
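
As a check on the "almost 44 million possible alignments" figure from the Current Techniques slide: the slides do not say how the number was derived, but it is consistent with choosing which 5 of 90 positions hold a gap when an 85-letter strand is laid against a 90-letter one. The sketch below assumes that reading; the function name is hypothetical.

```python
from math import comb


def count_gap_placements(total_positions: int, gaps: int) -> int:
    """Number of ways to choose which positions hold a gap.

    Assumption: every alignment is fully described by the set of gap
    positions, so the count is a plain binomial coefficient.
    """
    return comb(total_positions, gaps)


# Strands of size 90 and 85, with 5 gaps in the shorter strand.
print(count_gap_placements(90, 5))  # 43949268
```

Under this reading the search space already reaches tens of millions of candidates for two short strands, which is the motivation the slides give for a heuristic search.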
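
The population encoding from the My Algorithm slide (0 for a space, 1 for a placed letter) can be sketched as follows. The slides do not define the fitness function, so `match_count` is only a hypothetical stand-in that counts matching letters; `penalized_score` applies the Needleman-Wunsch values quoted earlier (match +1, mismatch 0, gap -1) to one fixed alignment and is not the full dynamic-programming algorithm. All function names and the sample strands are assumptions for illustration.

```python
GAP = "-"


def expand(mask: str, short: str) -> str:
    """Lay `short` out along `mask`: a 1 places the next letter, a 0 leaves a gap."""
    if mask.count("1") != len(short):
        raise ValueError("mask needs exactly one 1 per letter of the short strand")
    letters = iter(short)
    return "".join(next(letters) if bit == "1" else GAP for bit in mask)


def match_count(mask: str, long: str, short: str) -> int:
    """Hypothetical fitness: positions where the placed letter equals the long strand."""
    aligned = expand(mask, short)
    return sum(a == b for a, b in zip(aligned, long) if a != GAP)


def penalized_score(mask: str, long: str, short: str) -> int:
    """Score one alignment with match +1, mismatch 0, gap -1 (values from the slides)."""
    aligned = expand(mask, short)
    return match_count(mask, long, short) - aligned.count(GAP)


long_strand = "GATTACA"   # illustrative data, not from the slides
short_strand = "GTAC"
mask = "1010110"

print(expand(mask, short_strand))                        # G-T-AC-
print(match_count(mask, long_strand, short_strand))      # 4
print(penalized_score(mask, long_strand, short_strand))  # 1
```

The gap between the two scores (4 versus 1) illustrates the slide's point that penalizing every gap, including those at the beginning and end, works against a short strand that matches somewhere in the middle of a longer one.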
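
The breeding and crossover steps from the My Algorithm slides might look like the sketch below. Two details are assumptions, since the slides do not spell them out: rank-based selection is implemented as linear ranking (selection weight equal to rank, worst individual ranked 1), and the non-shared place markers are shuffled before being split evenly between the children. Function names are hypothetical.

```python
import random


def rank_select(population: list[str], fitnesses: list[int], rng: random.Random) -> str:
    """Pick one individual with probability proportional to its fitness rank."""
    ranked = sorted(zip(fitnesses, population))      # worst first
    weights = list(range(1, len(ranked) + 1))        # rank 1 = worst
    individuals = [individual for _, individual in ranked]
    return rng.choices(individuals, weights=weights, k=1)[0]


def crossover(parent_a: str, parent_b: str, rng: random.Random) -> tuple[str, str]:
    """Keep place markers shared by both parents; split the rest evenly."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    ones_a = {i for i, bit in enumerate(parent_a) if bit == "1"}
    ones_b = {i for i, bit in enumerate(parent_b) if bit == "1"}
    if len(ones_a) != len(ones_b):
        raise ValueError("parents must place the same number of letters")

    common = ones_a & ones_b
    rest = sorted(ones_a ^ ones_b)   # markers held by exactly one parent
    rng.shuffle(rest)
    half = len(rest) // 2            # len(rest) is always even here

    def to_mask(ones: set[int]) -> str:
        return "".join("1" if i in ones else "0" for i in range(len(parent_a)))

    return to_mask(common | set(rest[:half])), to_mask(common | set(rest[half:]))


rng = random.Random(0)
child_1, child_2 = crossover("110100", "101100", rng)
print(child_1, child_2)
```

Because each parent contributes the same number of non-shared markers, every child ends up with the same number of 1's as its parents, which preserves the constraint that the count of 1's equals the length of the smaller search string.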