Using a Genetic Algorithm for Approximate String Matching on Genetic Code
Carrie Mantsch
December 5, 2003
Outline
• Problem Statement
• Current Techniques
• GA Motivation
• My Algorithm
• Results
• Extension Possibilities
Problem Statement
The problem is to search and align strands of DNA using a genetic algorithm.
Current Techniques
• Approximate string matching
– Usually meant for smaller strings
– Many are set up for k mismatches
• Example: two DNA strands of lengths 90 and 85
– Allowing for 5 gaps in the second strand gives almost 44 million possible alignments
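
The 44 million figure is consistent with counting the ways to choose which 5 of the 90 positions hold gaps once the 85-letter strand is padded to length 90. A quick check, assuming that is the count the slide has in mind (the variable names are illustrative):

```python
import math

long_len = 90    # length of the longer strand
short_len = 85   # length of the shorter strand
gaps = long_len - short_len  # 5 gaps pad the shorter strand to length 90

# Choose which positions of the padded strand are gaps.
alignments = math.comb(long_len, gaps)
print(alignments)  # 43949268
```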
Current Techniques (cont.)
• Needleman-Wunsch
– Gap penalty -1
– Match bonus +1
– Mismatch 0
• Not practical if the shorter sequence matches in the middle of the longer one
– Gaps at the beginning and end are counted as penalties
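
A minimal sketch of Needleman-Wunsch scoring with the values listed above (gap -1, match +1, mismatch 0) shows the end-gap problem. The function name and the example strings are illustrative assumptions, and only the score is computed, not the alignment itself:

```python
def needleman_wunsch_score(a, b, match=1, mismatch=0, gap=-1):
    """Return the best global alignment score of strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Leading gaps are penalized: this is what hurts mid-sequence matches.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


print(needleman_wunsch_score("ACGT", "ACGT"))     # 4
print(needleman_wunsch_score("TTACGTTT", "ACG"))  # -2
```

In the second call "ACG" occurs exactly inside the longer string, yet the five unavoidable end gaps pull the score down to -2.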
Current Techniques (cont.)
• BLAST (Basic Local Alignment Search Tool) and FASTA
– Use domain-specific knowledge
• http://www.ncbi.nlm.nih.gov/BLAST
• http://fasta.bioch.virginia.edu
GA Motivation
• Alien DNA
• Junk DNA
• Extendable to similar text searches without domain-specific knowledge
My Algorithm
• The population
– Bit strings of 0’s and 1’s
– 0’s are spaces, 1’s mean a letter is placed there
– The number of 1’s stays constant, equal to the number of letters in the smaller search string
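
The representation above can be sketched as follows. The slides do not state the fitness function or the bit-string length, so `count_matches` (one point per matching position) and the assumption that the mask is as long as the larger string are hypothetical choices made only for illustration:

```python
def decode(mask, short):
    """Place the letters of `short` where mask has 1's; 0's become gaps."""
    if sum(mask) != len(short):
        raise ValueError("mask must contain exactly one 1 per letter")
    letters = iter(short)
    return "".join(next(letters) if bit else "-" for bit in mask)


def count_matches(mask, short, long):
    """Hypothetical fitness: number of positions where the letters agree."""
    aligned = decode(mask, short)
    return sum(1 for x, y in zip(aligned, long) if x == y)


mask = [1, 0, 1, 1, 0, 1]
print(decode(mask, "ACGT"))                   # A-CG-T
print(count_matches(mask, "ACGT", "ATCGGT"))  # 4
```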
My Algorithm (cont.)
• Breeding
– Rank based selection
• Crossover
– The common place markers are kept the same
– The rest of the place markers are split evenly between the two children
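
One way to read the breeding and crossover steps above is sketched below. The slides do not say how ranks map to selection probability, how the non-shared markers are divided, or how parents of different lengths are handled. Linear rank weights, a random even split, and equal-length parents are assumptions here, and the function names are hypothetical:

```python
import random


def rank_select(population, fitnesses, rng=None):
    """Pick two parents with probability proportional to fitness rank."""
    rng = rng or random
    order = sorted(range(len(population)), key=lambda i: fitnesses[i])
    weights = [0] * len(population)
    for rank, idx in enumerate(order, start=1):  # worst gets rank 1
        weights[idx] = rank
    return rng.choices(population, weights=weights, k=2)


def crossover(parent_a, parent_b, rng=None):
    """Keep shared 1's in both children; split the rest evenly."""
    rng = rng or random
    if len(parent_a) != len(parent_b):
        raise ValueError("this sketch assumes equal-length parents")
    pairs = list(zip(parent_a, parent_b))
    common = [i for i, (x, y) in enumerate(pairs) if x == 1 and y == 1]
    differing = [i for i, (x, y) in enumerate(pairs) if x != y]
    rng.shuffle(differing)
    half = len(differing) // 2

    child1 = [0] * len(parent_a)
    child2 = [0] * len(parent_a)
    for i in common:
        child1[i] = child2[i] = 1
    for i in differing[:half]:
        child1[i] = 1
    for i in differing[half:]:
        child2[i] = 1
    return child1, child2
```

When both parents hold the same number of 1's, the non-shared markers come in an even count, so each child ends up with the same number of 1's as its parents.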
My Algorithm (cont.)
• Mutation
– If the number of gaps is less than one tenth of the small string size, add a gap
– Otherwise delete a gap
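
The mutation rule above can be sketched as below. Where the gap is added or which gap is deleted is not specified on the slide, so choosing the position at random is an assumption, as is the function name:

```python
import random


def mutate(mask, short_len, rng=None):
    """Add a gap if gaps are scarce, otherwise delete one."""
    rng = rng or random
    mutated = list(mask)
    gap_positions = [i for i, bit in enumerate(mutated) if bit == 0]

    if not gap_positions or len(gap_positions) < short_len / 10:
        # Fewer gaps than a tenth of the small string: insert a 0.
        mutated.insert(rng.randrange(len(mutated) + 1), 0)
    else:
        # Otherwise remove one existing 0, chosen at random.
        del mutated[rng.choice(gap_positions)]
    return mutated
```

Either branch leaves the number of 1's unchanged, so the mutated string still places every letter of the smaller string.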
Results
• The target match
Results (cont.)
• Ran for 50 generations
• Runs with different random numbers, for the same number of generations, give best fitness values between about 32 and 67 (optimal fitness: 90)
Extension Possibilities
• Better representation of population
• Be able to alter fitness evaluation to be more specific to different problems
• Ability to add domain-specific knowledge
• Parallel searching
Questions?