CS502-Fundamentals of Algorithms
Lecture No.17
6.2 Dynamic Programming
Dynamic programming is essentially recursion without repetition. Developing a dynamic
programming algorithm generally involves two separate steps:
• Formulate the problem recursively. Write down a formula for the whole problem as a
simple combination of answers to smaller subproblems.
• Build the solution to the recurrence from the bottom up. Write an algorithm that starts
with the base cases and works its way up to the final solution.
Dynamic programming algorithms need to store the results of intermediate subproblems.
This is often, but not always, done with some kind of table. We will now cover a number
of example problems whose solutions are based on the dynamic programming
strategy.
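To make the two steps concrete, here is a minimal sketch using the Fibonacci numbers. The example is chosen here for illustration and is not taken from the lecture; the function names fib_recursive and fib_bottom_up are hypothetical.

```python
def fib_recursive(n):
    """Step 1: the recursive formulation F(n) = F(n-1) + F(n-2).

    Correct, but it solves the same subproblems over and over.
    """
    if n < 2:
        return n  # base cases: F(0) = 0, F(1) = 1
    return fib_recursive(n - 1) + fib_recursive(n - 2)


def fib_bottom_up(n):
    """Step 2: fill a table from the base cases up to the final answer."""
    table = [0] * (n + 1)  # table[i] will hold F(i)
    if n >= 1:
        table[1] = 1
    for i in range(2, n + 1):
        # Each entry is computed once from two stored results.
        table[i] = table[i - 1] + table[i - 2]
    return table[n]


print(fib_recursive(10))
print(fib_bottom_up(50))
```

Both functions implement the same recurrence; the second stores every intermediate result in a table, so no subproblem is solved twice.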
6.3 Edit Distance
The words “computer” and “commuter” are very similar, and a change of just one letter,
p → m, will change the first word into the second. The word “sport” can be changed into
“sort” by the deletion of the ‘p’, or equivalently, “sort” can be changed into “sport” by the
insertion of ‘p’. The edit distance of two strings, s1 and s2, is defined as the minimum
number of point mutations required to change s1 into s2, where a point mutation is one
of:
• change a letter,
• insert a letter or
• delete a letter
For example, the edit distance between FOOD and MONEY is at most four, since four
point mutations suffice:
FOOD → MOOD → MOND → MONED → MONEY
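The definition can be turned directly into a recursive function. The following is a minimal sketch, assuming each of the three point mutations costs 1; the names edit_distance and dist are hypothetical, and memoization via functools.lru_cache is one possible way to avoid repeated work, not necessarily the approach the lecture develops.

```python
from functools import lru_cache


def edit_distance(s1, s2):
    """Minimum number of point mutations turning s1 into s2 (unit costs assumed)."""

    @lru_cache(maxsize=None)
    def dist(i, j):
        # dist(i, j) is the edit distance between the prefixes s1[:i] and s2[:j].
        if i == 0:
            return j  # insert the j letters of s2[:j]
        if j == 0:
            return i  # delete the i letters of s1[:i]
        # Change the last letter (free if the two letters already agree).
        change = dist(i - 1, j - 1) + (s1[i - 1] != s2[j - 1])
        insert = dist(i, j - 1) + 1  # insert s2[j-1]
        delete = dist(i - 1, j) + 1  # delete s1[i-1]
        return min(change, insert, delete)

    return dist(len(s1), len(s2))


print(edit_distance("computer", "commuter"))  # 1
print(edit_distance("sport", "sort"))         # 1
print(edit_distance("FOOD", "MONEY"))         # 4
```

Because the recursion depth grows with the string lengths, this sketch is only suitable for short words.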
6.3.1 Edit Distance: Applications
There are numerous applications of the Edit Distance algorithm. Here are some
examples:
Spelling Correction
If a text contains a word that is not in the dictionary, a ‘close’ word, i.e. one with a small
edit distance, may be suggested as a correction. Most word processing applications, such
as Microsoft Word, have a spell-checking and correction facility. When Word, for
example, finds an incorrectly spelled word, it suggests possible
replacements.
Plagiarism Detection
If someone copies, say, a C program and makes a few changes here and there, for
example, changing variable names or adding a comment or two, the edit distance between the
source and the copy may be small. The edit distance provides an indication of similarity that
might be too close in some situations.
Computational Molecular Biology
DNA is a polymer. The monomer units of DNA are
nucleotides, and the polymer is known as a “polynucleotide.” Each nucleotide consists of
a 5-carbon sugar (deoxyribose), a nitrogen-containing base attached to the sugar, and a
phosphate group. There are four different types of nucleotides found in DNA, differing
only in the nitrogenous base. The four nucleotides are given one-letter abbreviations as
shorthand for the four bases:
• A-adenine
• G-guanine
• C-cytosine
• T-thymine
[Figure: Double helix of a DNA molecule with nucleotides]
Algorithms like edit distance are used to compute a distance between DNA sequences
(strings over A, C, G, T) or protein sequences (over an alphabet of 20 amino acids)
for various purposes, e.g.:
• to find genes or proteins that may have shared functions or properties
• to infer family relationships and evolutionary trees over different organisms.
Speech Recognition
Algorithms similar to those for the edit-distance problem are used in some speech
recognition systems to find a close match between a new utterance and one in a library
of classified utterances.
6.3.2 Edit Distance Algorithm
A better way to display this editing process is to place one word above the other.
The first word has a gap for every insertion (I) and the second word has a gap for every
deletion (D). Columns with two different characters correspond to substitutions (S).
Matches (M) do not count. The edit transcript is defined as a string over the alphabet
{M, S, I, D} that describes a transformation of one string into another. For example:
S   D   I   M   D   M
1 + 1 + 1 + 0 + 1 + 0 = 4
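The cost of a transcript, and the transformation it describes, can be sketched in a few lines. The function names below are hypothetical. The transcript SMSIS for FOOD and MONEY and the word pair MATHS and ARTS are assumptions chosen here because they fit; they are not recovered from the lecture's figure.

```python
def transcript_cost(transcript):
    """Cost of an edit transcript: M is free; S, I and D cost 1 each."""
    return sum(0 if op == "M" else 1 for op in transcript)


def apply_transcript(transcript, source, target):
    """Replay a transcript on source, taking new letters from target."""
    out = []
    i = j = 0  # i indexes source, j indexes target
    for op in transcript:
        if op == "M":    # match: the letters agree, copy it over
            if source[i] != target[j]:
                raise ValueError("M used on two different letters")
            out.append(source[i])
            i += 1
            j += 1
        elif op == "S":  # substitution: replace source[i] by target[j]
            out.append(target[j])
            i += 1
            j += 1
        elif op == "I":  # insertion: add target[j], source does not advance
            out.append(target[j])
            j += 1
        elif op == "D":  # deletion: skip source[i]
            i += 1
        else:
            raise ValueError("unknown operation: " + op)
    return "".join(out)


print(transcript_cost("SDIMDM"))                    # 4
print(apply_transcript("SMSIS", "FOOD", "MONEY"))   # MONEY
print(apply_transcript("SDIMDM", "MATHS", "ARTS"))  # ARTS
```

The edit distance is the cost of the cheapest transcript; the sketch only scores and replays a given transcript and makes no claim that it is optimal.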
In general, it is not easy to determine the optimal edit distance. For example, the distance
between ALGORITHM and ALTRUISTIC is at most 6.
Is this optimal?
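One way to check such a bound is to fill in a table of distances between all pairs of prefixes, working up from the base cases. The sketch below assumes unit costs for all three point mutations; the function name edit_distance_table is hypothetical, and this is a standard bottom-up formulation rather than necessarily the exact algorithm the lecture goes on to present.

```python
def edit_distance_table(s1, s2):
    """Return the table whose entry [i][j] is the edit distance of s1[:i] and s2[:j]."""
    m, n = len(s1), len(s2)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        table[i][0] = i  # delete all i letters of s1[:i]
    for j in range(n + 1):
        table[0][j] = j  # insert all j letters of s2[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = min(
                table[i - 1][j - 1] + (s1[i - 1] != s2[j - 1]),  # change or match
                table[i][j - 1] + 1,                             # insert
                table[i - 1][j] + 1,                             # delete
            )
    return table


table = edit_distance_table("ALGORITHM", "ALTRUISTIC")
print(table[-1][-1])  # 6
```

Under these assumptions the bottom-right entry is 6, so the bound of 6 cannot be improved.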
© Copyright Virtual University of Pakistan