Sequence Alignment
• Sequence alignment is the procedure of comparing two sequences (pairwise alignment) or more than two (multiple sequence alignment) by searching for a series of individual characters or patterns that occur in the same order in the sequences.
• It is a way of arranging DNA, RNA, or protein sequences to identify regions of similarity.
• Types of alignment:
- Local alignment
- Global alignment
Global Alignment
In global alignment, an attempt is made to align the sequences over their entire length. Two sequences that have approximately the same length and are quite similar are suitable for global alignment.
L G P S S K Q T G K G S - S R I W D N
L N - I T K S A G K G A I M R L G D A
Local Alignment
Local alignment concentrates on finding stretches of sequence with a high density of matches.
- - - - - - - T G K G - - - - - - -
- - - - - - - A G K G - - - - - - -
Sequence Interpretation
• Sequence alignment is useful for discovering structural, functional, and evolutionary information.
• Sequences that are very much alike may have similar secondary and 3D structure, similar function, and likely a common ancestral sequence.
• Large-scale genome studies have revealed the existence of horizontal transfer of genes and other sequences between species, which may cause similarity between some sequences in very distant species.
Sequence Alignment Methods
• Dot matrix analysis
• Dynamic programming
Dot Matrix Analysis
• Dot matrix analysis is a method for comparing two sequences to look for possible alignments (Gibbs and McIntyre 1970).
• One sequence (A) is listed across the top of the matrix and the other (B) is listed down the left side.
• Starting from the first character in B, one moves across the page, keeping in the first row, and places a dot in every column where the character in A is the same.
• The process is repeated row by row for each subsequent character in B until all possible comparisons between A and B have been made.
• Any region of similarity is revealed by a diagonal row of dots.
• Isolated dots not on a diagonal represent random matches.
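The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a published implementation: the function names `dot_matrix` and `render` and the two example sequences are assumptions chosen for the sketch.

```python
def dot_matrix(seq_a, seq_b):
    """Build the comparison grid: one row per character of seq_b (left side),
    one column per character of seq_a (top). True marks a matching pair."""
    return [[a == b for a in seq_a] for b in seq_b]


def render(matrix, seq_a, seq_b):
    """Draw the grid as text, with '*' for a dot and '.' for no match."""
    lines = ["  " + " ".join(seq_a)]
    for b, row in zip(seq_b, matrix):
        lines.append(b + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


# Hypothetical example sequences that differ at a single position.
seq_a = "GATTACA"
seq_b = "GATCACA"
matrix = dot_matrix(seq_a, seq_b)
print(render(matrix, seq_a, seq_b))
```

In the printed grid the similar regions show up as runs of `*` along the main diagonal, while the scattered `*` elsewhere are the isolated random matches.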
Dot Matrix Analysis
• Detection of matching regions can be improved by filtering out random matches, which can be achieved by using a sliding window.
• Instead of comparing single sequence positions, a window of several adjacent positions is compared at once, and a dot is printed only if a certain minimum number of matches occurs within the window.
• Dot matrix analysis can also be used to find direct and inverted repeats within sequences.
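The sliding-window filter can be sketched as follows. The function name `windowed_dot_matrix`, the parameter names `window` and `min_matches`, and their default values are assumptions made for illustration; real tools choose these values according to sequence type.

```python
def windowed_dot_matrix(seq_a, seq_b, window=3, min_matches=3):
    """Place a dot at (i, j) only if at least `min_matches` of the `window`
    positions starting at seq_b[i] and seq_a[j] are identical."""
    rows = len(seq_b) - window + 1
    cols = len(seq_a) - window + 1
    matrix = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Count matches along the diagonal segment covered by the window.
            matches = sum(seq_a[j + k] == seq_b[i + k] for k in range(window))
            row.append(matches >= min_matches)
        matrix.append(row)
    return matrix


# Hypothetical example sequences that differ at a single position.
filtered = windowed_dot_matrix("GATTACA", "GATCACA", window=3, min_matches=3)
dots = sum(sum(row) for row in filtered)
print("dots after filtering:", dots)
```

With a strict window (every position must match) only the dots that belong to diagonal runs survive; lowering `min_matches` below `window` tolerates some mismatches inside a run.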
Dot Matrix Analysis
[Figure: example dot matrix comparing two short nucleotide sequences; dots mark positions where the characters match]
Dot Matrix Analysis
[Figure: dot matrices for two similar sequences and for two very different sequences]
Dynamic Programming
• The method compares every pair of characters in the two sequences and generates an alignment that is the best, or optimal, under the chosen scoring scheme.
• This is a computationally demanding method: the work grows with the product of the two sequence lengths.
• Each alignment has its own score, and it is essential to recognise that several different alignments may have identical or nearly identical scores, so dynamic programming methods may produce more than one optimal alignment.
• Global alignment programs are based on the Needleman-Wunsch algorithm and local alignment programs on the Smith-Waterman algorithm. Both are variants of the basic dynamic programming algorithm.
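As an illustration of the local variant, the sketch below computes only the best local alignment score in the style of Smith-Waterman. The function name `smith_waterman_score` and the scoring values (match +2, mismatch -1, linear gap -2) are assumptions for this example, not values taken from the slides.

```python
def smith_waterman_score(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Return the highest local alignment score between seq_a and seq_b.
    Scores are clamped at 0 so a local alignment can start anywhere."""
    prev = [0] * (len(seq_b) + 1)  # previous row of the score matrix
    best = 0
    for a in seq_a:
        curr = [0]  # first column is always 0 in local alignment
        for j, b in enumerate(seq_b, start=1):
            diag = prev[j - 1] + (match if a == b else mismatch)
            up = prev[j] + gap        # gap in seq_b
            left = curr[j - 1] + gap  # gap in seq_a
            curr.append(max(0, diag, up, left))
        best = max(best, max(curr))
        prev = curr
    return best


# The matching stretch from the local alignment slide: T G K G versus A G K G.
print(smith_waterman_score("TGKG", "AGKG"))
```

Only two rows of the matrix are kept because the score alone is needed; recovering the aligned residues would require storing the full matrix and tracing back from the best cell.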
Needleman-Wunsch Method
• It quantifies the similarity between two sequences.
• Any measurement of similarity must be done with respect to the best possible alignment between the two sequences.
• If good matches are found, the search results in high-scoring segment pairs.
• Over the course of evolution, some base or amino acid positions in a sequence undergo:
- Substitutions
- Insertions
- Deletions
• Insertions and deletions are less common than substitutions.
• Gaps are therefore penalized more heavily than mismatches when calculating a similarity score.
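A minimal sketch of global alignment in the style of Needleman-Wunsch is given below. The function name `needleman_wunsch` and the scoring values (match +1, mismatch -1, linear gap -2) are assumptions for illustration; the gap value is chosen only so that gaps cost more than mismatches, as described above.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    n, m = len(seq_a), len(seq_b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    # Fill the matrix: each cell takes the best of substitution or gap.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                top.append(seq_a[i - 1])
                bottom.append(seq_b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(seq_a[i - 1])   # gap in seq_b
            bottom.append("-")
            i -= 1
        else:
            top.append("-")            # gap in seq_a
            bottom.append(seq_b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


# Hypothetical example: the second sequence lacks one T.
total, aligned_a, aligned_b = needleman_wunsch("GATTACA", "GATACA")
print(total)
print(aligned_a)
print(aligned_b)
```

The traceback prefers the diagonal move when several moves give the same score, so it reports just one of possibly several equally good alignments.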