Identifying sequences with …
Speaker: S. Gaj
BioInformatics Lunch Meeting
Date 04-03-2005
Annotation
Background
• Best possible description available for a given sequence at the current time.
How to annotate?
• Combining
  • Alignment Tools
  • Databases
  • Datamining (scripts)
Microarrays
Part I:
Sequence Alignment
Introduction
Background
Global alignment
• Optimal alignment between two sequences containing as many characters of the query as possible.
Ex: predicting evolutionary relationship between genes, …
Local alignment
• Optimal specific alignment between two sequences identifying identical area(s)
Ex: Identifying key molecular structures (S-bonds, α-helices, …)
Global vs Local Alignment
Global Alignment:
Score: -42 at (seq1)[1..90] : (seq2)[1..90]
 1 MA-----STVTSCLEPTEVFMDLWPEDHSNWQELSPLEPSDPLNPPTPPRAAPSPVVPST
 1 MSHGIQMSTIKKRRSTDEEVFCLPIKGREIYEILVKIYQIENYNMECAPPAGASSVSVGA
56 EDYGGDFDFRVGFVEAGTAKSVTCTYSPVLNKVYC
61 TEAEPTEVFMDLWPEDHSNWQELSPLEPSD-----
• Includes total sequence
• The highest score
Local Alignment:
Score: 148 at (seq1)[10..36] : (seq2)[64..90]
10 EPTEVFMDLWPEDHSNWQELSPLEPSD
   |||||||||||||||||||||||||||
64 EPTEVFMDLWPEDHSNWQELSPLEPSD
• The highest score
• Stop the alignment extension if it is not profitable
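The contrast between the two modes can be sketched in a few lines of Python. This is a minimal sketch, assuming a simple scoring scheme (match +1, mismatch -1, linear gap penalty -2) chosen only for illustration. The function names `global_score` and `local_score` are not from the slides, and the scores will not reproduce the -42 and 148 shown above, which come from a scoring scheme the slide does not state.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score: both sequences are aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman score: best pair of substrings; cells never drop below 0."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 option is what stops an extension that is not profitable.
            cur.append(max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


shared = "EPTEVFMDLWPEDHSNWQELSPLEPSD"  # identical segment from the slide
seq1 = "MASTVTSCL" + shared              # start of seq1 up to position 36
seq2 = "TEA" + shared                    # seq2 positions 61..90
print(global_score(seq1, seq2), local_score(seq1, seq2))
```

The global score is pulled down by the unmatched flanking residues, while the local score only reflects the shared segment.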
BLAST
Introduction
Basic Local Alignment Search Tool
• Aligning an unknown sequence (query) against all sequences present in a chosen database based on a score-value.
• Aim: Obtaining structural or functional information on the unknown sequence.
Programs
• Different BLAST programs available

  Query \ Database   Nucleic   Protein
  Nucleic            BlastN    BlastX
  Protein            -         BlastP

• Parameters: Maximum E-Value, Gap Opening Penalty (GOP), Gap Extension Penalty (GEP), …
Terms
• Query: Sequence which will be aligned
• Subject: Sequence present in database
• Hit: Alignment result.
BLAST: Matrices
Substitution Matrices – What?

A substitution matrix estimates the rate at which each possible residue in a sequence changes to each other residue over time.

For example, a hydrophobic residue is more likely to stay hydrophobic than not.

Each matrix is tailored to look for certain types of sequences – KNOW WHAT YOU ARE LOOKING FOR!
Substitution Matrices – Why?
1. Determine likelihood of homology between two sequences
2. Substitutions that are more likely should get a higher score
3. Substitutions that are less likely should get a lower score.
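To illustrate how a substitution matrix turns these likelihoods into an alignment score, here is a minimal sketch. The values in `TOY_MATRIX` are invented for illustration and are not taken from PAM or BLOSUM; the helpers `pair_score` and `score_ungapped` are likewise hypothetical. The only point is that a likely substitution (hydrophobic for hydrophobic, acidic for acidic) scores higher than an unlikely one.

```python
# Toy substitution scores for four residues (illustrative values, not a real matrix).
# L and I are hydrophobic; D and E are acidic.
TOY_MATRIX = {
    ("L", "L"): 4, ("I", "I"): 4, ("D", "D"): 6, ("E", "E"): 5,
    ("L", "I"): 2, ("D", "E"): 2,
    ("L", "D"): -4, ("L", "E"): -3, ("I", "D"): -3, ("I", "E"): -3,
}


def pair_score(x, y):
    """Look up a residue pair in either order (the matrix is symmetric)."""
    return TOY_MATRIX[(x, y)] if (x, y) in TOY_MATRIX else TOY_MATRIX[(y, x)]


def score_ungapped(a, b):
    """Sum substitution scores over two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(x, y) for x, y in zip(a, b))


print(score_ungapped("LIDE", "LIDE"))  # identical residues
print(score_ungapped("LIDE", "ILED"))  # conservative substitutions
print(score_ungapped("LIDE", "DELI"))  # non-conservative substitutions
```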
Matrices - PAM
• Point Accepted Mutation
• Mostly used in global amino acid alignments
• PAM1 represents 1% of change
  PAM250 = (PAM1)^250
• PAM1
  • Applied for a time period over which we expect 1% of the amino acids to undergo accepted point mutations within the species of interest.
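The relation PAM250 = (PAM1)^250 is a matrix power: the one-step mutation probability matrix is applied 250 times. A minimal sketch, assuming a toy two-residue alphabet in which each residue has a 1% chance of changing per step. Real PAM matrices are 20 × 20 and derived from observed data; the names `mat_mul`, `mat_pow`, and `toy_pam1` and all values below are illustrative only.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power by repeated squaring."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    while power > 0:
        if power & 1:
            result = mat_mul(result, m)
        m = mat_mul(m, m)
        power >>= 1
    return result


# Toy "PAM1": each residue stays the same with probability 0.99 per step.
toy_pam1 = [[0.99, 0.01],
            [0.01, 0.99]]
toy_pam250 = mat_pow(toy_pam1, 250)
print(toy_pam250[0][0])  # probability that a residue is unchanged after 250 steps
```

Note that 250 steps do not mean 250% change: a position can mutate more than once or revert, so a substantial fraction of residues is still unchanged.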
BLOSUM
• Mostly used in local AA alignments
• Based on observed alignments, not predicted ones.
• BLOSUM 80, BLOSUM 62, BLOSUM 45
• Default: BLOSUM 62
  Matrix calculated from comparisons of sequences sharing no more than 62% identity.
PAM vs BLOSUM
• Closely related:
  • Low PAM
  • High BLOSUM
• Distantly related:
  • High PAM
  • Low BLOSUM
BLAST
BlastN Example
Common BLAST problems
• BlastN
Clone seq  CGATACGCCAGG-ATATACC
           |||||||||||| |||||||
mRNA       CGATACGCCAGGGATATACC
Sequencing Error
• Solution: Low penalty for GOP and GEP = 1
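The effect of the gap penalties on this example can be sketched with a simple affine gap score. This is an illustration only: the match reward, the mismatch penalty, the "stricter" penalty values, and the function `affine_score` are assumptions, and a gap of length n is assumed to cost GOP + n × GEP (conventions differ between tools).

```python
def affine_score(top, bottom, match=1, mismatch=-3, gop=5, gep=2):
    """Score a pre-aligned pair of gapped strings with affine gap penalties.

    Assumes a gap of length n costs gop + n * gep.
    """
    score = 0
    in_gap = False
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            if not in_gap:
                score -= gop  # opening a new gap
            score -= gep      # every gap position pays the extension cost
            in_gap = True
        else:
            score += match if x == y else mismatch
            in_gap = False
    return score


clone = "CGATACGCCAGG-ATATACC"  # clone sequence with one base missing
mrna = "CGATACGCCAGGGATATACC"   # reference mRNA

print(affine_score(clone, mrna, gop=5, gep=2))  # stricter gap penalties
print(affine_score(clone, mrna, gop=1, gep=1))  # lenient gap penalties
```

With lenient penalties the single-base gap caused by a sequencing error costs little, so the alignment keeps most of its score.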
Translation Problems
• 6-Frame translation
>embl|J03801|HSLSZ Human lysozyme mRNA, complete cds with an Alu repeat in the 3' flank.
+1   L  A  L  *  P  S  S  Q  H  E  G  S  H  C  S  G  A
    ctagcactctgacctagcagtcaacatgaaggctctcattgttctggggct...
Translation Problems
• 6-Frame translation
>embl|J03801|HSLSZ Human lysozyme mRNA, complete cds with an Alu repeat in the 3' flank.
+3     S  T  L  T  *  Q  S  T  *  R  L  S  L  F  W  G
+2    *  H  S  D  L  A  V  N  M  K  A  L  I  V  L  G
+1   L  A  L  *  P  S  S  Q  H  E  G  S  H  C  S  G  A
0   ctagcactctgacctagcagtcaacatgaaggctctcattgttctggggct...
-1, -2, -3  (reverse-strand frames)
http://searchlauncher.bcm.tmc.edu/cgi-bin/seq-util/sixframe.pl
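A six-frame translation like the one above can be sketched in plain Python. This is a minimal sketch, assuming the standard genetic code and an input containing only A, C, G, and T. The names `translate` and `six_frames` are illustrative and unrelated to the tool at the URL above, and only the fragment printed on the slide is translated.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order, '*' marks a stop.
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons from the start of dna; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Return translations for frames +1..+3 and -1..-3."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


seq = "ctagcactctgacctagcagtcaacatgaaggctctcattgttctggggct"  # fragment from the slide
for name, protein in six_frames(seq).items():
    print(name, protein)
```

Only one of the six frames usually corresponds to the real protein, which is why a translated search has to try all of them.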
Common BLAST problems
Gene X (exons and introns) → Transcription → full mRNA → Splicing → mRNA
Common BLAST problems
mRNA: Coding region, Non-coding region
Clones derived from mRNA
BlastX against protein sequence
3 possible hit-situations
Common BLAST problems
Coding region / Non-coding region

JOKER
BATMANROBIN
→ Yields no protein hit

BATMANROBIN
|||||||||||
BATMANROBIN
→ Aligns with protein in 1 of the 6 frames.

or

JOKERBATMAN
     ||||||
     BATMANROBIN
→ Part perfect alignment
End
Questions?