Tutorial 4
Comparing Protein Sequences
Intro to Bioinformatics
1
Amino acids were not born equal
2
Comparing Protein Sequences
• Substitution Matrices
  • PAM - Point Accepted Mutation
  • BLOSUM - Blocks Substitution Matrix
• Advanced comparison tools
  • PSI-BLAST
  • PHI-BLAST
3
Substitution Matrix
• Scoring matrix S
• 20x20 for protein alignment (one row and one column per amino acid)
• Si,j represents the gain/penalty due to substituting AAj by AAi (i: row, j: column)
• Based on the likelihood that this substitution is found in nature
• Computed differently in PAM and BLOSUM
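As a minimal sketch of how a scoring matrix S is used, the Python below sums Si,j over the columns of an ungapped alignment. The table TOY_SCORES and both function names are hypothetical, and the values are illustrative only (chosen to resemble typical BLOSUM-style entries); a real matrix covers all 20x20 pairs.

```python
# Toy substitution scores: illustrative values only, NOT a complete matrix.
# Only one orientation of each pair is stored; the lookup treats S as symmetric.
TOY_SCORES = {
    ("L", "L"): 4, ("I", "I"): 4, ("L", "I"): 2,
    ("D", "D"): 6, ("E", "E"): 5, ("D", "E"): 2,
    ("W", "W"): 11, ("L", "D"): -4,
}


def pair_score(a, b, scores=TOY_SCORES):
    """Return S[a][b], trying both orientations of the pair."""
    if (a, b) in scores:
        return scores[(a, b)]
    return scores[(b, a)]


def ungapped_score(seq1, seq2, scores=TOY_SCORES):
    """Sum the substitution scores over the columns of an ungapped alignment."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b, scores) for a, b in zip(seq1, seq2))


similar = ungapped_score("LDW", "IEW")     # conservative substitutions
dissimilar = ungapped_score("LD", "DL")    # penalised substitutions
```

Conservative substitutions (L/I, D/E) keep the total positive, while the L/D pairs are penalised.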
4
Computing probability of Mutation (Mi,j)
• PAM - Point Accepted Mutation
  • Based on closely related proteins (X% divergence)
  • Matrices for comparing divergent proteins are computed from these
• BLOSUM - Blocks Substitution Matrix
  • Based on conserved blocks with bounded similarity (sequences at least X% identical are clustered)
  • Matrices for divergent proteins are derived using an appropriate X%
5
PAM-1
• Captures mutation rates between closely related proteins
  • 1% divergence
  • Mi,j = #(A→B) / #A (observed A→B substitutions divided by occurrences of A)
• Problematic when comparing distant proteins
  • The 1% divergence does not capture rarer mutations
  • PAM250 is theoretical (based on extrapolation)
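The extrapolation idea can be sketched in Python on a toy three-letter alphabet: substitution counts are normalised row by row into probabilities, and the resulting PAM1-like matrix is multiplied by itself 250 times. The counts and function names are hypothetical, and the sketch skips the scaling and log-odds steps of the real PAM procedure.

```python
def normalise_rows(counts):
    """Turn substitution counts into per-row probabilities (count / row total)."""
    return [[c / sum(row) for c in row] for row in counts]


def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, n):
    """Raise a square matrix to the n-th power by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


# Hypothetical counts for a 3-letter alphabet: 1% of positions changed.
counts = [
    [9900, 60, 40],
    [50, 9900, 50],
    [30, 70, 9900],
]
pam1_like = normalise_rows(counts)
pam250_like = mat_pow(pam1_like, 250)   # extrapolated, not observed
```

After 250 steps the diagonal is much smaller than in the PAM1-like matrix, which is why the extrapolated matrix suits more distant comparisons.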
6
PAM-1
7
BLOSUM62
• Captures mutation rates between divergent proteins
• Why is BLOSUM62 called BLOSUM62? Within each block, sequences sharing at least 62% identity with ANY other member of their cluster were grouped and weighted as a single sequence before substitutions were counted.
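The clustering step can be sketched as follows, assuming single linkage (a sequence joins a cluster if it is at least 62% identical to ANY member). The toy block and function names are hypothetical; this is not the original BLOSUM code.

```python
def percent_identity(a, b):
    """Percentage of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences in a block must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def cluster_block(seqs, threshold=62.0):
    """Group sequence indices by single linkage at the given identity threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if percent_identity(seqs[i], seqs[j]) >= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy block: sequence 2 is only 40% identical to sequence 0,
# but joins its cluster through sequence 1 (70% identical to both).
block = ["AAAAAAAAAA", "AAAAAAACCC", "AAAACCCCCC", "GGGGGGGGGG"]
clusters = cluster_block(block)
```

Each resulting cluster would then count as one sequence when substitution frequencies are tallied.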
8
BLOSUM62
The idea of BLOSUM matrices is to get a better measure of the differences between two proteins, specifically for more distantly related proteins.
Similar amino acids have high scores.
9
PAM & BLOSUM
PAM: Based on global alignments of closely related proteins.
BLOSUM: Based on local alignments.
PAM: PAM1 is calculated from comparisons of sequences with no more than 1% divergence.
BLOSUM: BLOSUM62 is calculated from comparisons of sequences in the blocks, after clustering those with at least 62% identity.
PAM: Other PAM matrices are extrapolated from PAM1.
BLOSUM: All BLOSUM matrices are based on observed alignments. They are not extrapolated from comparisons of closely related proteins.
Use Recommendations
PAM100 ~ BLOSUM90 (Closely Related)
PAM120 ~ BLOSUM80
PAM160 ~ BLOSUM60
PAM200 ~ BLOSUM52
PAM250 ~ BLOSUM45 (Highly Divergent)

Query length    Matrix      Gap costs
<35             PAM30       9,1
35-50           PAM70       10,1
50-85           BLOSUM80    10,1
>85             BLOSUM62    11,1
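The query-length table can be written as a small lookup. The function name is hypothetical, and because the table leaves the boundaries ambiguous, this sketch assumes that a length of exactly 50 falls in the 35-50 row and exactly 85 in the 50-85 row.

```python
def recommend_matrix(query_length):
    """Return (matrix name, (gap open, gap extend)) for a query length in residues.

    Boundary handling (50 -> PAM70, 85 -> BLOSUM80) is an assumption.
    """
    if query_length < 35:
        return ("PAM30", (9, 1))
    if query_length <= 50:
        return ("PAM70", (10, 1))
    if query_length <= 85:
        return ("BLOSUM80", (10, 1))
    return ("BLOSUM62", (11, 1))


short_query = recommend_matrix(20)
long_query = recommend_matrix(300)
```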
Example
• Query: >ADRM1_HUMAN (Proteasomal ubiquitin receptor)
• Database: nr, limited to human
• BLAST program: BLASTP
• Matrices: PAM30, BLOSUM45
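For readers who run such a comparison locally rather than on the web page, the sketch below builds a command line for the BLAST+ `blastp` program. The helper name and file names are hypothetical, and the sketch does not restrict the search to human sequences; it only assembles the argument list and does not launch anything.

```python
def build_blastp_command(query_fasta, database, matrix, gap_costs=None, out_path=None):
    """Assemble a blastp argument list for a given scoring matrix.

    gap_costs is an optional (open, extend) pair; when omitted, no gap
    options are passed at all.
    """
    cmd = ["blastp", "-query", query_fasta, "-db", database, "-matrix", matrix]
    if gap_costs is not None:
        gap_open, gap_extend = gap_costs
        cmd += ["-gapopen", str(gap_open), "-gapextend", str(gap_extend)]
    if out_path is not None:
        cmd += ["-out", out_path]
    return cmd


# Hypothetical file names; the two searches differ only in the matrix options.
pam30_cmd = build_blastp_command("adrm1_human.fasta", "nr", "PAM30",
                                 gap_costs=(9, 1), out_path="adrm1_pam30.txt")
blosum45_cmd = build_blastp_command("adrm1_human.fasta", "nr", "BLOSUM45",
                                    out_path="adrm1_blosum45.txt")
# With BLAST+ installed, a list like this could be passed to subprocess.run(cmd, check=True).
```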
12
What difference do we observe?
• With BLOSUM45 we found related and divergent sequences.
• With PAM30 we found only related sequences.
[Figure: BLAST results with PAM 30 and with BLOSUM45]
With BLOSUM45 we can discover interesting relations between proteins.
[Figure: BLAST results with PAM 30 and with BLOSUM45]
14
Mucin-13: a glycosylated membrane protein that protects the cell by binding to pathogens
Using different scoring matrices can produce slightly different alignments:
[Figure: the alignment with PAM 30 and with BLOSUM45]
15
The same alignment can be resolved in several ways, especially when using a matrix for highly divergent sequences (BLOSUM45):
16
PSI-BLAST
Position Specific Iterative BLAST
We will analyze the following uncharacterized archaeal protein:
>gi|2501594|sp|Q57997|Y577_METJA PROTEIN MJ0577
MSVMYKKILYPTDFSETAEIALKHVKAFKTLKAEEVILLHVI
DEREIKKRDIFSLLLGVAGLNKSVEEFENELKNKLTEEAKNK
MENIKKELEDVGFKVKDIIVVGIPHEEIVKIAEDEGVDIIIM
GSHGKTNLKEILLGSVTENVIKKSNKPVLVVKRKNS
17
18
Threshold for initial BLAST search (default: 10)
Threshold for inclusion in PSI-BLAST iterations (default: 0.005)
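How the two thresholds act on a hit list can be sketched as below, assuming both are E-value cutoffs and that a hit passes when its E-value is at or below the cutoff. The hit names, E-values and function name are hypothetical.

```python
def split_hits(hits, report_threshold=10.0, inclusion_threshold=0.005):
    """Separate hits that are reported from those that feed the next iteration.

    hits is a list of (name, e_value) pairs.
    """
    reported = [(name, e) for name, e in hits if e <= report_threshold]
    included = [(name, e) for name, e in reported if e <= inclusion_threshold]
    return reported, included


# Hypothetical hits from a first search round.
hits = [("hit_A", 1e-40), ("hit_B", 3e-4), ("hit_C", 0.8), ("hit_D", 25.0)]
reported, included = split_hits(hits)
```

Here hit_C is shown in the results but is not used to build the profile for the next iteration, and hit_D is not reported at all.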
19
The query itself
Orthologous sequences in two other archaeal species
Other homologous sequences
20
21
22
Is MJ0577 a filament protein?
Is MJ0577 a cationic amino acid transporter?
Is MJ0577 a universal stress protein?