An Introduction to Bioinformatics Algorithms
www.bioalgorithms.info
Protein Sequencing and
Identification With Mass
Spectrometry
Outline
• Tandem Mass Spectrometry
• De Novo Peptide Sequencing
• Spectrum Graph
• Protein Identification via Database Search
• Identifying Post-Translationally Modified Peptides
• Spectral Convolution
• Spectral Alignment
Amino Acids vs. Nucleic Acids
Amino Acids:
Amine, Carboxylic Acid, R-group
Nucleic Acids:
Sugar, Phosphate, Base
Protein Backbone
H...-HN-CH-CO-NH-CH-CO-NH-CH-CO-…OH
N-terminus on the left, C-terminus on the right. The three residues shown are AA residue i-1, AA residue i, and AA residue i+1, with side chains R_{i-1}, R_i, and R_{i+1}.
Breaking of Protein Backbone
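H...-HN-CH-CO | NH-CH-CO-NH-CH-CO-…OH
The backbone breaks between AA residue i-1 (side chain R_{i-1}, on the N-terminus side) and AA residue i (side chain R_i); the piece on the C-terminus side carries AA residues i and i+1 (side chains R_i and R_{i+1}). The figure marks a proton (H+) at the break.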
Breaking Peptides into Fragment Ions
• Proteases, e.g. trypsin, break protein into
peptides.
• A Tandem Mass Spectrometer further breaks
the peptides down into fragment ions and
measures the mass of each piece.
• A mass spectrometer electrically accelerates the
fragment ions; heavier ions accelerate more slowly
than lighter ones.
• Mass spectrometers measure the mass/charge
ratio of an ion.
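A minimal sketch of what "mass/charge ratio" means numerically, assuming the ion is formed by attaching z protons to a neutral molecule. The function name and the example mass are illustrative and do not come from the slides.

```python
PROTON_MASS = 1.007276  # Da; assumes each unit of charge is carried by one proton


def mass_to_charge(neutral_mass: float, charge: int) -> float:
    """Return m/z for a neutral molecule that has picked up `charge` protons."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


# The same (hypothetical) peptide appears at different m/z for different charges.
for z in (1, 2, 3):
    print(z, round(mass_to_charge(1274.0, z), 3))
```

The same molecule therefore shows up at several positions on the m/z axis, one per charge state.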
Mass Spectrometry
Matrix-assisted Laser Desorption/Ionization
From lectures by Vineet Bafna (UCSD)
Tandem Mass Spectrometry
[Figure: LC-MS/MS workflow. Instrument schematic: ion source, MS-1, collision cell, MS-2. Panels: (LC) base-peak chromatogram, relative abundance vs. time (min), RT 0.01 - 80.02; (MS) scan 1707 at RT 54.44, full ms over 300.00 - 2000.00, relative abundance vs. m/z, labeled peaks include 638.0 and 801.0; (MS/MS) scan 1708 at RT 54.47, full ms2 of 638.00 over 165.00 - 1925.00, relative abundance vs. m/z, labeled peaks include 687.3 and 850.3.]
Using Tandem Mass Spectrometry
[Figure: an MS/MS instrument produces a spectrum (scan 1708, full ms2 of 638.00; relative abundance vs. m/z), which is interpreted to obtain the peptide sequence by one of two routes:]
• Database search, e.g. Sequest
• De novo interpretation, e.g. Sherenga
Tandem Mass Spectrum
• Tandem Mass Spectrometry (MS/MS): mainly
generates partial N- and C-terminal peptides
• Spectrum consists of different ion types
because peptides can be broken in several
places.
• Chemical noise often complicates the
spectrum.
• Represented in 2-D: mass/charge axis vs.
intensity axis
Tandem Mass Spectrum: An Example
[Figure: example tandem mass spectrum, with labels "Ionized parent peptide" and "Secondary Fragmentation".]
N- and C-terminal Peptides
[Figure: the N-terminal peptides and the C-terminal peptides of a peptide.]
Terminal peptides and ion types
Peptide mass (D): 57 + 97 + 147 + 114 = 415
Peptide mass (D) without H2O: 57 + 97 + 147 + 114 – 18 = 397
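The arithmetic on this slide can be reproduced with a short sketch. It assumes the four numbers are integer residue masses and that the 18 being subtracted is the integer mass of water; the helper name `prefix_masses` is illustrative.

```python
from itertools import accumulate

WATER = 18  # integer mass of H2O in daltons, matching the slide's "- 18"


def prefix_masses(residue_masses):
    """Cumulative masses of the N-terminal peptides (prefixes)."""
    return list(accumulate(residue_masses))


residues = [57, 97, 147, 114]        # residue masses from the slide
total = prefix_masses(residues)[-1]  # mass of the full terminal peptide
print(prefix_masses(residues))       # [57, 154, 301, 415]
print(total, total - WATER)          # 415 397
```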
Peptide Fragmentation
[Figure: backbone of a four-residue peptide (side chains R1 to R4, NH3+ at the N-terminus, COOH at the C-terminus) with its cleavage sites. N-terminal fragment ions: a2, b2, b2-H2O, a3, b3, b3-NH3. C-terminal fragment ions: y1, y2, y2-NH3, y3, y3-H2O.]
De novo Peptide Sequencing
[Figure: MS/MS spectrum (scan 1708, full ms2 of 638.00; relative abundance vs. m/z) interpreted directly as a peptide sequence.]
Theoretical Spectrum
Theoretical Spectrum (cont'd)
Building Spectrum Graph
• How to create vertices (from peaks)
• How to create edges (from mass differences)
• How to score paths
• How to find best path
[Figure sequence: building an MS/MS spectrum for the peptide SEQUENCE on a Mass/Charge (M/Z) axis. (1) b ions spell S E Q U E N C E. (2) a ions; a is an ion type shift in b. (3) y ions spell the peptide in reverse: E C N E U Q E S. (4) y ions with corresponding intensities. (5) The ion series combined, intensity vs. M/Z. (6) Noise peaks added. (7) The resulting MS/MS spectrum, intensity vs. M/Z.]
Mass Differences Correspond to Amino Acids
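[Figure: a spectrum in which the mass differences between peaks are labeled with amino acids; the labels spell out the letters of the word "sequence".]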
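As a rough illustration of reading amino acids off mass differences, the sketch below labels each gap between consecutive peaks using a small table of integer residue masses. The table is a subset chosen for the example, the function name is hypothetical, and the peak list is the b-ion half of S(PRTEIN) quoted later in these slides.

```python
# Integer residue masses (Da) for a few amino acids.
# I and L share mass 113, and K and Q share 128, so those gaps stay ambiguous.
RESIDUE_BY_MASS = {
    57: "G", 71: "A", 87: "S", 97: "P", 99: "V", 101: "T", 113: "I/L",
    114: "N", 115: "D", 128: "K/Q", 129: "E", 156: "R", 163: "Y",
}


def read_ladder(peaks):
    """Label each gap between consecutive peaks with a residue, or '?' if none fits."""
    peaks = sorted(peaks)
    return [RESIDUE_BY_MASS.get(hi - lo, "?") for lo, hi in zip(peaks, peaks[1:])]


b_ions = [98, 254, 355, 484, 597]  # b-ion peaks taken from S(PRTEIN)
print(read_ladder(b_ions))         # ['R', 'T', 'E', 'I/L']
```

The first residue (P) is not recovered here because the ladder starts at the first b ion rather than at mass zero.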
Ion Types
• Some peaks correspond to fragment ions,
others are just random noise
• Knowing ion types Δ = {δ1, δ2, …, δk} lets us
distinguish fragment ions from noise
• We can learn ion types δi and their
probabilities qi by analyzing a large test
sample of annotated spectra.
Example of Ion Type
• Δ = {δ1, δ2, …, δk}
• Δ = {b, b-NH3, b-H2O}
• Corresponding values of Δ = {0, 17, 18}
• *Note: In reality the δ value of ion type b is -1
Peptide Sequencing Problem
Goal: Find a peptide with maximal match between
an experimental and theoretical spectrum.
Input:
• S: experimental spectrum
• Δ: set of possible ion types
• m: parent mass
Output:
• P: peptide with mass m, whose theoretical
spectrum best matches the experimental
spectrum S
Vertices
• Masses of potential N-terminal peptides
• Vertices are generated by reverse shift
• Every peak s in a spectrum generates
vertices
• V(s) = {s+δ1, s+δ2, …, s+δk}
Vertices (cont'd)
• Vertices of the spectrum graph:
• {v_init} ∪ V(s1) ∪ V(s2) ∪ ... ∪ V(sm) ∪ {v_fin}
• Where Δ = {δ1, δ2, …, δk} are ion types.
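The vertex construction can be sketched as follows, using the example offsets {0, 17, 18} for the ion types b, b-NH3 and b-H2O given earlier. The peak values are hypothetical, and the special vertices for the start and end of the peptide are left out for brevity.

```python
from collections import Counter

# Offsets from the slide's example: ion types b, b-NH3, b-H2O.
ION_OFFSETS = {"b": 0, "b-NH3": 17, "b-H2O": 18}


def candidate_vertices(peaks, offsets=ION_OFFSETS):
    """Map each candidate vertex mass s + delta to the number of peaks supporting it."""
    support = Counter()
    for s in peaks:
        for delta in offsets.values():
            support[s + delta] += 1
    return support


# Hypothetical peaks: the b-H2O, b-NH3 and b ions of one N-terminal peptide.
peaks = [236, 237, 254]
support = candidate_vertices(peaks)
print(support[254])  # 3: all three peaks agree on the vertex at mass 254
```

Vertices supported by several peaks are the ones most likely to be real N-terminal peptide masses, which is what the reverse-shift slides illustrate.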
Reverse Shifts
[Figure: intensity vs. Mass/Charge (M/Z). Red: mass spectrum, with peaks b-H2O and b. Blue: shift (+H2O), with peaks b and b+H2O. The peaks coincide at b/b-H2O+H2O.]
• Two peaks, b-H2O and b, are given by the mass
spectrum.
• After a +H2O shift, a position where two peaks
coincide is a possible vertex.
Example of Reverse Shift
Shift in H2O
Shift in H2O and NH3
Edges
• Two vertices with mass difference
corresponding to an amino acid A:
• Connect with an edge labeled by A
• Gap edges for di- and tri-peptides
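A small sketch of edge construction under simplifying assumptions: vertices are integer masses, only single-residue edges are built (no gap edges for di- and tri-peptides), and the residue table is a subset picked for the example. All names are illustrative.

```python
# Integer residue masses (Da); a subset, with L standing in for I/L and K for K/Q.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "L": 113,
    "N": 114, "D": 115, "K": 128, "E": 129, "R": 156, "Y": 163,
}


def build_edges(vertices):
    """Return (u, v, label) for every pair u < v whose mass difference is a residue mass."""
    label_by_mass = {mass: aa for aa, mass in RESIDUE_MASS.items()}
    ordered = sorted(set(vertices))
    edges = []
    for i, u in enumerate(ordered):
        for v in ordered[i + 1:]:
            label = label_by_mass.get(v - u)
            if label is not None:
                edges.append((u, v, label))
    return edges


print(build_edges([0, 97, 253, 354, 410]))
# [(0, 97, 'P'), (97, 253, 'R'), (253, 354, 'T')]
```

The vertex at 410 gets no edge because no listed residue mass separates it from any other vertex.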
Paths
• Path in the graph corresponds to an amino
acid sequence
• There are many paths, how to find the correct
one?
• We need scoring to evaluate paths
Path Score
• p(P,S) = probability that peptide P produces
spectrum S = {s1, s2, …, sq}
• p(P, s) = the probability that peptide P
generates a peak s
• Scoring = computing probabilities
• p(P,S) = ∏_{s∈S} p(P, s)
Peak Score
• For a position t that represents ion type δj:
\[ p(P, s_t) = \begin{cases} q_j & \text{if a peak is generated at } t \\ 1 - q_j & \text{otherwise} \end{cases} \]
Peak Score (cont'd)
• For a position t that is not associated with an
ion type:
\[ p_R(P, s_t) = \begin{cases} q_R & \text{if a peak is generated at } t \\ 1 - q_R & \text{otherwise} \end{cases} \]
• qR = the probability of a noisy peak that does
not correspond to any ion type
Finding Optimal Paths in the Spectrum Graph
• For a given MS/MS spectrum S, find a
peptide P’ maximizing p(P,S) over all
possible peptides P:
\[ p(P', S) = \max_{P} p(P, S) \]
• Peptides = paths in the spectrum graph
• P’ = the optimal path in the spectrum graph
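Because every edge goes from a lighter vertex to a heavier one, the spectrum graph is acyclic and the best path can be found with a single dynamic-programming pass in order of increasing mass. The sketch below assumes additive vertex scores (for instance, log-probabilities); the scores, masses and function name are made up for illustration.

```python
def best_path(vertex_score, residue_mass, start, end):
    """Best-scoring path from `start` to `end`; returns (score, peptide) or None."""
    best = {start: (vertex_score[start], "")}  # mass -> (score so far, residues so far)
    for v in sorted(vertex_score):
        if v not in best:
            continue  # not reachable from the start vertex
        score, peptide = best[v]
        for aa, mass in residue_mass.items():
            w = v + mass
            if w in vertex_score and w <= end:
                candidate = (score + vertex_score[w], peptide + aa)
                if w not in best or candidate[0] > best[w][0]:
                    best[w] = candidate
    return best.get(end)


# Hypothetical vertex scores keyed by N-terminal peptide mass.
scores = {0: 0.0, 57: 0.5, 97: 2.0, 154: 1.0, 253: 3.0, 354: 1.0}
masses = {"G": 57, "P": 97, "T": 101, "R": 156}
print(best_path(scores, masses, start=0, end=354))  # (6.0, 'PRT')
```

Processing vertices in increasing mass order guarantees that a vertex's best score is final before its outgoing edges are relaxed.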
Ions and Probabilities
• Tandem mass spectrometry is characterized
by a set of ion types {δ1, δ2, ..., δk} and their
probabilities {q1, ..., qk}
• δi-ions of a partial peptide are produced
independently with probabilities qi
Ions and Probabilities
• A peptide has all k peaks with probability
\[ \prod_{i=1}^{k} q_i \]
• and no peaks with probability
\[ \prod_{i=1}^{k} (1 - q_i) \]
• A peptide also produces "random noise"
with uniform probability qR in any position.
Ratio Test Scoring for Partial Peptides
• Incorporates premiums for observed ions
and penalties for missing ions.
• Example: for k=4, assume that for a partial
peptide P’ we only see ions δ1, δ2, δ4.
The score is calculated as:
\[ \frac{q_1}{q_R} \cdot \frac{q_2}{q_R} \cdot \frac{1 - q_3}{1 - q_R} \cdot \frac{q_4}{q_R} \]
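The ratio test for one partial peptide can be sketched directly from this example: each observed ion contributes q_j / q_R and each missing ion contributes (1 - q_j) / (1 - q_R). The probability values below are hypothetical and the function name is illustrative.

```python
def ratio_score(q, observed, q_noise):
    """Ratio-test score for one partial peptide.

    q        -- probabilities q_1..q_k of the k ion types
    observed -- booleans, True where the corresponding ion was seen
    q_noise  -- probability q_R of a random noise peak
    """
    score = 1.0
    for q_j, seen in zip(q, observed):
        if seen:
            score *= q_j / q_noise              # premium for an observed ion
        else:
            score *= (1 - q_j) / (1 - q_noise)  # penalty for a missing ion
    return score


q = [0.7, 0.5, 0.4, 0.3]  # hypothetical ion probabilities for k = 4
print(ratio_score(q, [True, True, False, True], q_noise=0.1))  # about 70
```

In practice the logarithm of this product is often used so that scores add along a path, which is a design choice rather than something stated on the slide.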
Scoring Peptides
• T: set of all positions.
• Ti = {t_δ1, t_δ2, ..., t_δk}: set of positions that
represent ions of partial peptide Pi.
• A peak at position t_δj is generated with
probability qj.
• R = T − ∪ Ti: set of positions that are not
associated with any partial peptides (noise).
Probabilistic Model
• For a position t_δj ∈ Ti, the probability p(t, P, S) that
peptide P produces a peak at position t is:
\[ p(t, P, S) = \begin{cases} q_j & \text{if a peak is generated at position } t_{\delta_j} \\ 1 - q_j & \text{otherwise} \end{cases} \]
• Similarly, for t ∈ R, the probability that P produces a
random noise peak at t is:
\[ p_R(t) = \begin{cases} q_R & \text{if a peak is generated at position } t \\ 1 - q_R & \text{otherwise} \end{cases} \]
Probabilistic Score
• For a peptide P with n amino acids, the score
for the whole peptide is expressed by the
following ratio test:
\[ \frac{p(P, S)}{p_R(S)} = \prod_{i=1}^{n} \prod_{j=1}^{k} \frac{p(t_{i\delta_j}, P, S)}{p_R(t_{i\delta_j})} \]
Role of de novo Interpretation
• Interpreting MS/MS of novel peptides
• Automatic validation of MS/MS database
matches.
• Leveraging homology matching across
species
Post-Translational Modifications
Proteins are involved in cellular signaling and
metabolic regulation.
They are subject to a large number of biological
modifications.
Almost all protein sequences are post-translationally modified, and 200 types of
modifications of amino acid residues are
known.
Examples of Post-Translational
Modification
Difficulties in Finding Post-Translational Modifications
Currently post-translational modifications
cannot be inferred from DNA sequences.
Finding post-translational modifications
remains an open problem even after the
human genome is completed.
Post-translational modifications increase the
number of “letters” in amino acid alphabet
and lead to a combinatorial explosion.
Sequencing of Modified Peptides
De novo peptide sequencing is invaluable for
identification of unknown proteins.
However, de novo algorithms are designed for
working with high quality spectra with good
fragmentation and without modifications.
Another approach is to compare a spectrum
against a set of known spectra in a database.
Functional Proteomics
• Problem: Given a large collection of
uninterpreted spectra, find out which spectra
correspond to similar peptides.
• A method that cross-correlates related
spectra (e.g., from normal and diseased
individuals) would be valuable in functional
proteomics.
Protein Identification Problem
• Input: A database of proteins, an
experimental spectrum S, a set of ion types
Δ, and a parent mass m.
• Output: A peptide of mass m from the
database with the best match to spectrum
S.
MS/MS Database Search
Database search in mass spectrometry has been very
successful in the identification of already known proteins.
An experimental spectrum can be compared with the theoretical
spectra of database peptides to find the best fit.
SEQUEST (Yates et al., 1995)
But reliable algorithms for the identification of modified
peptides are not yet known.
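A toy version of such a search is sketched below. It is not SEQUEST: it simply enumerates database substrings with the right parent mass and ranks them by the number of peaks shared with the experimental spectrum. The assumptions are integer residue masses, a parent mass equal to the residue masses plus 18 for water, and b- and y-ion offsets of +1 and +19, which are inferred from the S(PRTEIN) example on a later slide. The database string is made up.

```python
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "I": 113, "L": 113,
    "N": 114, "D": 115, "K": 128, "E": 129, "M": 131, "R": 156, "Y": 163,
}
WATER = 18


def theoretical_spectrum(peptide):
    """b and y ions with assumed offsets +1 and +19 (inferred from S(PRTEIN))."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    spectrum, prefix = set(), 0
    for mass in masses[:-1]:
        prefix += mass
        spectrum.add(prefix + 1)           # b ion
        spectrum.add(total - prefix + 19)  # complementary y ion
    return spectrum


def search(proteins, spectrum, parent_mass):
    """Return (peptide, shared peak count) for the best substring of mass parent_mass."""
    target = parent_mass - WATER
    observed = set(spectrum)
    best = (None, -1)
    for protein in proteins:
        for i in range(len(protein)):
            mass = 0
            for j in range(i, len(protein)):
                mass += RESIDUE_MASS[protein[j]]
                if mass > target:
                    break
                if mass == target:
                    peptide = protein[i:j + 1]
                    shared = len(theoretical_spectrum(peptide) & observed)
                    if shared > best[1]:
                        best = (peptide, shared)
    return best


experimental = {98, 133, 246, 254, 355, 375, 476, 484, 597, 632}  # S(PRTEIN)
print(search(["MKPRTEINGG"], experimental, parent_mass=728))  # ('PRTEIN', 10)
```

A modified peptide would have a different parent mass and shifted peaks, so this exact-match search would miss it, which is the difficulty the following slides address.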
Search for Modified Peptides:
Virtual Database Approach
Yates et al., 1995: an exhaustive search in a
virtual database of all modified peptides.
Exhaustive search leads to a large combinatorial
problem, even for a small set of modification
types.
Problem (Yates et al., 1995): Extend the virtual
database approach to a large set of
modifications.
Modified Peptide Identification Problem
Input:
• Experimental spectrum S
• Database of peptides
• Parameter k (# of mutations/modifications)
• A set of ion types Δ
• Parent mass m
Output: a peptide with the best match to the
spectrum S that is at most k
mutations/modifications apart from a database
peptide.
Peptide Identification Problem: Challenge
Very similar peptides may have very different
spectra!
Goal: Define a notion of spectral similarity that
correlates well with the sequence similarity.
If peptides are a few mutations/modifications
apart, the spectral similarity between their
spectra should be high.
Deficiency of the Shared Peaks Count
Shared peaks count (SPC): intuitive measure
of spectral similarity.
Problem: SPC diminishes very quickly as the
number of mutations increases.
Only a small portion of correlations between
the spectra of mutated peptides is captured
by SPC.
SPC Diminishes Quickly
S(PRTEIN) = {98, 133, 246, 254, 355, 375, 476, 484, 597, 632}   (no mutations, SPC=10)
S(PRTEYN) = {98, 133, 254, 296, 355, 425, 484, 526, 647, 682}   (1 mutation, SPC=5)
S(PGTEYN) = {98, 133, 155, 256, 296, 385, 425, 526, 548, 583}   (2 mutations, SPC=2)
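The three SPC values can be checked with a few lines, treating each spectrum as a set of integer masses copied from the slide. The function name is illustrative.

```python
def shared_peaks_count(s1, s2):
    """Number of masses that occur in both spectra."""
    return len(set(s1) & set(s2))


spectra = {
    "PRTEIN": [98, 133, 246, 254, 355, 375, 476, 484, 597, 632],
    "PRTEYN": [98, 133, 254, 296, 355, 425, 484, 526, 647, 682],
    "PGTEYN": [98, 133, 155, 256, 296, 385, 425, 526, 548, 583],
}

reference = spectra["PRTEIN"]
for peptide, spectrum in spectra.items():
    print(peptide, shared_peaks_count(reference, spectrum))
# PRTEIN 10
# PRTEYN 5
# PGTEYN 2
```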
Spectral Convolution
\[ S_2 \ominus S_1 = \{ s_2 - s_1 : s_1 \in S_1, s_2 \in S_2 \} \]
Number of pairs s1 ∈ S1, s2 ∈ S2 with s2 − s1 = x:
\[ (S_2 \ominus S_1)(x) \]
The shared peaks count (SPC peak):
\[ (S_2 \ominus S_1)(0) \]
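A minimal sketch of the spectral convolution as a multiset of pairwise differences, using the PRTEIN and PRTEYN spectra from the previous slide. Besides the SPC peak at 0 there is a second peak at 50, the integer mass difference between Y (163) and I (113).

```python
from collections import Counter


def spectral_convolution(s1, s2):
    """Map each difference x = s2 - s1 to the number of peak pairs producing it."""
    return Counter(b - a for a in s1 for b in s2)


prtein = [98, 133, 246, 254, 355, 375, 476, 484, 597, 632]
prteyn = [98, 133, 254, 296, 355, 425, 484, 526, 647, 682]

conv = spectral_convolution(prtein, prteyn)
print(conv[0], conv[50])  # 5 5
```

The peak at x = 0 is the shared peaks count; the peak at x = 50 collects the peaks that the single mutation has shifted.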
Elements of S2 ⊖ S1 represented as elements of a difference matrix. The
elements with multiplicity >2 are colored; the elements with multiplicity =2
are circled. The SPC takes into account only the red entries.
Spectral Convolution: An Example
[Figure: spectral convolution (vertical axis, 0 to 5) plotted against x (horizontal axis, −150 to 150).]
Spectral Comparison: Difficult Case
S = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100}
Which of the spectra
S’ = {10, 20, 30, 40, 50, 55, 65, 75, 85, 95}
or
S” = {10, 15, 30, 35, 50, 55, 70, 75, 90, 95}
fits the spectrum S the best?
SPC: both S’ and S” have 5 peaks in common with S.
Spectral Convolution: reveals the peaks at 0 and 5.
Spectral Comparison: Difficult Case
[Figure: S compared with S’, and S compared with S’’.]
Limitations of the Spectrum Convolutions
Spectral convolution does not reveal that
spectra S and S’ are similar, while spectra S
and S” are not.
Clumps of shared peaks: the matching
positions in S’ come in clumps while the
matching positions in S” don't.
This important property was not captured by
spectral convolution and was overlooked in
the previous MS/MS algorithms.
Edit Distance Between Spectra
A = {a1 < … < an} : an ordered set of natural
numbers.
A shift Δi transforms
{a1, …, an}
into
{a1, …, ai-1, ai+Δi, …, an+Δi}
e.g.
10 20 30 40 50 60 70 80 90
10 20 30 35 45 55 65 75 85   (shift of −5 starting at the fourth element)
10 20 30 35 45 55 62 72 82   (shift of −3 starting at the seventh element)
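The shift operation is easy to state in code. The sketch assumes 1-based positions as in the slide's notation and assumes the first row of the example starts at 10; the function name is illustrative.

```python
def apply_shift(spectrum, i, delta):
    """Add delta to elements a_i..a_n of an ordered spectrum (i is 1-based)."""
    return spectrum[:i - 1] + [a + delta for a in spectrum[i - 1:]]


a = [10, 20, 30, 40, 50, 60, 70, 80, 90]
b = apply_shift(a, 4, -5)
c = apply_shift(b, 7, -3)
print(b)  # [10, 20, 30, 35, 45, 55, 65, 75, 85]
print(c)  # [10, 20, 30, 35, 45, 55, 62, 72, 82]
```

Two shifts turn the first row of the example into the third, so the two spectra are two edits apart.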
Spectral Alignment Problem
• Find a series of k shifts that make the sets
A={a1, …., an} and B={b1,….,bn}
as similar as possible.
• k-similarity between sets
• D(k) - the maximum number of elements in
common between sets after k shifts.
Spectral Alignment vs. Sequence Alignment
• Manhattan-like graph with different alphabet
and scoring.
• Axes in the graph correspond to peaks in the
two spectra.
• In this case, score is 1 if the diagonal line
goes through a peak on both axes, 0
otherwise.
• Movement can be diagonal or perpendicular
(but only k times total).
Spectral Alignment =
Sequence Alignment in 0-1 Alphabet
• Convert spectrum to a string with each
index being 1 if it corresponds to a peak
and 0 otherwise.
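The conversion can be sketched as below, assuming integer peak positions counted from 1 and a known maximum position. Both the function name and the example peaks are hypothetical.

```python
def to_binary_string(spectrum, length):
    """0-1 string of the given length with a 1 at every peak position (1-based)."""
    peaks = set(spectrum)
    return "".join("1" if pos in peaks else "0" for pos in range(1, length + 1))


print(to_binary_string([2, 5, 6], 8))  # 01001100
```

With real masses the string would be very long and sparse, which is why the later slides work with the peak lists directly.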
Spectral Product
A = {a1, …, an} and B = {b1, …, bm}
Spectral product A⊗B: two-dimensional matrix with nm 1s corresponding to all pairs of indices (ai,bj) and remaining elements being 0s.
SPC: the number of 1s on the main diagonal.
δ-shifted SPC: the number of 1s on the diagonal (i, i+δ).
[Figure: spectral product matrix with columns labeled 10 20 30 40 50 55 65 75 85 95; the 1s are marked and a diagonal shifted by δ is indicated.]
Spectral Alignment: k-similarity
k-similarity between spectra: the maximum number
of 1s on a path through this graph that uses at most
k+1 diagonals.
k-optimal spectral
alignment = a path.
The spectral alignment
allows one to detect
more and more subtle
similarities between
spectra by increasing k.
Use of k-Similarity
SPC reveals only D(0)=3 matching peaks.
Spectral Alignment reveals more hidden similarities between spectra, D(1)=5 and D(2)=8, and detects the corresponding mutations.
[Figure caption:] Black lines represent the paths for k=0.
Red lines represent the paths for k=1.
The blue line in Fig. (b) represents the path for k=2.
Spectral Convolution Limitation
The spectral convolution considers diagonals
separately without combining them into feasible
mutation scenarios.
[Figure: two spectral products with rows 10, 20, …, 100. Left: columns 10 20 30 40 50 55 65 75 85 95; a single shift δ gives D(1) = 10 (shift function score = 10). Right: columns 10 15 30 35 50 55 70 75 90 95; a single shift δ gives D(1) = 6.]
Dynamic Programming for
Spectral Alignment
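Dij(k): the maximum number of 1s on a path to
(ai,bj) that uses at most k+1 diagonals.
\[ D_{ij}(k) = \max_{(i',j') < (i,j)} \begin{cases} D_{i'j'}(k) + 1, & \text{if } (i',j') \sim (i,j) \\ D_{i'j'}(k-1) + 1, & \text{otherwise} \end{cases} \]
\[ D(k) = \max_{ij} D_{ij}(k) \]
Here (i',j') ~ (i,j) means that the two points lie on the same diagonal, i.e. ai − ai' = bj − bj'.
Running time: O(n^4 k)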
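A direct, unoptimized sketch of this recurrence follows. It assumes that two points are co-diagonal when their mass differences agree in both spectra and that a path consisting of a single point scores 1; neither base case nor tie handling is spelled out on the slide. The test spectra are S, S’ and S” from the "Difficult Case" slide.

```python
def spectral_alignment(a, b, k):
    """D(k): most 1s on a path through the spectral product using at most k+1 diagonals."""
    a, b = sorted(set(a)), sorted(set(b))
    n, m = len(a), len(b)
    # d[t][i][j]: best path ending at (a[i], b[j]) that uses at most t+1 diagonals.
    d = [[[1] * m for _ in range(n)] for _ in range(k + 1)]
    for t in range(k + 1):
        for i in range(n):
            for j in range(m):
                for i2 in range(i):
                    for j2 in range(j):
                        if a[i] - a[i2] == b[j] - b[j2]:  # same diagonal
                            candidate = d[t][i2][j2] + 1
                        elif t > 0:                       # switch diagonals
                            candidate = d[t - 1][i2][j2] + 1
                        else:
                            continue
                        d[t][i][j] = max(d[t][i][j], candidate)
    return max((d[k][i][j] for i in range(n) for j in range(m)), default=0)


s = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
s1 = [10, 20, 30, 40, 50, 55, 65, 75, 85, 95]
s2 = [10, 15, 30, 35, 50, 55, 70, 75, 90, 95]
print(spectral_alignment(s, s1, 1), spectral_alignment(s, s2, 1))  # 10 6
```

The four nested loops over peak pairs, repeated for each number of shifts, give the quoted O(n^4 k) running time.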
Edit Graph for Fast Spectral Alignment
Fast Spectral Alignment Algorithm
M ij (k ) =
max Di ' j ' (k )
(i ', j ')< (i , j )
 Ddiag (i , j ) (k ) + 1
Dij (k ) = max 
M i −1, j −1 (k − 1) + 1
 Dij (k )

M ij (k ) = max M i −1, j (k )
M
 i , j −1 (k )
Running time: O(n2 k)
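The faster recurrences can be sketched as follows. Here diag(i,j) is taken to be the closest earlier point on the same diagonal, which is an assumption because the slide does not define it, and a dictionary keyed by the offset a_i - b_j stands in for a lookup table of diagonals. A single point again scores 1.

```python
def fast_spectral_alignment(a, b, k):
    """D(k) for k >= 0, computed with the diag and M tables in O(n * m * k) steps."""
    a, b = sorted(set(a)), sorted(set(b))
    n, m = len(a), len(b)
    prev_m = None  # M table for k-1, 1-based with a border of zeros
    for t in range(k + 1):
        last_on_diag = {}  # offset a_i - b_j -> D of the closest earlier point
        cur_m = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                offset = a[i - 1] - b[j - 1]
                d = last_on_diag.get(offset, 0) + 1       # continue along the diagonal
                if prev_m is not None:
                    d = max(d, prev_m[i - 1][j - 1] + 1)  # jump in from another diagonal
                last_on_diag[offset] = d
                cur_m[i][j] = max(d, cur_m[i - 1][j], cur_m[i][j - 1])
        prev_m = cur_m
    return prev_m[n][m]


s = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
s1 = [10, 20, 30, 40, 50, 55, 65, 75, 85, 95]
s2 = [10, 15, 30, 35, 50, 55, 70, 75, 90, 95]
print(fast_spectral_alignment(s, s1, 1), fast_spectral_alignment(s, s2, 1))  # 10 6
```

Keeping only the previous M table is enough because D for k shifts depends on M for k-1 shifts and on D values already computed for the same k.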
Spectral Alignment: Complications
• Simultaneous analysis of N- and C-terminal
ions
• Taking into account the intensities and
charges
• Analysis of minor ions
• Much more complicated!
Spectral Alignment: Complications
Spectra are combinations of an increasing (N-terminal ions) and a decreasing (C-terminal
ions) number series.
These series form two diagonals in the
spectral product, the main diagonal and the
perpendicular diagonal.
The described algorithm deals with the main
diagonal only.