Mass Spectrometry in a drug discovery setting
Claus Andersen
Senior Scientist
Sienabiotech Spa
Overview
• From genes to phenotype
• Proteins: an introduction
• Mass Spec for protein identification, quantification and characterization
• Mass Spec data
• Mass Spec data analysis
• Mass Spec database searching
• Recent advances
Bioinformatics and statistics in a drug discovery company
From genes to phenotype
[Diagram: levels from genes to phenotype. Genes (genome comparison) → mRNA expression → proteins (structure, abundance, activation/inactivation, degradation, regulation, interactions, kinematics) → functions and pathways → metabolites (metabolite levels) → phenotypes. Also annotated: pharmacophore, ADME/Tox.]
Proteins as functional units
[Figure: proteins at work; labels: ATP, glucose, myosin. Credits: Vale and Milligan, Science 2000; pdb.org; D.S. Goodsell.]
What affects the proteome
[Diagram: factors affecting the cellular proteome: genome and mRNA; protein production (ribosome) and protein degradation (proteasome); interactions; physiological role; pharmaceutical substances; temperature; stress; environment.]
Mass Spec on proteins
Workflow:
• Samples: Control/Healthy and Treated/Sick
• Protein extraction and digestion into peptides, e.g. KKYAAELHLV, KAVQQPDGLA, QFHFHWGSLDQPDGLA
• HPLC separation of the peptides
• Mass spectrometer: MS spectra and MS/MS spectra
• Identification, quantification and characterization, including post-translational modifications (PTM) such as phosphorylation (P), oxidation (O), …
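A minimal sketch of the digestion step, done in silico. It assumes trypsin with the simplified classic rule (cleave after K or R, but not before P) and no missed cleavages; the slide does not name the enzyme, and `tryptic_digest` is a hypothetical helper, not part of any specific tool.

```python
import re


def tryptic_digest(protein: str, min_len: int = 1) -> list[str]:
    """Cleave a protein sequence after K or R unless the next residue is P.

    Assumption: simplified trypsin rule, no missed cleavages and no
    exceptions beyond the proline rule.
    """
    # Zero-width split point: right after K/R, provided the next residue is not P.
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    # A C-terminal K/R leaves an empty trailing piece; drop pieces below min_len.
    return [p for p in pieces if len(p) >= min_len]


if __name__ == "__main__":
    seq = "FLIDSSRFSYPERPIIFLSMCYNIYSIAYIVRLTVGRERISCDFEEAAEPVLIQEGLKNT"
    for peptide in tryptic_digest(seq):
        print(peptide)
```

On the example protein sequence from the spectral-comparison slide, the proline rule keeps the ERP motif uncleaved.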
Mass Spec data
Sample: 5 mg
• 3000 MS spectra: 500 MB
• 400 MS/MS spectra: 200 MB
• Total: 700 MB
Gygi et al., Mol. Cell. Biol. (1999)
Mass Spec data analysis
• Fourier transformation (noise filtering)
• Gaussian peak fitting (peak detection)
• Generation of theoretical spectra (sequence → spectra)
• Large scale spectral comparison (DB searching)
• Spectral deconvolution (de-novo sequencing)
• Large scale sequence searching (DB searching)
• Data fitting (quantitation)
• Statistics and probability theory (reliability estimation)
• Linear discriminant analysis (quality assessment)
• … and lots more
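Peak detection by Gaussian peak fitting can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a production peak picker: peaks are local maxima above a hypothetical intensity threshold, the m/z grid is uniform, and each peak is fitted in closed form through the maximum and its two neighbours by fitting a parabola to the log-intensities (exact for a noise-free Gaussian). A least-squares fit over the whole peak profile is the more general alternative. `find_peaks` and `fit_gaussian_peak` are illustrative names.

```python
import math


def find_peaks(y, min_height):
    """Indices of local maxima whose intensity is at least min_height."""
    return [i for i in range(1, len(y) - 1)
            if y[i] >= min_height and y[i - 1] < y[i] >= y[i + 1]]


def fit_gaussian_peak(x, y, i):
    """Closed-form Gaussian through the points i-1, i, i+1.

    Assumes uniform x spacing and strictly positive intensities.
    Returns (centre, sigma, height).
    """
    h = x[i + 1] - x[i]
    a, b, c = (math.log(y[j]) for j in (i - 1, i, i + 1))
    curvature = a - 2.0 * b + c  # negative at a genuine maximum
    if curvature >= 0:
        raise ValueError("points do not form a peak")
    centre = x[i] + h * (a - c) / (2.0 * curvature)
    sigma = math.sqrt(-h * h / curvature)
    height = math.exp(b - (c - a) ** 2 / (8.0 * curvature))
    return centre, sigma, height


if __name__ == "__main__":
    # Synthetic, noise-free peak: centre 1.003, sigma 0.05, height 100.
    xs = [k * 0.01 for k in range(201)]
    ys = [100.0 * math.exp(-((v - 1.003) ** 2) / (2 * 0.05 ** 2)) for v in xs]
    for i in find_peaks(ys, min_height=10.0):
        print(fit_gaussian_peak(xs, ys, i))
```

On real spectra the log-parabola fit is sensitive to noise in the three points used, which is one reason noise filtering precedes peak detection in the list above.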
Large scale spectral comparison
Mass spec data: the MS spectrum gives the precursor mass $(M_{peptide}+H)^+ \pm \Delta$; the MS/MS spectrum of that precursor gives the fragment masses.
In-silico data: protein sequence DB (~2 million sequences) → protein peptides (~60 million) → peptide fragments (~2000 million).
Example protein: FLIDSSRFSYPERPIIFLSMCYNIYSIAYIVRLTVGRERISCDFEEAAEPVLIQEGLKNT
Example peptide $i$: ERPIIFLSMCYNIYSIAYIV, with $N_i$ theoretical fragments, $K_i$ of which match the MS/MS spectrum:
• Prefix fragments: ERPIIFLSMCYNIYSIAYIV, ERPIIFLSMCYNIYSIAYI, ERPIIFLSMCYNIYSIAY, ERPIIFLSMCYNIYSIA, ERPIIFLSMCYNIYSI, ERPIIFLSMCYNIYS, ERPIIFLSMCYNIY, ERPIIFLSMCYNI, ERPIIFLSMCYN, ERPIIFLSMCY, ERPIIFLSMC, ERPIIFLSM, … etc.
• Suffix fragments: V, IV, YIV, AYIV, IAYIV, SIAYIV, YSIAYIV, IYSIAYIV, NIYSIAYIV, …
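The in-silico side of the comparison (a precursor mass filter, then theoretical fragment ladders) can be sketched as below. Assumptions: monoisotopic residue masses, unmodified cysteine, singly charged ions, and fragments taken as the b-ion (prefix) and y-ion (suffix) series; the slide shows prefix and suffix sequences but does not say which ion types were used. `mh_plus`, `candidates` and `fragment_ladder` are hypothetical helpers.

```python
# Monoisotopic residue masses in Da (assumption: no modifications).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def mh_plus(peptide):
    """Singly protonated peptide mass (M+H)+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def candidates(peptides, precursor_mh, tol):
    """Peptides whose (M+H)+ lies within precursor_mh +/- tol."""
    return [p for p in peptides if abs(mh_plus(p) - precursor_mh) <= tol]


def fragment_ladder(peptide):
    """Singly charged b-ion (prefix) and y-ion (suffix) masses.

    A peptide of n residues has n - 1 backbone cleavage sites, giving
    n - 1 b ions and n - 1 y ions; b_ions[j] and y_ions[j] are
    complementary (together they cover the whole peptide).
    """
    b_ions, y_ions = [], []
    for k in range(1, len(peptide)):
        b_ions.append(sum(RESIDUE_MASS[aa] for aa in peptide[:k]) + PROTON)
        y_ions.append(sum(RESIDUE_MASS[aa] for aa in peptide[k:]) + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    pep = "ERPIIFLSMCYNIYSIAYIV"
    b, y = fragment_ladder(pep)
    print(round(mh_plus(pep), 4), len(b), len(y))
```

Filtering on the precursor mass first is what keeps the search tractable: only the few candidates inside the ±Δ window need their fragment ladders compared with the MS/MS spectrum.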
Large scale spectral comparison
PEP_PROBE by Sadygov and Yates Anal. Chem. 75 2003
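Hypergeometric probability model:

$$P_{K,N}(K_i, N_i) = \frac{\binom{K}{K_i}\binom{N-K}{N_i-K_i}}{\binom{N}{N_i}}$$

where

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

is the binomial coefficient, and

$$K = \sum_i K_i, \qquad N = \sum_i N_i$$

are the totals over all candidate peptides $i$, with $N_i$ the number of theoretical fragments of peptide $i$ and $K_i$ the number of those matching the experimental MS/MS spectrum.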
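The factorials in these binomial coefficients are far too large for floating point at realistic sizes (the yeast example below has N = 569 160), so the probability is easiest to evaluate in log space with `math.lgamma`. A minimal sketch; the function names are hypothetical and this is not PEP_PROBE's own code.

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient n! / (k! (n - k)!)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_log10(K, N, Ki, Ni):
    """log10 P_{K,N}(Ki, Ni) = log10[C(K, Ki) C(N-K, Ni-Ki) / C(N, Ni)].

    Returns -inf for outcomes that are impossible under the model.
    """
    if Ki < 0 or Ki > min(K, Ni) or Ni - Ki > N - K:
        return float("-inf")
    ln_p = log_comb(K, Ki) + log_comb(N - K, Ni - Ki) - log_comb(N, Ni)
    return ln_p / math.log(10)


if __name__ == "__main__":
    # Totals and FAS1 counts as given in the yeast example of this deck.
    print(hypergeom_log10(K=84150, N=569160, Ki=34, Ni=40))
```

Returning log10 values matches the way the E-values are reported later (as powers of ten).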
Large scale spectral comparison
Expectation value (E-value)
$$E = N_{(M+H)^+}\,\Phi(P \le P_0)$$
where $\Phi(\cdot)$ is the cumulative distribution function given by the hypergeometric model, $P_0$ is the hypergeometric probability of the observed match, and $N_{(M+H)^+}$ is the number of peptides in the database matching the measured (M+H)+ mass value.
The E-value tells you how many peptides from the database are
expected to have the same or better matches to the experimental
spectrum by chance alone.
Sadygov and Yates Anal. Chem. 2003
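A sketch of the E-value calculation under one stated assumption about $\Phi(P \le P_0)$: here it is taken as the total hypergeometric probability of all outcomes k whose probability does not exceed that of the observed match, summed in log10 space. Sadygov and Yates' exact tail definition may differ. `n_candidates` stands for $N_{(M+H)^+}$ and, like the function names, is hypothetical.

```python
import math


def _log_comb(n, k):
    """Natural log of the binomial coefficient n! / (k! (n - k)!)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def _log10_pmf(K, N, k, Ni):
    """log10 of the hypergeometric probability P_{K,N}(k, Ni)."""
    if k < 0 or k > min(K, Ni) or Ni - k > N - K:
        return float("-inf")
    ln_p = _log_comb(K, k) + _log_comb(N - K, Ni - k) - _log_comb(N, Ni)
    return ln_p / math.log(10)


def e_value_log10(K, N, Ki, Ni, n_candidates):
    """log10 E = log10 N_(M+H)+ + log10 Phi(P <= P0).

    Assumption: Phi(P <= P0) sums all outcomes no more probable than
    the observed one (the observed outcome k = Ki included).
    """
    log_p0 = _log10_pmf(K, N, Ki, Ni)
    if log_p0 == float("-inf"):
        raise ValueError("observed match is impossible under the model")
    tail = [lp for lp in (_log10_pmf(K, N, k, Ni) for k in range(Ni + 1))
            if lp != float("-inf") and lp <= log_p0 + 1e-12]
    top = max(tail)  # log-sum-exp in base 10 avoids underflow
    log_phi = top + math.log10(sum(10.0 ** (lp - top) for lp in tail))
    return math.log10(n_candidates) + log_phi
```

For the yeast example, the reported E-values cannot be reproduced from the slide alone, because $N_{(M+H)^+}$ is not given there.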
Large scale spectral comparison
An example from yeast (Saccharomyces cerevisiae)
MS/MS spectrum with (M+H)+ = 2076.010 ± 0.002 AMU, searched against:
• Yeast proteins: 6 200
• Yeast peptides: ~200 000
• Peptide fragments: ~5 million

Top candidate peptides (N = 569 160, K = 84 150):

Protein  Peptide                 N_i  K_i  E-value
FAS1     ATHILDFGPGGASGLGVLTHR   40   34   10^-26.62
SIP2     LTPPQLPPQLENVILNKY      34   15   10^-5.25
Sadygov and Yates Anal. Chem. 2003
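Two columns of this example can be checked by hand. If each peptide of n residues contributes one b ion and one y ion per backbone cleavage (an assumption, not stated on the slide), it has N_i = 2(n − 1) fragments, which reproduces N_i = 40 for the 21-residue FAS1 peptide and N_i = 34 for the 18-residue SIP2 peptide. Under the hypergeometric model the expected number of chance matches is the mean N_i·K/N, which shows why FAS1 scores so much better. The helper names are hypothetical.

```python
N_TOTAL = 569_160  # N as given on the slide (sum of N_i over candidates)
K_TOTAL = 84_150   # K as given on the slide (sum of K_i over candidates)

PEPTIDES = {  # protein: (peptide, observed matches K_i), from the table
    "FAS1": ("ATHILDFGPGGASGLGVLTHR", 34),
    "SIP2": ("LTPPQLPPQLENVILNKY", 15),
}


def n_fragments(peptide):
    """One b ion and one y ion per backbone cleavage (assumption)."""
    return 2 * (len(peptide) - 1)


def expected_matches(n_i):
    """Hypergeometric mean number of chance matches, N_i * K / N."""
    return n_i * K_TOTAL / N_TOTAL


if __name__ == "__main__":
    for name, (seq, k_obs) in PEPTIDES.items():
        n_i = n_fragments(seq)
        print(f"{name}: N_i={n_i}, K_i={k_obs}, expected ~{expected_matches(n_i):.1f}")
```

Running it gives about 5.9 expected chance matches for FAS1 against 34 observed, and about 5.0 for SIP2 against 15 observed.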
Large scale spectral comparison
FAS1 is part of the fatty acid biosynthesis pathway in yeast; its Enzyme Commission number is EC 2.3.1.86.
In general, several peptides (typically 3-10) are identified for each protein.
[Figure: protein identification of FAS1; source: www.kegg.org]
Large scale spectral comparison
Other approaches
• An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. Yates' group, J. Am. Soc. Mass Spectrom. 5(11), 1994
• ProbID: a probabilistic algorithm to identify peptides through sequence database searching using tandem mass spectral data. Aebersold's group, Proteomics 2(10), 2002
Most recent advances
• Inverted sequence DB used for background distribution estimation (PRISM). Emili's group, Mol. Cell. Proteomics 2(2), pp. 96-106, 2003
• Number of sibling peptides (ProteinProphet). Aebersold's group, Anal. Chem. 74, pp. 5383-5392, 2004
• Suffix tree searching. Lu and Chen, Bioinformatics 19(2), pp. ii113-ii121, 2003
• Bayesian approach. Chen, Biosilico, in press, 2004