FDR Thresholding and Clustering - Proteome Software

FDR Thresholding
Caleb J. Emmons
Slide: 1
What is FDR?
If decoy proteins are present:
Protein FDR = (# decoy proteins identified) / (# target proteins identified)
Peptide FDR = (# spectra from decoy proteins) / (# spectra from target proteins)
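
The two ratios above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `fdr` and all counts are hypothetical and do not come from the slides.

```python
def fdr(decoy_count: int, target_count: int) -> float:
    """Ratio of decoy to target identifications, as defined on the slide."""
    if target_count <= 0:
        raise ValueError("target_count must be positive")
    return decoy_count / target_count


# Hypothetical counts, for illustration only.
protein_fdr = fdr(decoy_count=5, target_count=500)     # decoy vs. target proteins
peptide_fdr = fdr(decoy_count=30, target_count=12000)  # spectra from decoy vs. target proteins

print(f"Protein FDR: {protein_fdr:.2%}")
print(f"Peptide FDR: {peptide_fdr:.2%}")
```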
Slide: 2
The FDR Browser
How does FDR Thresholding work?
The “FDR Landscape”
Slide: 4
How does FDR Thresholding work?
The “FDR Landscape”
Slide: 5
Confusing!
Some Fine Points
Slide: 6
Protein Clustering
Poster 509, Tuesday 10:30-1:00
Informatics: Quantification/Validation
Caleb J. Emmons
Slide: 7
What is a Cluster?
Slide: 8
Total Peptide Evidence
PEtot(A) = sum of peptide probabilities over all peptides matching A

Protein  PEtot
K1C10    1481%
K1C14    1061%
K1C16    852%
K1C17    503%
Slide: 9
Joint Peptide Evidence
PEjoint(A, B) = sum of peptide probabilities over all peptides matching both A and B

PEjoint  K1C10  K1C14  K1C16
K1C14    184%
K1C16    184%   661%
K1C17    84%    375%   175%
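
As a rough sketch of how total and joint peptide evidence could be computed, the Python below sums peptide probabilities over the peptides matching one protein or a pair of proteins. The peptide names, probabilities, and protein labels are hypothetical; only the two definitions come from the slides.

```python
# Hypothetical peptides: (probability, set of proteins the peptide matches).
peptides = {
    "PEP1": (0.95, {"A"}),
    "PEP2": (0.90, {"A", "B"}),
    "PEP3": (0.80, {"A", "B"}),
    "PEP4": (0.99, {"B"}),
}


def pe_tot(protein):
    """Sum of peptide probabilities over all peptides matching `protein`."""
    return sum(prob for prob, prots in peptides.values() if protein in prots)


def pe_joint(a, b):
    """Sum of peptide probabilities over all peptides matching both a and b."""
    return sum(prob for prob, prots in peptides.values() if a in prots and b in prots)


# Evidence is reported as a percentage, so it can exceed 100%.
print(f"PEtot(A)     = {pe_tot('A'):.0%}")
print(f"PEtot(B)     = {pe_tot('B'):.0%}")
print(f"PEjoint(A,B) = {pe_joint('A', 'B'):.0%}")
```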
Slide: 10
Cluster Formation
Directly similar
A ≈ B if
1) their joint evidence is at least 95%, and
2) their joint evidence is at least half of the total evidence for either A or B
Clusters
Proteins A and B are in the same cluster if they are directly similar, or if they
can be connected with a sequence of proteins that are directly similar.
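
The two rules can be sketched in Python using the keratin evidence values from the slides. The sketch reads "A or B" as "either protein" (joint evidence at least half of the smaller total), which is the reading that matches the yes/no answers in the worked example. The function names and the graph traversal are illustrative assumptions, not the software's actual implementation.

```python
# Total and joint peptide evidence from the slides, in percent.
pe_tot = {"K1C10": 1481, "K1C14": 1061, "K1C16": 852, "K1C17": 503}
pe_joint = {
    frozenset({"K1C10", "K1C14"}): 184,
    frozenset({"K1C10", "K1C16"}): 184,
    frozenset({"K1C14", "K1C16"}): 661,
    frozenset({"K1C10", "K1C17"}): 84,
    frozenset({"K1C14", "K1C17"}): 375,
    frozenset({"K1C16", "K1C17"}): 175,
}


def directly_similar(a, b):
    """Rule 1: joint evidence >= 95%. Rule 2: joint >= half the total of A or B."""
    joint = pe_joint.get(frozenset({a, b}), 0)
    return joint >= 95 and joint >= 0.5 * min(pe_tot[a], pe_tot[b])


def form_clusters(proteins):
    """Group proteins connected by a chain of directly similar pairs."""
    remaining = set(proteins)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            current = frontier.pop()
            linked = {p for p in remaining if directly_similar(current, p)}
            remaining -= linked
            cluster |= linked
            frontier.extend(linked)
        clusters.append(cluster)
    return sorted(clusters, key=sorted)


clusters = form_clusters(pe_tot)
for cluster in clusters:
    print(sorted(cluster))
```

K1C16 and K1C17 are not directly similar, but both are directly similar to K1C14, so the three end up in one cluster while K1C10 stays on its own.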
Slide: 11
Cluster Formation
Protein  PEtot
K1C10    1481%
K1C14    1061%
K1C16    852%
K1C17    503%

PEjoint  K1C10  K1C14  K1C16
K1C14    184%
K1C16    184%   661%
K1C17    84%    375%   175%

A≈B?     K1C10  K1C14  K1C16
K1C14    no
K1C16    no     yes
K1C17    no     yes    no
Slide: 12
Peptide-Protein Weights
PEexcl(C) = sum of peptide probabilities over all peptides exclusively matching C
W(p, C) =
[Figure: diagram labeled A, B, C]
Slide: 13
Spectrum Counting
Exclusive peptide/spectrum: associated only with this single cluster/protein
Unique peptides: only consider amino acid sequences
Unique spectra: only consider amino acid sequence, modifications, & charge state

[Figure: spectra SEQ1,+2; SEQ1,+3; SEQ2,+2; SEQ3,+2; SEQ4,+2; SEQ5,+2; SEQ7,+3; SEQ7,+3 mapped to proteins A, B, and C]

                 Total    Exclusive  Unique   Exclusive Unique
                 Spectra  Spectra    Spectra  Peptides
Protein A        3        2          3        1
Protein B        4        2          4        2
Protein C        4        2          3        1
Cluster of B&C   6        5          5        4
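
The counting definitions can be sketched in Python. The figure mapping spectra to proteins is not recoverable from this transcript, so the assignment below is an assumption: one mapping of the eight listed spectra that is consistent with the counts given for Protein A, Protein B, Protein C, and the cluster of B&C. The column names are read as total spectra, exclusive spectra, unique spectra, and exclusive unique peptides, which is also an assumption.

```python
# (sequence, charge, proteins matched). ASSUMED assignment, not taken from the figure.
# Modifications are omitted because the example lists none.
spectra = [
    ("SEQ1", 2, {"A"}),
    ("SEQ1", 3, {"A"}),
    ("SEQ4", 2, {"A", "B", "C"}),
    ("SEQ3", 2, {"B", "C"}),
    ("SEQ5", 2, {"B"}),
    ("SEQ2", 2, {"B"}),
    ("SEQ7", 3, {"C"}),
    ("SEQ7", 3, {"C"}),  # same sequence and charge observed twice
]


def spectrum_counts(group):
    """Counts for a single protein or a cluster, given as a collection of names."""
    group = set(group)
    matched = [s for s in spectra if s[2] & group]      # matches any member
    exclusive = [s for s in matched if s[2] <= group]   # matches only members
    return {
        "total_spectra": len(matched),
        "exclusive_spectra": len(exclusive),
        "unique_spectra": len({(seq, z) for seq, z, _ in matched}),
        "exclusive_unique_peptides": len({seq for seq, _, _ in exclusive}),
    }


for label, group in [("Protein A", "A"), ("Protein B", "B"),
                     ("Protein C", "C"), ("Cluster of B&C", "BC")]:
    print(label, spectrum_counts(group))
```

A spectrum shared between B and C is not exclusive to either protein, but it is exclusive to the cluster of B&C, which is why the cluster's exclusive counts exceed the sum of its members' counts.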
Slide: 14
Quantitative Values
Total and Weighted Spectrum Counts run over all spectra in the cluster
Total Ion Current (TIC) and Precursor Intensity may be computed,
treating the cluster as a collection of spectra.
Normalized Spectral Abundance Factor (NSAF) is roughly the ratio of an
exclusive spectrum count to protein length, so it does not make direct
sense at the cluster level (clusters do not have a ‘length’). However,
the average NSAF over the member proteins gives an interpretable value.
Similarly, we compute the Exponentially Modified Protein Abundance Index
(emPAI) as an average over the member proteins in the cluster.
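
A minimal sketch of the cluster-level NSAF average, assuming the usual NSAF definition (each protein's count/length ratio divided by the sum of those ratios over all proteins in the sample). The counts, lengths, and function names are hypothetical, and the exact normalization used by the software may differ.

```python
def nsaf(spectrum_counts, lengths):
    """NSAF_i = (count_i / length_i) / sum_j (count_j / length_j)."""
    saf = {p: spectrum_counts[p] / lengths[p] for p in spectrum_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}


def cluster_average(values, members):
    """Plain mean of a per-protein quantity over the cluster's member proteins."""
    return sum(values[m] for m in members) / len(members)


# Hypothetical exclusive spectrum counts and protein lengths (in residues).
exclusive_counts = {"A": 2, "B": 2, "C": 2}
lengths = {"A": 400, "B": 500, "C": 250}

per_protein = nsaf(exclusive_counts, lengths)
cluster_bc = cluster_average(per_protein, ["B", "C"])
print(f"Average NSAF for the cluster of B and C: {cluster_bc:.3f}")
```

The same `cluster_average` helper would apply to emPAI once per-protein emPAI values are available.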
Slide: 15