A biased look at Biomarkers

Biomarker
Definition: A biomarker is a substance used as an indicator of a biological state:
• The existence of living organisms or of a biological process
• A particular disease state

Biomarker
• Proteins
• Nucleic acids
• Metabolites: carbohydrates, lipids, small molecules

Detection of biomarker
Detection of biomarker: diagnosis
• Self properties, e.g. enzymatic activities
• Antibodies: IHC, ELISA

Detection of biomarker
• Quantitative: a link between the quantity of the marker and disease
• Qualitative: a link between the existence of a marker and disease

Biomarker & Diagnosis
Ideal marker for diagnosis
• Should have great sensitivity, specificity, and accuracy in reflecting total disease burden.
• A tumor marker should also be prognostic of outcome and treatment.

Biomarker for Screening
• The marker must be highly specific, minimizing false positives and false negatives.
• The marker must be able to clearly reflect the different stages of the disease (early).
• The marker must be easily detected without complicated medical procedures. Disease markers released into serum and urine are good targets for early screening.
• The method for screening should be cost effective.

Samples for biomarker detection
• Blood, urine, or other body fluid samples
• Tissue samples

Prostate Cancer marker PSA
• PSA is a protein normally made in the prostate gland, in ductal cells that make some of the semen.
• PSA helps to keep the semen liquid.
• PSA, also known as kallikrein III, seminin, semenogelase, γ-seminoprotein and P-30 antigen, is a glycoprotein and a serine protease.

Prostate Cancer Diagnosis with PSA
• Cancer of the prostate does not cause any symptoms until it is locally advanced or metastatic.
• There is a correlation between elevated PSA and prostate cancer.
• Detection of PSA is a surrogate for early detection of prostate cancer.
• Large screening trials have shown that PSA nearly doubles the rate of detection when combined with other methods.
• Based on these data, PSA testing was approved by the US FDA for the screening and early detection of prostate cancer.
• PSA is also found in the cytoplasm of benign prostate cells.

"I never dreamed that my discovery four decades ago would lead to such a profit-driven public health disaster." - Richard Ablin (inventor of the PSA test)
PSA screening generates ~$1.7 billion annually in the U.S. alone.

Sensitivity = the ability of the test to detect the disease (true positive rate)
Specificity = the likelihood that your test will be normal if you are disease free (true negative rate)

A brief aside about Statistics and Probability
- Statistics are the formalization of common sense
- Because they have to handle many different situations, they can be really complicated
- They should make you feel really good or really bad about your data
- People are inherently bad at statistics and probability

Case Study:
Rate for being HIV positive: 1:10000
False positive rate of HIV test: 1:1000
If I test positive, what is the chance that I am really HIV negative?
0.0001 0.001 0.01 0.1 0.9 0.99 0.9999
For every 1 true positive there will be 10 false positives, so my chance of being negative is 10/11.

How about the PSA test?
Rate is 15:10000
False positive rate is 60:1000
For every 15 true positives, there will be 600 false positives!
Chance of being negative: 600/615 = 0.98
Chance of being positive: 15/615 = 0.02 (before the test the chance was 0.0015)
- Is this true?
The test will miss 80% of the true positives (sensitivity = 20%), so only 3 true positives will be detected:
Chance of being negative: 600/603 = 0.995
Chance of being a true positive: 3/603 = 0.005

Follow up for a +HIV test is another blood test.
Follow up for a +PSA test is a tissue biopsy.

How good does a Biomarker have to be?
By age 65 the rate of prostate cancer climbs to 8:1000 and the test performs much better.
For every 8 true positives, there will be 60 false positives!
Chance of being negative: 60/68 = 0.88
Chance of being positive: 8/68 = 0.12 (before the test the chance was 0.008)

How good does a Biomarker have to be?
Prostate cancer is one of the most frequent cancers (15:10000); most cancers are much less frequent (1:10000 to 1:50000), so a biomarker would have to be much better than the PSA test.
It is currently believed that a new biomarker would need sensitivity and specificity better than 95%.

Early proteomics-based biomarker work was based on SELDI
• SELDI can detect 200-300 features in a sample.
• It has been used to find biomarkers in everything from blood to tears.
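The case-study arithmetic above (HIV test, PSA test) can be sketched in a few lines of Python. This is a minimal sketch and not part of the original slides: the function name `positive_test_breakdown` and the default population of 10,000 are assumptions made for illustration. Unlike the slides, it applies the false positive rate only to the disease-free part of the population, so the PSA case gives about 599 false positives rather than the rounded 600.

```python
def positive_test_breakdown(prevalence, false_positive_rate,
                            sensitivity=1.0, population=10_000):
    """Split the positive tests in a screened population into true and false positives.

    Hypothetical helper for illustration; all rates are fractions (1:1000 -> 0.001).
    """
    diseased = population * prevalence
    healthy = population - diseased
    true_positives = diseased * sensitivity          # diseased people the test catches
    false_positives = healthy * false_positive_rate  # healthy people flagged anyway
    all_positives = true_positives + false_positives
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "p_disease_given_positive": true_positives / all_positives,
        "p_healthy_given_positive": false_positives / all_positives,
    }


# Rates taken from the case studies; sensitivity is 1.0 unless the slide says otherwise.
cases = {
    "HIV": positive_test_breakdown(1 / 10_000, 1 / 1_000),
    "PSA (perfect sensitivity)": positive_test_breakdown(15 / 10_000, 60 / 1_000),
    "PSA (sensitivity 20%)": positive_test_breakdown(15 / 10_000, 60 / 1_000,
                                                     sensitivity=0.2),
    "PSA at age 65": positive_test_breakdown(8 / 1_000, 60 / 1_000),
}

for name, result in cases.items():
    print(f"{name}: P(healthy | positive test) = "
          f"{result['p_healthy_given_positive']:.3f}")
```

Within rounding, the sketch should agree with the slide values (10/11 for HIV, 600/603 for PSA at 20% sensitivity).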
Early Biomarker work has largely been discredited
- Biomarkers with similar masses kept being rediscovered
- When the proteins were identified, they were abundant serum proteins, and the same proteins kept turning up
- Multi-center studies failed to validate the biomarkers in a "clinical" setting
- Realization that serum and other biofluids are incredibly complex
- Realization that serum and other biofluids are incredibly variable and "fragile"
- Some strong "biomarkers":
  - blood collection tube
  - # of freeze-thaw cycles
  - diet

Key Concept: Proteins vary widely in concentration

• A typical biomarker discovery study will take 50 samples per condition.
• It typically takes 10 samples per condition to have a 90% chance of finding a 2-fold difference.
• Validation will take 1000s of samples.
• Finally, the assay will have to be converted to something that can be done in a clinical lab.

PCA or other Clustering is used for Biomarker discovery

2007 Common Serum Markers for Cancer Diagnosis/prognosis
[Table: serum markers (AFP, CEA, CA15-3, CA19-9, CA125, PSA, PSAf, PAP, hTG, HCGb, Ferr, NSE, A2M, B2M) against cancer types (Lung, Pancreas, Kidney, Breast, Ovarian, Cervical, Uterine, Prostate, Liver, Gastro, Colon, Bladder, Brain, Myeloma, Thyroid, Testicular, Leukemia); the individual marker-to-cancer entries are not recoverable]

Conclusions
- Biomarker discovery is difficult
  - biofluids are complex
  - biofluids have a high dynamic range
  - biomarkers are usually low abundance
  - even taking "proximal" fluids typically does not help
  - there is a lot of person-to-person variability
- Most biomarkers will never become clinically relevant
  - statistical standards for diagnostic tools are very high
  - the more prevalent the disease, the "better" the biomarker will perform
- An MS-based biomarker assay is unlikely, due to the greater analytical performance of antibody-based methods.
- For a biomarker workflow to be meaningful it must be quantitative!
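The conclusion that a biomarker performs "better" for a more prevalent disease follows from Bayes' rule. A minimal sketch, assuming a hypothetical test sitting exactly at the 95% sensitivity and 95% specificity threshold mentioned earlier; the function name `positive_predictive_value` is illustrative, and the prevalences are the rates quoted in the slides:

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(disease | positive test) from Bayes' rule."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


# Prevalences quoted in the slides, from rare cancers up to prostate cancer at age 65.
prevalences = {
    "1:50000": 1 / 50_000,
    "1:10000": 1 / 10_000,
    "15:10000": 15 / 10_000,
    "8:1000": 8 / 1_000,
}

ppv_by_prevalence = {
    label: positive_predictive_value(p, sensitivity=0.95, specificity=0.95)
    for label, p in prevalences.items()
}

for label, ppv in ppv_by_prevalence.items():
    print(f"prevalence {label}: P(disease | positive test) = {ppv:.4f}")
```

In this sketch the chance that a positive result is real rises steadily with prevalence, and stays under 1% for a disease at 1:10000 even with a 95%/95% test.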
Quantitative Approaches
Stable isotope labeling methods
- Add heavy isotopes to one sample so chemically identical compounds are mass shifted
- Added to the peptides/proteins using reactive groups
- Added to the proteins in vivo using heavy amino acids
- Can be multiplexed
Label-free methods
- Extracted ion chromatograms
- Spectral counting

[Figure: mass spectrum (4700 Reflector Spec #1, base peak m/z 863.4), % Intensity vs. Mass (m/z), with an expanded view of the isotope cluster at m/z 1737.88 to 1740.88]

ISOTOPE-CODED AFFINITY TAG (ICAT):
• Label protein samples with heavy and light reagent
• Reagent contains an affinity tag and heavy or light isotopes
- Chemically reactive group: forms a covalent bond to the protein or peptide
- Isotope-labeled linker: heavy or light, depending on which isotope is used
- Affinity tag: enables the protein or peptide bearing an ICAT to be isolated by affinity chromatography in a single step

Example of an ICAT Reagent
- Biotin affinity tag: binds tightly to streptavidin-agarose resin
- Reactive group: thiol-reactive group will bind to Cys
- Linker: the heavy version has deuteriums at *, the light version has hydrogens at *
[Figure: the ICAT reagent, chemical structure with * marking the isotope-labeled positions]

How does ICAT work?
[Figure: ICAT workflow: lyse and label (light and heavy), mix, proteolysis (i.e. trypsin), affinity isolation on streptavidin beads, quantification by MS, identification by MS/MS; example peptide NH2-EACDPLR-COOH]

ICAT Quantitation

ICAT Advantages vs. Disadvantages
Advantages
• Estimates relative protein levels between samples with a reasonable level of accuracy (within 10%)
• Can be used on complex mixtures of proteins
• Cys-specific label reduces sample complexity
• Can set up the mass spectrometer to fragment only those peaks with a certain ratio
Disadvantages
• Yield and non-specificity
• Expensive
• Slight chromatography differences
• Tag fragmentation
• Meaning of relative quantification information
• Proteins with no cysteine residues, or with cysteines not accessible to the ICAT reagent, are missed

iTRAQ™ Reagent Design
Isobaric tag (total mass = 145) = reporter group + balance group, attached to an amine-specific peptide reactive group (PRG)
- Reporter group (mass = 114 through 117): charged, retains charge, gives a strong signature ion in MS/MS
- Balance group (mass = 31 through 28): changes in concert with the reporter mass to maintain a total mass of 145; neutral loss in MS/MS
- Peptide reactive group (PRG): amine specific (NHS)
- The isobaric tag contains the MS/MS fragmentation site
- Gives good b- and y-ion series
- Maintains charge state
- Maintains ionization efficiency of the peptide
[Figure: structure of the iTRAQ reagent showing the reporter, balance, and peptide reactive groups]
Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Ross, P.L., et al., Mol Cell Proteomics 2004; 3: 1154-1169.
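The reporter/balance bookkeeping of the isobaric tag, and the way reporter ion intensities turn into relative quantities, can be sketched as follows. This is a minimal sketch under stated assumptions: the nominal masses come from the slide, while the function names and the example intensities are hypothetical, and real workflows also correct for isotope impurities, which is omitted here.

```python
TAG_TOTAL_MASS = 145                    # nominal mass of reporter + balance (from the slide)
REPORTER_MASSES = (114, 115, 116, 117)  # the four reporter channels of the 4-plex


def balance_mass(reporter_mass):
    """Nominal balance-group mass that keeps the tag isobaric."""
    if reporter_mass not in REPORTER_MASSES:
        raise ValueError(f"unknown reporter mass: {reporter_mass}")
    return TAG_TOTAL_MASS - reporter_mass


def relative_abundance(reporter_intensities, reference=114):
    """Express each reporter ion intensity relative to a reference channel."""
    ref_intensity = reporter_intensities[reference]
    if ref_intensity <= 0:
        raise ValueError("reference channel has no signal")
    return {mass: intensity / ref_intensity
            for mass, intensity in reporter_intensities.items()}


# Every reporter/balance pair adds up to the same total, so the labeled
# peptides from all four samples share one precursor m/z in the MS scan.
for reporter in REPORTER_MASSES:
    print(reporter, "+", balance_mass(reporter), "=", TAG_TOTAL_MASS)

# Hypothetical reporter ion intensities read from one MS/MS spectrum.
example_intensities = {114: 1000.0, 115: 2000.0, 116: 500.0, 117: 1000.0}
print(relative_abundance(example_intensities))
```

Choosing channel 114 as the reference is arbitrary; any channel carrying the control sample would serve.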
Isobaric Tagging - General Method (4-Plex)
• Samples S1 to S4 are denatured and digested in parallel, labeled with the 114-31-PRG, 115-30-PRG, 116-29-PRG, and 117-28-PRG tags, then mixed.
• MS: Reporter-Balance-Peptide INTACT; the 4 samples have identical m/z.
• MS/MS: peptide fragments (b and y ions) EQUAL; reporter ions (114, 115, 116, 117) DIFFERENT.
[Figure: MS spectrum of the labeled precursor at m/z 1352.84 and its MS/MS spectrum with annotated b and y ions and reporter ions at m/z 114 to 117; % Intensity vs. Mass (m/z)]

Spotfire K-means Clustering of Protein-level Ratios
[Figure: K-means clusters of protein-level ratios (columns labeled G1, L S, PM)]

MS/MS Spectra of a Singly-charged Peptide
[Figure: MS/MS spectrum of the labeled peptide *-TPHPALTEAK-* with annotated b and y ions, with expanded views of the reporter ion region (m/z 114.1, 115.1, 116.1, 117.1) and of the b7 and y8 fragment ions; % Intensity vs. Mass (m/z)]

Reporter Group Placement: Selection of 'Quiet Region'
[Figure: summed ion intensity (~75,000 spectra) vs. m/z, 0 to 2000]

Simplified Workflow: (One extra step)
Example: time course labeling
• Control, Test 1, Test 2, Test 3
• Trypsin digestion
• Label with iTRAQ™ reagents (114, 115, 116, 117): 1 hr, RT, single addition
• MIX
• Single 2D LC analysis (SCX, then LC) for the combined samples (4-plex)
• MS/MS analysis gives both ID and quant

Differential Expression using iTRAQ™ Reagent Approach
Overexpression of Chaperonin 10, a non-cysteine-containing protein
[Figure: MS/MS spectrum of *VLQATVVAVGSGS*K (* = iTRAQ labeled residue) with annotated b and y ions; reporter ion region (m/z 114 to 117) comparing cancer and normal samples]

iTRAQ Advantages vs. Disadvantages
Advantages
• Estimates relative protein levels between samples with a reasonable level of accuracy (> 10%)
• Can be used on complex mixtures of proteins
• Isobaric, so the tag is only visible in the MS/MS, keeping the precursor scans as clean as possible
• The abundances of the labeled peptides sum together, making analysis of low-abundance peptides easier
• Replicates are analyzed in the same LC-MS/MS run, minimizing run-to-run variability
Disadvantages
• Reagent not completely specific
• Expensive
• Does not work on ion trap instruments
• Reporters tend to dominate the spectra
• You have to fragment everything and sort out the iTRAQ reporters later. The mass spec spends a lot of time analyzing peptides with no quantitative differences.

SILAC: Stable Isotope Labeling by Amino acids in Cell culture

SILAC Advantages vs. Disadvantages
Advantages
• Estimates relative protein levels between samples with a high level of accuracy (< 5%)
• Can be used on complex mixtures of proteins
• Can set up the mass spectrometer to fragment only those peaks with a certain ratio
• Extremely flexible and can be adapted to many systems
Disadvantages
• Labeling may be incomplete
• The urea cycle may cause incorporation of heavy isotopes into other amino acids
• Expensive
• Works best on high-resolution instruments

Label-Free Quantitation
All approaches so far require purchase of isotopically labeled reagents (can be expensive).
• What if you want to compare large numbers of samples (10+)?
• What if you can't afford lots of reagents?
• Peak/spectral counting
• Peak area comparison (extracted ion chromatograms)

Spectral Counting
• Count the number of peptides identified from a protein in each sample.
• Typically do not count repeat identifications of the same peptide.
• Not accurate at quantifying the magnitude of a change, but can be used to determine whether there is a difference.
• In general, you need a spectral count difference of about 4 peptides to be confident that a difference is real.
• Most proteins in complex mixtures are identified by fewer than 4 peptides.

EIC (Extracted Ion Chromatogram)
• Measure the intensity of a peak during its elution off the HPLC column and into the mass spectrometer.
• Measure the area of the peak in the XIC.
• More accurate than selecting the peak intensity for one given scan.

emPAI (Exponentially Modified Protein Abundance Index)
$\mathrm{emPAI} = 10^{\mathrm{PAI}} - 1$, where $\mathrm{PAI} = N_{\mathrm{observed}} / N_{\mathrm{observable}}$
What is an 'observable' peptide?
• Peptides with a precursor mass between 800 and 2400 Da.
• There is a roughly linear relationship between log protein concentration and the ratio of 'observable' peptides observed, in the range of 3-500 fmoles.
• If you know how much total protein you analyzed, you can derive absolute abundances.
Ishihama et al., Mol Cell Proteomics (2005) 4(9): 1265-1272

MRM (Multiple Reaction Monitoring)
Look for a component of a specific mass that, when fragmented, forms a fragment of another specific mass.
Transition: precursor m/z 521.7, fragment m/z 757.6
• Very sensitive and specific.

MRM
• Best performed on a triple quadrupole instrument.
• Scans are very fast, so multiple transition scans can be performed on a chromatographic time-scale.
• Requires a lot of optimization:
  - Verify transitions are reproducible; typically want 2-3 transitions/peptide and 3-4 peptides/protein.
  - Determine the retention time to maximize the number of peptides that can be analyzed per run. It is possible to analyze 100s of transitions per hour.
• MRM coupled to isotopically labeled peptides allows for very high sensitivity and high accuracy analysis and can give absolute quantification.
• Once optimized, 1000s of samples can be run in a short time frame.
• Not for discovery! You must already know what you are looking for; sometimes referred to as targeted proteomics.

Issues with MS Quantitation Analysis
• Should you use all data for quantitation?
• Minimum peak intensity?
• Peaks near the signal-to-noise limit will have much higher variability in quantitation accuracy.
• Very intense peaks may be saturated.
• Proteins identified by a single peptide are probably not accurately quantified.
• It is best to ignore sequences with more than one form: PTMs, missed cleavages, etc.
• Multiple charge states should be summed.
Results are normally reported with a mean and standard deviation.

Conclusions
• There are many different ways to quantitate proteomics data.
• Quantitative studies need to be approached carefully, because it is easy to make mistakes.
• No one strategy is best.
• MRM is the most sensitive and accurate, but requires the most optimization and cannot be used for discovery.
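The emPAI definition given earlier (emPAI = 10^PAI - 1, with PAI = N_observed / N_observable) is simple enough to sketch directly. This is a minimal sketch under stated assumptions: the function names and the peptide counts are hypothetical, and the mol % normalization (each emPAI divided by the sum of all emPAI values) is the relative-content estimate associated with the cited Ishihama et al. approach, shown here as an assumption rather than as the lecturer's method.

```python
def empai(n_observed, n_observable):
    """Exponentially modified protein abundance index: 10**PAI - 1."""
    if n_observable <= 0:
        raise ValueError("a protein needs at least one observable peptide")
    pai = n_observed / n_observable
    return 10 ** pai - 1


def mol_percent(empai_by_protein):
    """Relative protein content: each emPAI as a percentage of the summed emPAI."""
    total = sum(empai_by_protein.values())
    if total <= 0:
        raise ValueError("no protein has a non-zero emPAI")
    return {name: 100.0 * value / total
            for name, value in empai_by_protein.items()}


# Hypothetical (observed, observable) peptide counts for three proteins.
peptide_counts = {
    "protein_A": (10, 10),
    "protein_B": (5, 10),
    "protein_C": (1, 20),
}

empai_values = {name: empai(obs, observable)
                for name, (obs, observable) in peptide_counts.items()}

for name, share in mol_percent(empai_values).items():
    print(f"{name}: emPAI = {empai_values[name]:.3f}, ~{share:.1f} mol %")
```

Converting these shares into absolute amounts would additionally need the total protein analyzed, as noted in the emPAI slide.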