Supplementary Materials and Methods

1 Data analysis

1.1 The latest Raught lab spectral library

1.1.1 Standard database search

All non-decoy spectra were searched against a concatenated target/decoy database using X!Tandem (CYCLONE 2011.12.1). The database consisted of the forward and reversed sequences of IPI human v3.82 (184,208 sequences in total). The search parameters were:

(1) [-10, 10] ppm and [-0.4, 0.4] Da for the peptide- and fragment-mass tolerances;
(2) tryptic cleavage at both termini, with up to two missed cleavages allowed;
(3) 15.994915@M and 57.021464@C as variable and fixed modifications, respectively.

The search results were then validated using PeptideProphet and ProteinProphet in TPP v4.5.

1.1.2 Identification of Ub/Ubl conjugation sites using the proposed workflow

Create a combinatorial database

Target protein sequences were processed with MchopNSpice separately for each Ub/Ubl, using the following parameters: spice species, H. sapiens; spice site, KX; spice mode, once per fragment; include unmodified fragments in output; allow up to 2 protein miscleavages; allow up to 0 miscleavages in the “spice sequence”; output formatting, FASTA (single protein sequence); mark all cleaved sites (“J”); retain comments in FASTA format; no line breaks in FASTA output.

Lys-C was selected as the enzyme for Ub, NEDD8, ISG15 and ATG8. Trypsin (Lys/Arg, do not cleave at Pro) was selected for SUMO1, SUMO2, SUMO3 and FAT10. For each Ub/Ubl, the spice sequence was the sequence of that Ub/Ubl itself.

Non-target protein sequences were digested using MchopNSpice with the following parameters: spice species, none; spice mode, once per fragment; include unmodified fragments in output; enzyme, trypsin (Lys/Arg, do not cleave at Pro); allow up to 2 protein miscleavages; output formatting, FASTA (single protein sequence); mark all cleaved sites (“J”); retain comments in FASTA format; no line breaks in FASTA output.
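The database-construction step can be illustrated with a minimal Python sketch. It is not MchopNSpice or ChopNSpice. The function names, the in-memory representation and the cleavage rules are assumptions made for illustration: trypsin cleaves after K or R unless P follows, and Lys-C is simplified to cleave after every K.

The sketch shows four things:
- how the remnant left on a lysine depends on the enzyme chosen for each Ub/Ubl;
- how peptides with up to two missed cleavages are enumerated;
- which lysines count as KX sites;
- one plausible reading of "mark all cleaved sites (“J”)": joining peptides with J so that a search engine cleaves only there.

```python
import re

# Illustrative cleavage rules as zero-width regular expressions (assumptions):
# trypsin cleaves after K or R unless the next residue is P;
# Lys-C is simplified to cleave after every K (no proline rule).
ENZYMES = {
    "trypsin": re.compile(r"(?<=[KR])(?!P)"),
    "lysc": re.compile(r"(?<=K)"),
}

# Human ubiquitin (76 residues), used only as an example spice sequence.
UB = ("MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQR"
      "LIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG")


def cleave(seq, enzyme):
    """Split seq at every cleavage site (no missed cleavages)."""
    return [frag for frag in ENZYMES[enzyme].split(seq) if frag]


def digest(seq, enzyme, max_missed=2):
    """Enumerate peptides with up to max_missed missed cleavages."""
    frags = cleave(seq, enzyme)
    peptides = []
    for start in range(len(frags)):
        stop = min(start + max_missed + 1, len(frags))
        for end in range(start, stop):
            peptides.append("".join(frags[start:end + 1]))
    return peptides


def remnant(ubl, enzyme):
    """C-terminal fragment of the Ub/Ubl that stays on the conjugated lysine."""
    return cleave(ubl, enzyme)[-1]


def kx_sites(peptide):
    """0-based positions of lysines followed by another residue ("KX")."""
    return [i for i, aa in enumerate(peptide[:-1]) if aa == "K"]


def j_marked(peptides):
    """Join peptides with 'J' so a search engine can cleave only at J."""
    return "J".join(peptides)


print(remnant(UB, "lysc"))      # ESTLHLVLRLRGG
print(remnant(UB, "trypsin"))   # GG
print(kx_sites("MQIFVKTLTGK"))  # [5]
```

With human ubiquitin as the spice sequence, Lys-C leaves the 13-residue remnant ESTLHLVLRLRGG on the conjugated lysine, whereas trypsin leaves only GG. Attaching the remnant to each KX site would complete the modified entries. That step is tool-specific and is not reproduced here.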
A combinatorial FASTA database was created by combining the modified target protein sequences with the digested non-target protein sequences.

Identification of Ub/Ubl conjugation sites using UblSearch

All non-decoy spectra were searched again against the combinatorial database using UblSearch with the following parameters:

(1) [-10, 10] ppm and [-0.4, 0.4] Da for the peptide- and fragment-mass tolerances;
(2) [X]|[J] as the cleavage site, with no missed cleavages allowed;
(3) 15.994915@M and 0.0000001@K as variable modifications, and 57.021464@C as a fixed modification.

1.1.3 Identification of Ub/Ubl conjugation sites using the ChopNSpice method

Create a modified database

Target protein sequences were processed with ChopNSpice separately for each Ub/Ubl, using the following parameters: spice species, H. sapiens; spice site, KX; spice mode, once per fragment; include unmodified fragments in output; allow up to 2 protein miscleavages; allow up to 0 miscleavages in the “spice sequence”; output formatting, FASTA (single protein sequence); mark all cleaved sites (“J”); retain comments in FASTA format; no line breaks in FASTA output.

Lys-C was selected as the enzyme for Ub, NEDD8, ISG15 and ATG8. Trypsin (Lys/Arg, do not cleave at Pro) was selected for SUMO1, SUMO2, SUMO3 and FAT10. For each Ub/Ubl, the spice sequence was the sequence of that Ub/Ubl itself.

Identification of Ub/Ubl conjugation sites using X!Tandem

All non-decoy spectra were searched again against the modified database using X!Tandem with the following parameters:

(1) [-10, 10] ppm and [-0.4, 0.4] Da for the peptide- and fragment-mass tolerances;
(2) [X]|[J] as the cleavage site, with no missed cleavages allowed;
(3) 15.994915@M and 57.021464@C as variable and fixed modifications, respectively.
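The entries are pre-cut and marked with J, and the searches above use [X]|[J] with zero missed cleavages. Each J-delimited stretch can therefore be treated as one candidate peptide.

A minimal sketch of candidate selection by precursor mass follows. The function names are hypothetical. The sketch makes these simplifications:
- it splits at J and discards the marker;
- it uses neutral monoisotopic masses;
- it applies only the fixed 57.021464@C modification, omitting variable modifications such as 15.994915@M and the 0.0000001@K marker;
- it takes the error as observed minus theoretical, which is an assumed sign convention.

The same check covers both the ppm window used here and the Da window used for the Trypanosoma cruzi data.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565
CARBAMIDOMETHYL = 57.021464  # fixed modification 57.021464@C


def peptide_mass(peptide):
    """Neutral monoisotopic mass, with every cysteine carbamidomethylated."""
    mass = sum(RESIDUE[aa] for aa in peptide) + WATER
    return mass + CARBAMIDOMETHYL * peptide.count("C")


def within_window(observed, theoretical, low, high, unit="ppm"):
    """True if (observed - theoretical) lies in [low, high], in ppm or Da."""
    error = observed - theoretical
    if unit == "ppm":
        error = error / theoretical * 1e6
    return low <= error <= high


def candidates(j_entry, observed, low, high, unit="ppm"):
    """Peptides obtained by cleaving only at 'J' that fit the precursor window."""
    return [p for p in j_entry.split("J")
            if p and within_window(observed, peptide_mass(p), low, high, unit)]


entry = "PEPTIDEJGGJACDK"
print(candidates(entry, 799.3600, -10, 10, "ppm"))  # ['PEPTIDE']
print(candidates(entry, 801.3600, -2, 4, "Da"))     # ['PEPTIDE']
```

The asymmetric [-2, 4] Da window accepts a precursor measured 2 Da above the theoretical mass. The same precursor falls far outside a [-10, 10] ppm window.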
1.2 Trypanosoma cruzi experimental dataset

1.2.1 Standard database search

The Trypanosoma cruzi experimental dataset was searched with X!Tandem against a target/decoy FASTA database of Trypanosoma cruzi (91,648 sequences in total). The search parameters were:

(1) [-2, 4] Da and [-0.4, 0.4] Da for the peptide- and fragment-mass tolerances;
(2) tryptic cleavage at both termini, with up to two missed cleavages allowed;
(3) 15.994915@M and 57.021464@C as variable and fixed modifications, respectively.

1.2.2 Identification of Ub/Ubl conjugation sites using the proposed workflow

Create a combinatorial database

Target protein sequences were processed using MchopNSpice with the following parameters: spice species, custom; spice site, KX; spice sequence, TPQELGMEDDDVIDAMVEQTGG; spice mode, once per fragment; include unmodified fragments in output; enzyme, trypsin (Lys/Arg, do not cleave at Pro); allow up to 2 protein miscleavages; allow up to 0 miscleavages in the “spice sequence”; output formatting, FASTA (single protein sequence); mark all cleaved sites (“J”); retain comments in FASTA format; no line breaks in FASTA output.

Non-target protein sequences were digested using MchopNSpice with the following parameters: spice species, none; spice mode, once per fragment; include unmodified fragments in output; enzyme, trypsin (Lys/Arg, do not cleave at Pro); allow up to 2 protein miscleavages; output formatting, FASTA (single protein sequence); mark all cleaved sites (“J”); retain comments in FASTA format; no line breaks in FASTA output.

A combinatorial FASTA database was created by combining the modified target protein sequences with the digested non-target protein sequences.
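The custom spice sequence used here contains no K or R, so tryptic digestion leaves it intact. Assuming the whole sequence is used as the remnant, the mass it adds to the conjugated lysine is the sum of its residue masses. The isopeptide bond forms with the loss of one water, which cancels the water of the free remnant peptide.

The sketch below uses hypothetical function names and ignores Met oxidation and Cys carbamidomethylation. It computes this mass shift and the resulting precursor mass. For the diglycine (GG) remnant it reproduces the familiar 114.0429 Da shift.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565


def remnant_delta(remnant):
    """Mass added to the conjugated lysine by a remnant.

    The isopeptide bond forms with loss of water, so the remnant contributes
    the sum of its residue masses rather than its free-peptide mass.
    """
    return sum(RESIDUE[aa] for aa in remnant)


def conjugated_mass(target, remnant):
    """Neutral monoisotopic mass of a target peptide carrying one remnant
    (Met oxidation and Cys carbamidomethylation are ignored here)."""
    return sum(RESIDUE[aa] for aa in target) + WATER + remnant_delta(remnant)


TC_SPICE = "TPQELGMEDDDVIDAMVEQTGG"  # spice sequence from Section 1.2.2
print(round(remnant_delta("GG"), 4))  # 114.0429
print(round(remnant_delta(TC_SPICE), 3))
print(round(conjugated_mass("GK", "GG"), 4))
```

The remnant mass is much larger than the GG shift. This is consistent with treating the remnant as a peptide that itself fragments, rather than as a small fixed mass tag.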
Identification of Ub/Ubl conjugation sites using UblSearch

All MS/MS spectra were searched again against the combinatorial database using UblSearch with the following parameters:

(1) [-2, 4] Da and [-0.4, 0.4] Da for the peptide- and fragment-mass tolerances;
(2) [X]|[J] as the cleavage site, with no missed cleavages allowed;
(3) 15.994915@M and 0.0000001@K as variable modifications, and 57.021464@C as a fixed modification.

1.2.3 Identification of Ub/Ubl conjugation sites using the ChopNSpice method

Create a modified database

Target protein sequences were processed using ChopNSpice with the following parameters: spice species, custom; spice site, KX; spice sequence, TPQELGMEDDDVIDAMVEQTGG; spice mode, once per fragment; include unmodified fragments in output; enzyme, trypsin (Lys/Arg, do not cleave at Pro); allow up to 2 protein miscleavages; allow up to 0 miscleavages in the “spice sequence”; output formatting, FASTA (single protein sequence); mark all cleaved sites (“J”); retain comments in FASTA format; no line breaks in FASTA output.

Identification of Ub/Ubl conjugation sites using X!Tandem

All MS/MS spectra were searched again against the modified database using X!Tandem with the following parameters:

(1) [-2, 4] Da and [-0.4, 0.4] Da for the peptide- and fragment-mass tolerances;
(2) [X]|[J] as the cleavage site, with no missed cleavages allowed;
(3) 15.994915@M and 57.021464@C as variable and fixed modifications, respectively.

Supplementary Fig. 1. Theoretical fragmentation patterns of the linearized branched form and the cross-linked form of a SUMO1-modified peptide. (a) The linearized branched form of the SUMO1-modified peptide. Theoretical fragmentation of the linearized branched peptide would produce fragment ions in the same way as a linear peptide. As a result, the sequence between the N-terminus and the modified lysine residue of the target peptide would produce incorrect fragment ions. (b) The cross-linked form of the SUMO1-modified peptide.
Both the target peptide and the remnant would produce fragment ions during CID.

Supplementary Fig. 2. Workflow of UblSearch.
1) Identifying all candidate peptides for a given MS/MS spectrum. For each MS/MS spectrum, UblSearch finds all candidate peptides in the combinatorial database, both linear and cross-linked, within the given mass tolerance.
2) Generating a theoretical fragment pattern for each candidate peptide. For linear peptides, the theoretical fragment patterns are generated as in a conventional database search. For a cross-linked peptide, UblSearch initially treats the remnant as a variable modification on the miscleaved lysine residue within the target peptide (e.g., K2 or K7). A new fragmentation model for Ub/Ubl-conjugated peptides is then used to generate the corresponding theoretical fragment ions from both the target peptide and the remnant (Supplementary Fig. 1b).
3) Scoring the candidate peptides and calculating the expectation value for the highest-scoring peptide. UblSearch uses the X!Tandem scoring scheme to find the peptide that best matches the MS/MS spectrum, and calculates the expectation value of that identification.

Supplementary Fig. 3. Mass spectrum ID 583 was successfully matched to MQIFVK[Ub_LysC]TLTGK by the improved workflow. Four of the five most intense peaks, and 33% of the total intensity of all peaks in the spectrum, were matched to fragment ions generated from the cross-linked form.
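Supplementary Figs. 1b, 2 and 3 describe two related ideas:
- a fragmentation model in which the remnant acts as a mass modification on the conjugated lysine while also fragmenting itself;
- summary statistics for how well the cross-linked form explains a spectrum.

A minimal Python sketch of both ideas follows. It is not UblSearch's implementation and does not reproduce X!Tandem scoring or expectation values. It rests on these assumptions:
- only singly charged b/y ions are generated;
- remnant fragments are limited to b-type ions from the remnant's free N-terminus, and complementary pieces that keep the whole target peptide are omitted;
- peaks are matched with the 0.4 Da fragment tolerance used above;
- all function names are hypothetical.

```python
# Monoisotopic residue masses (Da); water and proton masses for b/y ions.
RESIDUE = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565
PROTON = 1.007276


def b_y_ions(peptide, site=None, delta=0.0):
    """Singly charged b and y ions; the residue at index `site` carries `delta`."""
    masses = [RESIDUE[aa] for aa in peptide]
    if site is not None:
        masses[site] += delta
    ions = []
    for i in range(1, len(peptide)):
        ions.append((f"b{i}", sum(masses[:i]) + PROTON))
        ions.append((f"y{len(peptide) - i}", sum(masses[i:]) + WATER + PROTON))
    return ions


def cross_linked_ions(target, site, remnant):
    """Hypothetical cross-linked model: target ions with the remnant mass on
    the conjugated lysine, plus b-type ions from the remnant's free N-terminus."""
    delta = sum(RESIDUE[aa] for aa in remnant)
    ions = b_y_ions(target, site, delta)
    for i in range(1, len(remnant)):
        ions.append((f"rb{i}", sum(RESIDUE[aa] for aa in remnant[:i]) + PROTON))
    return ions


def match_stats(peaks, ions, tol=0.4, top_n=5):
    """(top-N peaks matched, fraction of total intensity matched) for a list
    of (m/z, intensity) pairs, using an absolute tolerance in Da."""
    def matched(mz):
        return any(abs(mz - ion_mz) <= tol for _, ion_mz in ions)

    top = sorted(peaks, key=lambda peak: peak[1], reverse=True)[:top_n]
    n_top = sum(matched(mz) for mz, _ in top)
    total = sum(intensity for _, intensity in peaks)
    hit = sum(intensity for mz, intensity in peaks if matched(mz))
    return n_top, hit / total


# Toy example: MQIFVKTLTGK with the Lys-C ubiquitin remnant on the lysine at
# index 5, scored against a synthetic three-peak spectrum.
ions = cross_linked_ions("MQIFVKTLTGK", 5, "ESTLHLVLRLRGG")
peaks = [(ions[0][1], 60.0), (ions[1][1], 20.0), (10.0, 20.0)]
print(match_stats(peaks, ions))  # (2, 0.8)
```

Applied to a real peak list, match_stats reports the two kinds of figure quoted for spectrum ID 583: matched peaks among the five most intense, and the fraction of total intensity matched. The peak list above is synthetic.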