Proteomic Survey of Camel Urine Reveals High Levels of
SUPPORTING INFORMATION

1 Materials and methods

1.1 Animals and urine sample collection

Nine healthy female domesticated one-humped camels (Camelus dromedarius) were selected for this study. The animals were kept on a private farm and given free access to water and camel feed. Urine collection from virgin (3), pregnant (3) and lactating (3) camels was usually done at feeding time and was performed by experienced camel attendants. Gestation in camels normally lasts 11 months and urine samples for this study were taken from pregnant camels 5-6 months after fertilisation. Urine was allowed to flow directly into stainless steel containers and then transferred to glass bottles. Samples were transported to the laboratory as soon as practicable (within 4 h) and were frozen at -80 °C. Urine samples were shipped frozen from Riyadh, Saudi Arabia, to Aberdeen, UK, and processed for proteomic analysis on arrival.

1.2 Sample preparation (protein extraction)

Urines were thawed to 25 °C, vortex mixed thoroughly to dissolve precipitates and centrifuged at 3,100 x g for 20 min to remove particulates. Supernatants (20 mL) were sterilised by filtration using disposable 10-mL syringes and Puradisc 25AS filter devices with 0.2 µm PES membranes (Whatman International Ltd., Maidstone, UK). Samples were then concentrated and desalted using Vivaspin 20 disposable centrifugal concentrator devices with 3,000 MWCO PES membranes (Sartorius Stedim Biotech, Goettingen, Germany) in a benchtop centrifuge (IEC CL30R; Thermo Fisher Scientific, Loughborough, UK) equipped with a swing-bucket rotor at 4,100 rpm (3,100 x g). Two cycles of concentration to 1 mL and addition of 10 mL deionised water were used and the desalted samples were each concentrated to 1 mL.
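The pairing of 4,100 rpm with 3,100 x g above can be checked with the standard relation between rotor speed and relative centrifugal force, RCF = 1.118e-5 x r x rpm^2 (r in cm). The sketch below is illustrative only: the function names are hypothetical, and the rotor radius is not stated in the text, so it is back-calculated from the two reported values rather than taken from the centrifuge specification.

```python
# Standard conversion between rotor speed and relative centrifugal force:
#   RCF = 1.118e-5 * radius_cm * rpm**2
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def radius_from_rcf(rcf: float, rpm: float) -> float:
    """Rotor radius (cm) implied by a reported speed and RCF pair."""
    return rcf / (RCF_CONSTANT * rpm ** 2)


# Assumption: the radius is inferred from the reported 4,100 rpm / 3,100 x g;
# the source does not give the rotor radius.
implied_radius_cm = radius_from_rcf(3100, 4100)
print(round(implied_radius_cm, 1))  # about 16.5 cm
```

A radius of roughly 16.5 cm is plausible for a benchtop swing-bucket rotor, so the two reported values are mutually consistent.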
Protein concentrations, estimated by Ponceau S/TCA assay in comparison with a BSA standard, were used to calculate the volumes containing approximately 50 µg protein, from which protein was precipitated using a Ready Prep 2-D Cleanup Kit procedure (Bio-Rad Ltd., Hemel Hempstead, UK).

1.3 One-dimensional gel electrophoresis

Protein extracts prepared as described above were analysed by 1-DE on a 4-12 % NuPAGE Bis-Tris mini gel (Invitrogen). Protein pellets from the clean-up procedure were dissolved in 1x NuPAGE LDS Sample Buffer (Invitrogen) and volumes containing approximately 25 µg protein were processed as follows. Protein samples were mixed with reducing agent (Invitrogen) and heated at 70 °C for 10 minutes before being loaded onto the gel tracks. A marker track of SeeBlue Plus2 (Invitrogen) was also loaded. The gel was electrophoresed for 40 minutes using MES electrophoresis buffer (Invitrogen). Following electrophoresis the gel was stained with colloidal Coomassie Brilliant Blue G250 then air-dried between cellophane sheets using an Easy Breeze gel dryer (Hoefer, Inc., Holliston, MA, USA).

1.4 In-gel digestion

A strip of gel (2 mm x 64 mm) spanning the entire molecular weight range was excised from each track. Each strip was cut into 16 pieces of equal size (2 mm x 4 mm). Proteins were digested in the gel pieces with trypsin (sequencing grade; Promega) using an Investigator ProGest robotic workstation (Digilab Ltd., Huntingdon, UK) and a standard protocol for 8-hour tryptic digestion at 37 °C. Peptide solutions were dried in a SPD1010 Savant SpeedVac concentrator (Thermo Fisher Scientific) and dissolved in 10 µL 0.1 % formic acid for analysis by LC-MS/MS.
1.7 Liquid chromatography-tandem mass spectrometry (LC-MS/MS)

Aliquots (2 µL) of the peptide solutions were injected into an LC-MS/MS system which comprised an UltiMate 3000 LC (Dionex (UK) Ltd., Camberley, UK) coupled to an HCTultra ion trap mass spectrometer (Bruker Daltonics, Bremen, Germany) fitted with a low-flow stainless steel nebuliser needle in the ESI source and controlled by HyStar software (version 3.2; Bruker Daltonics). Peptides were separated on a PepSwift monolithic PS-DVB column (200 µm i.d. x 5 cm; Dionex) at a flow rate of 2 µL/min using a linear gradient of 0 – 40 % acetonitrile/water/formic acid (80:20:0.04) (solvent B) in water/acetonitrile/formic acid (97:3:0.05) (solvent A) over 40 min, followed by a 1 min column wash in 90 % solvent B and a 5 min equilibration step in 100 % solvent A. MS/MS data (scan range m/z 100 – 2200, averages = 2) were acquired in positive data-dependent AutoMS(n) mode using esquireControl software (version 6.1; Bruker Daltonics). Up to three precursor ions were selected from the MS scan (range m/z 300 – 1500, averages = 3) in each AutoMS(n) cycle. Precursors were actively excluded after being selected twice within a 1 min window; singly charged ions were also excluded. Peaks were detected above an intensity threshold of 50,000 and deconvoluted automatically using DataAnalysis software (version 3.4; Bruker Daltonics).

1.8 Peptide identification by MS/MS ions search

Combined mass lists in the form of Mascot Generic Format (*.mgf) files, one for each urine sample, were created using the ‘ProcessWithMethod’ tool in DataAnalysis. Each set of 16 MS/MS data files was selected and processed using the 2D_LC method script provided by Bruker Daltonics. The ‘combined_data.mgf’ files thus created were used as inputs to MS/MS ions searches using Mascot Server (version 2.2).
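For readers unfamiliar with the Mascot Generic Format, the sketch below shows how such a file is laid out and how it might be read. It is a minimal illustration in Python, not part of the workflow described here: the function name parse_mgf and the example spectrum are hypothetical, and only the basic BEGIN IONS / END IONS block structure with KEY=value headers and m/z-intensity peak lines is handled.

```python
from typing import Dict, Iterable, List


def parse_mgf(lines: Iterable[str]) -> List[Dict]:
    """Collect each BEGIN IONS ... END IONS block as header fields plus peaks."""
    spectra = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                # Header line such as TITLE=..., PEPMASS=..., CHARGE=...
                key, value = line.split("=", 1)
                current["params"][key.upper()] = value
            else:
                # Peak line: m/z and intensity (extra columns are ignored)
                parts = line.split()
                current["peaks"].append((float(parts[0]), float(parts[1])))
    return spectra


# Hypothetical single-spectrum example; the values are not from the study.
example = """BEGIN IONS
TITLE=example scan 1
PEPMASS=500.25
CHARGE=2+
200.1 1500
350.2 3200
END IONS
"""
spectra = parse_mgf(example.splitlines())
print(len(spectra), spectra[0]["params"]["TITLE"])
```

The TITLE field is what later appears as ‘pep_scan_title’ in exported Mascot results, which is why it can serve as a per-spectrum key.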
Because of the limited availability of camel protein sequences, searches were conducted using three different databases in order to maximise the identification of peptides. The following parameters were used: Enzyme = Trypsin; Fixed modifications = Carbamidomethyl (C); Variable modifications = Oxidation (M); Mass values = Monoisotopic; Peptide mass tolerance = 1.5 Da; Fragment mass tolerance = 0.5 Da; Max. missed cleavages = 1; Instrument type = ESI-TRAP.

For searches using the Swiss-Prot database (version 2010_05; 516603 sequences) individual ion scores > 47 indicated identity or extensive homology (p < 0.05). Common contaminants (human keratins, trypsin) were excluded.

Searches were repeated using an alpaca (Vicugna pacos) protein sequence database built from the Vicugna_pacos.vicPac1.59.pep.all.fa file (22/7/2010; 11793 sequences) downloaded from the ENSEMBL Genome Browser (ftp://ftp.ensembl.org/pub/current_fasta/vicugna_pacos/pep/) (accessed 23/8/2010). Individual ion scores > 31 indicated identity or extensive homology (p < 0.05).

Finally, searches were repeated again over a camel (Camelus dromedarius) EST sequence database, built from the EST_Sequences.rar file (21/12/2010; 102930 sequences) downloaded from the Arabian Camel Genome website (http://camel.kacst.edu.sa/index.php/est-download-est) (accessed 21/12/2010). Individual ion scores > 39 indicated identity or extensive homology (p < 0.05).

In a small minority of cases, different peptide sequences from the different databases were matched to the same MS/MS spectrum. In order to assign sequence precedence to the best quality matches, and to aid the consolidation of the data, Mascot search results for every sample were exported in CSV format and processed as follows using Microsoft Office Excel 2007.
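The spreadsheet consolidation described in the next paragraph (filter on ‘pep_expect’, sort, remove duplicates on ‘pep_scan_title’, then drop peptides seen in only one sample) could equally be scripted. The sketch below is an illustration in Python under stated assumptions, not the procedure that was run: the function name consolidate, the example rows, and the sample and database labels are hypothetical; only the column names come from the text. Like the spreadsheet step, it assumes scan titles are unique to a spectrum across the whole data set.

```python
from collections import defaultdict


def consolidate(rows, max_expect=0.05):
    """Keep the best significant match per spectrum, then peptides seen in >= 2 samples."""
    # 1. Keep matches significant at the identity threshold.
    significant = [r for r in rows if float(r["pep_expect"]) <= max_expect]
    # 2. Sort by scan title, then score (largest first), then expect (smallest first).
    significant.sort(
        key=lambda r: (r["pep_scan_title"], -float(r["pep_score"]), float(r["pep_expect"]))
    )
    # 3. Remove duplicates on scan title, keeping the first (best) row for each spectrum.
    best = {}
    for r in significant:
        best.setdefault(r["pep_scan_title"], r)
    # 4. Drop peptide sequences identified in only one sample.
    samples_per_peptide = defaultdict(set)
    for r in best.values():
        samples_per_peptide[r["pep_seq"]].add(r["sample"])
    return [r for r in best.values() if len(samples_per_peptide[r["pep_seq"]]) >= 2]


# Hypothetical rows merged from the three database searches (values are invented).
rows = [
    {"sample": "S1", "d_base": "SwissProt", "pep_scan_title": "S1_scan_10",
     "pep_score": "55", "pep_expect": "0.004", "pep_seq": "AAAK"},
    {"sample": "S1", "d_base": "alpaca", "pep_scan_title": "S1_scan_10",
     "pep_score": "40", "pep_expect": "0.02", "pep_seq": "AAGK"},
    {"sample": "S2", "d_base": "camel_EST", "pep_scan_title": "S2_scan_7",
     "pep_score": "48", "pep_expect": "0.01", "pep_seq": "AAAK"},
    {"sample": "S2", "d_base": "SwissProt", "pep_scan_title": "S2_scan_9",
     "pep_score": "20", "pep_expect": "0.9", "pep_seq": "CCCK"},
    {"sample": "S3", "d_base": "alpaca", "pep_scan_title": "S3_scan_3",
     "pep_score": "60", "pep_expect": "0.001", "pep_seq": "DDDK"},
]
final = consolidate(rows)
for r in final:
    print(r["sample"], r["d_base"], r["pep_seq"], r["pep_score"])
```

In this toy data set the lower-scoring alpaca match to spectrum S1_scan_10 is discarded in favour of the Swiss-Prot match, the non-significant row is filtered out, and the peptide seen only in sample S3 is removed.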
Protein hit information in columns ‘prot_hit_num’, ‘pep_score’, ‘pep_expect’, ‘pep_seq’, ‘pep_var_mod’, ‘pep_var_mod_pos’, and ‘pep_scan_title’ from the three different database searches was copied into a single Excel sheet. Two extra columns were added with the names ‘sample’ and ‘d_base’ into which the sample label and database name were entered manually. The entire data set (all samples, all searches) was filtered on column ‘pep_expect’ using a number filter of less than or equal to 0.05 to remove matches that were not statistically significant at the identity threshold. The remaining data were copied to a new sheet and sorted by ‘pep_scan_title’ (A to Z) then by ‘pep_score’ (largest to smallest) then by ‘pep_expect’ (smallest to largest). The data function ‘remove duplicates’ was applied to column ‘pep_scan_title’ only; this had the effect of removing multiple instances of the same MS/MS spectrum, leaving those which had the best quality peptide sequence matches (indicated by the highest pep_score and lowest pep_expect values). These data were further consolidated by removing peptide sequences identified in only one sample, to produce the final list reported in Table 2.

Each Swiss-Prot and alpaca Mascot search included an automatic decoy database search for estimation of the false discovery rate (FDR). These decoy searches use random sequences (one for every ‘real’ sequence, of matching length) having the same average amino acid composition as the ‘real’ sequence database. The average FDR for peptide matches above the identity threshold was 2.6%. The FDR in the final data set of 735 peptides may be much lower because only those peptides detected with statistically significant (p < 0.05) scores in at least two independent samples were included.

1.9 Assignment of protein descriptions by BLAST search homology

Alpaca and camel EST sequences had no annotations.
In order to identify matches from these databases, homology searches were carried out using the NCBI BLASTP 2.2 program (http://blast.ncbi.nlm.nih.gov/Blast.cgi) and non-redundant protein sequence database. Protein descriptions from the best statistically significant matches were assigned to the camelid sequences. Additionally, peptides identified from camel EST sequences were subjected to short BLASTP searches, to check that both peptide and EST sequence could be matched to the same protein description. These descriptions should be treated with caution because they rely on the assumption that protein function is conserved between species, and this may not always be the case.

1.10 Gene ontology (GO) term analysis

Human orthologs were available for 137 out of 147 identified proteins in camel urine. UniProtKB (Swiss-Prot) identifiers for these human orthologs were used as the input to GO term analysis using the AmiGO online tool (version 1.8, GO database release 2012-05-26) [1] at http://amigo.geneontology.org/cgi-bin/amigo/term_enrichment. A background set of 12407 human gene products (http://www.uniprot.org/docs/humpvar.txt, release 2012_05 of 16 May 2012; accessed 29 May 2012) was used. A maximum P-value of 0.01 and a minimum of five gene products were set as thresholds. Electronically inferred data were not included. The results are shown in Table 3 and Figure 2. Results of GO term analysis should be viewed as speculative because they rely on the assumption that protein function and location are conserved between species, and this may not always be the case.

References

1. Carbon S, Ireland A, Mungall CJ, Shu S, Marshall B, Lewis S, AmiGO Hub, Web Presence Working Group. AmiGO: online access to ontology and annotation data. Bioinformatics. Jan 2009;25(2):288-9.

Legends to figures

Figure 2. Assignment of gene ontology (GO) terms to the proteins identified in camel urine: (A) biological process and (B) molecular function.
Numbers of proteins (human orthologs) are shown in each statistically significantly enriched (compared to all human proteins) GO category. Only the 20 most highly populated GO biological process terms are shown (out of 42).
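The enrichment thresholds in section 1.10 (maximum P-value of 0.01, minimum of five gene products) can be illustrated with a hypergeometric test, which is one common way to score GO term over-representation. The sketch below is an assumption-laden illustration in Python: whether it matches the exact calculation performed by the AmiGO tool is not confirmed by the text, no multiple-testing correction is applied, the function names are hypothetical, the term counts in the example are invented, and the minimum of five is assumed to apply to the number of study proteins annotated to the term. Only the study size (137) and background size (12407) come from the text.

```python
from math import comb


def hypergeom_sf(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k): chance of at least k annotated proteins in a study set of n,
    drawn from a background of N proteins of which K carry the annotation."""
    upper = min(n, K)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def is_enriched(k, n, K, N, max_p=0.01, min_products=5) -> bool:
    """Apply both thresholds: a minimum annotated count and a maximum P-value."""
    return k >= min_products and hypergeom_sf(k, n, K, N) <= max_p


STUDY_SIZE = 137         # human orthologs used as input (from the text)
BACKGROUND_SIZE = 12407  # background human gene products (from the text)

# Hypothetical term: 500 background proteins annotated, 20 of them in the study set.
p_value = hypergeom_sf(20, STUDY_SIZE, 500, BACKGROUND_SIZE)
print(p_value, is_enriched(20, STUDY_SIZE, 500, BACKGROUND_SIZE))
```

With these invented counts about 5.5 annotated proteins would be expected by chance, so observing 20 gives a very small P-value and the term would pass both thresholds.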