Proteomic Survey of Camel Urine Reveals High Levels of

SUPPORTING INFORMATION
1 Materials and methods
1.1 Animals and urine sample collection
Camels selected for this study were nine healthy female domesticated one-humped camels
(Camelus dromedarius). The animals were kept on a private farm and given free access to
water and camel feed. Urine collection from virgin (3), pregnant (3) and lactating (3) camels
was usually done at feeding time and was performed by experienced camel attendants.
Gestation in camels normally lasts 11 months and urine samples for this study were taken
from pregnant camels 5-6 months after fertilisation. Urine was allowed to flow directly into
stainless steel containers and then transferred to glass bottles. Samples were transported to
the laboratory as soon as practicable (within 4 h) and were frozen at -80 °C. Urine samples
were shipped frozen from Riyadh, Saudi Arabia, to Aberdeen, UK, and processed for
proteomic analysis on arrival.
1.2 Sample preparation (protein extraction)
Urine samples were thawed to 25 °C, vortex-mixed thoroughly to dissolve precipitates and centrifuged
at 3,100 x g for 20 min to remove particulates. Supernatants (20 mL) were sterilised by
filtration using disposable 10-mL syringes and Puradisc 25AS filter devices with 0.2 µm PES
membranes (Whatman International Ltd., Maidstone, UK). Samples were then concentrated
and desalted using Vivaspin 20 disposable centrifugal concentrator devices with 3,000
MWCO PES membranes (Sartorius Stedim Biotech, Goettingen, Germany) in a benchtop
centrifuge (IEC CL30R; Thermo Fisher Scientific, Loughborough, UK) equipped with a
swing-bucket rotor at 4,100 rpm (3,100 x g). Two cycles of concentration to 1 mL and
addition of 10 mL deionised water were used and the desalted samples were each
concentrated to 1 mL. Protein concentrations, estimated by Ponceau S/TCA assay in
comparison with a BSA standard, were used to calculate the volumes containing
approximately 50 µg protein, from which protein was precipitated using a Ready Prep 2-D
Cleanup Kit procedure (Bio-Rad Ltd., Hemel Hempstead, UK).
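The arithmetic behind the desalting and loading steps can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, the example protein concentration is invented, and the dilution estimate assumes that salts pass freely through the 3,000 MWCO membrane so that each cycle dilutes them by the ratio of total to retained volume.

```python
def desalting_fold(retained_ml: float, added_ml: float, cycles: int) -> float:
    """Fold reduction in salt after repeated dilute-and-concentrate cycles.

    Assumes salts are not retained by the membrane (an idealisation).
    """
    if retained_ml <= 0 or cycles < 0:
        raise ValueError("retained volume must be positive and cycles non-negative")
    return ((retained_ml + added_ml) / retained_ml) ** cycles


def volume_for_mass_ul(target_ug: float, conc_ug_per_ul: float) -> float:
    """Volume (in microlitres) that contains the target mass of protein."""
    if conc_ug_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_ug / conc_ug_per_ul


# Two cycles of concentrating to 1 mL and adding 10 mL of water.
print(desalting_fold(retained_ml=1.0, added_ml=10.0, cycles=2))

# Hypothetical sample at 2.5 ug/uL: volume needed for about 50 ug protein.
print(volume_for_mass_ul(target_ug=50.0, conc_ug_per_ul=2.5))
```

Under these assumptions the two water additions described above give roughly a 121-fold reduction in salt concentration on top of the initial concentration step.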
1.3 One-dimensional gel electrophoresis
Protein extracts prepared as described above were analysed by 1-DE on a 4-12 % NuPAGE
Bis-Tris mini gel (Invitrogen). Protein pellets from the clean-up procedure were dissolved in
1x NuPAGE LDS Sample Buffer (Invitrogen) and volumes containing approximately 25 µg
protein were processed as follows. Protein samples were mixed with reducing agent
(Invitrogen) and heated at 70 °C for 10 minutes before being loaded onto the gel tracks. A
marker track of SeeBlue Plus2 (Invitrogen) was also loaded. The gel was electrophoresed for
40 minutes using MES electrophoresis buffer (Invitrogen). Following electrophoresis the gel
was stained with colloidal Coomassie Brilliant Blue G250 then air-dried between cellophane
sheets using an Easy Breeze gel dryer (Hoefer, Inc., Holliston, MA, USA).
1.4 In-gel digestion
A strip of gel (2 mm x 64 mm) spanning the entire molecular weight range was excised from
each track. Each strip was cut into 16 pieces of equal size (2 mm x 4 mm). Proteins were
digested in the gel pieces with trypsin (sequencing grade; Promega) using an Investigator
ProGest robotic workstation (Digilab Ltd., Huntingdon, UK) and a standard protocol for 8-hour
tryptic digestion at 37 °C. Peptide solutions were dried in a SPD1010 Savant SpeedVac
concentrator (Thermo Fisher Scientific) and dissolved in 10 µL 0.1 % formic acid for analysis
by LC-MS/MS.
1.7 Liquid chromatography-tandem mass spectrometry (LC-MS/MS)
Aliquots (2 µL) of the peptide solutions were injected into a LC-MS/MS system which
comprised an UltiMate 3000 LC (Dionex (UK) Ltd., Camberley, UK) coupled to an HCTultra
ion trap mass spectrometer (Bruker Daltonics, Bremen, Germany) fitted with a low-flow
stainless steel nebuliser needle in the ESI source and controlled by HyStar software (version
3.2; Bruker Daltonics). Peptides were separated on a PepSwift monolithic PS-DVB column
(200 µm i.d. x 5 cm; Dionex) at a flow rate of 2 µL/min using a linear gradient of 0 – 40 %
acetonitrile/water/formic acid (80:20:0.04) (solvent B) in water/acetonitrile/formic acid
(97:3:0.05) (solvent A) over 40 min, followed by a 1 min column wash in 90 % solvent B and
a 5 min equilibration step in 100 % solvent A. MS/MS data (scan range m/z 100 – 2200,
averages = 2) were acquired in positive data-dependent AutoMS(n) mode using
esquireControl software (version 6.1; Bruker Daltonics). Up to three precursor ions were
selected from the MS scan (range m/z 300 – 1500, averages = 3) in each AutoMS(n) cycle.
Precursors were actively excluded after being selected twice within a 1 min window; singly
charged ions were also excluded. Peaks were detected above an intensity threshold of 50,000
and deconvoluted automatically using DataAnalysis software (version 3.4; Bruker Daltonics).
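The data-dependent precursor selection rules described above can be illustrated with a short sketch. It is not the esquireControl implementation: the function name, the data layout, the choice of the most intense ions first, and the 0.1 m/z rounding used to decide that two precursors are "the same" are all assumptions made for illustration.

```python
from collections import defaultdict, deque


def select_precursors(peaks, now_min, history,
                      max_precursors=3, max_repeats=2, window_min=1.0):
    """Pick up to `max_precursors` ions from one MS scan.

    peaks   : list of (mz, intensity, charge) tuples from the MS scan
    now_min : acquisition time of the scan, in minutes
    history : dict mapping a rounded m/z to a deque of earlier selection times
    """
    chosen = []
    # Assumption: candidates are considered in order of decreasing intensity.
    for mz, _intensity, charge in sorted(peaks, key=lambda p: p[1], reverse=True):
        if charge == 1:
            continue  # singly charged ions are excluded
        times = history[round(mz, 1)]  # assumed 0.1 m/z matching tolerance
        while times and now_min - times[0] > window_min:
            times.popleft()  # forget selections older than the window
        if len(times) >= max_repeats:
            continue  # already selected twice within the window
        times.append(now_min)
        chosen.append(mz)
        if len(chosen) == max_precursors:
            break
    return chosen


history = defaultdict(deque)
scan = [(500.3, 9e5, 2), (400.2, 8e5, 1), (650.4, 7e5, 3),
        (700.5, 6e5, 2), (800.6, 5e5, 2)]
for t in (0.0, 0.2, 0.4, 1.3):
    print(t, select_precursors(scan, t, history))
```

In this toy run the three most intense multiply charged ions are picked in the first two cycles, become excluded in the third (so a weaker ion gets its turn), and are eligible again once the 1 min window has passed.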
1.8 Peptide identification by MS/MS ions search
Combined mass lists in the form of Mascot Generic Format (*.mgf) files, one for each urine
sample, were created using the ‘ProcessWithMethod’ tool in DataAnalysis. Each set of 16
MS/MS data files was selected and processed using the 2D_LC method script provided by
Bruker Daltonics. The ‘combined_data.mgf’ files thus created were used as inputs to MS/MS
ions searches using Mascot Server (version 2.2). Because of the limited availability of camel
protein sequences, searches were conducted using three different databases in order to
maximise the identification of peptides. The following parameters were used: Enzyme =
Trypsin; Fixed modifications = Carbamidomethyl (C); Variable modifications = Oxidation
(M); Mass values = Monoisotopic; Peptide mass tolerance = 1.5 Da; Fragment mass tolerance
= 0.5 Da; Max. missed cleavages = 1; Instrument type = ESI-TRAP.
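To make the enzyme and tolerance settings concrete, the sketch below performs an in-silico tryptic digest allowing one missed cleavage and checks a mass difference against the 1.5 Da peptide tolerance. It is an illustration, not Mascot code: the function names and the example sequence and masses are hypothetical, and the cleavage rule (C-terminal to K or R, but not before P) is the commonly used definition of trypsin specificity.

```python
import re


def tryptic_peptides(protein: str, max_missed: int = 1) -> list:
    """Peptides from cleavage after K/R (not before P), with missed cleavages."""
    # Zero-width split points: preceded by K or R and not followed by P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", protein) if f]
    peptides = []
    for start in range(len(fragments)):
        last = min(start + max_missed, len(fragments) - 1)
        for end in range(start, last + 1):
            # Joining adjacent fragments models (end - start) missed cleavages.
            peptides.append("".join(fragments[start:end + 1]))
    return peptides


def within_tolerance(observed_da: float, calculated_da: float,
                     tolerance_da: float = 1.5) -> bool:
    """True if the observed mass matches the calculated mass within tolerance."""
    return abs(observed_da - calculated_da) <= tolerance_da


print(tryptic_peptides("AAKRPGGKMMR"))
print(within_tolerance(1000.9, 1000.0))
```

Note that the R-P bond in the example sequence is not cleaved, so "RPGGK" is reported as a single peptide.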
For searches using the Swiss-Prot database (version 2010_05; 516603 sequences) individual
ion scores > 47 indicated identity or extensive homology (p < 0.05). Common contaminants
(human keratins, trypsin) were excluded. Searches were repeated using an alpaca (Vicugna
pacos) protein sequence database built from the Vicugna_pacos.vicPac1.59.pep.all.fa file
(22/7/2010; 11793 sequences) downloaded from the ENSEMBL Genome Browser
(ftp://ftp.ensembl.org/pub/current_fasta/vicugna_pacos/pep/) (accessed 23/8/2010).
Individual ion scores > 31 indicated identity or extensive homology (p < 0.05). Finally,
searches were repeated against a camel (Camelus dromedarius) EST sequence database,
built from the EST_Sequences.rar file (21/12/2010; 102930 sequences) downloaded from the
Arabian Camel Genome website (http://camel.kacst.edu.sa/index.php/est-download-est)
(accessed 21/12/2010). Individual ion scores > 39 indicated identity or extensive homology
(p < 0.05).
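The relationship between the database-specific score thresholds and the expectation value used for filtering later on can be sketched as below. The threshold values are those quoted above; the conversion formula is an assumption based on our reading of the Mascot scoring documentation (expect = p x 10^((threshold - score)/10)) and should be checked against the documentation for the Mascot version in use. Function and dictionary names are hypothetical.

```python
# Identity/homology score thresholds (p < 0.05) quoted in the text.
IDENTITY_THRESHOLD = {"Swiss-Prot": 47, "alpaca": 31, "camel EST": 39}


def is_significant(score: float, database: str) -> bool:
    """True if an individual ion score exceeds the threshold for a database."""
    return score > IDENTITY_THRESHOLD[database]


def expect_value(score: float, threshold: float, p_threshold: float = 0.05) -> float:
    """Assumed Mascot-style expectation value for a given ion score."""
    return p_threshold * 10 ** ((threshold - score) / 10)


# The same ion score can be significant in one database but not in another.
print(is_significant(40, "camel EST"), is_significant(40, "Swiss-Prot"))

# A score 10 units above the threshold corresponds to a 10-fold lower expect value.
print(expect_value(57, IDENTITY_THRESHOLD["Swiss-Prot"]))
```

Under this assumed relationship, a score equal to the threshold corresponds to an expect value of 0.05.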
In a small minority of cases, different peptide sequences from the different databases were
matched to the same MS/MS spectrum. In order to assign sequence precedence to the best
quality matches, and to aid the consolidation of the data, Mascot search results for every
sample were exported in CSV format and processed as follows using Microsoft Office Excel
2007. Protein hit information in columns ‘prot_hit_num’, ‘pep_score’, ‘pep_expect’,
‘pep_seq’, ‘pep_var_mod’, ‘pep_var_mod_pos’, and ‘pep_scan_title’ from the three different
database searches were copied into a single Excel sheet. Two extra columns were added with
the names ‘sample’ and ‘d_base’ into which the sample label and database name were added
manually. The entire data set (all samples, all searches) was filtered on column ‘pep_expect’
using a number filter of less than or equal to 0.05 to remove matches that were not statistically
significant at the identity threshold. The remaining data were copied to a new sheet and
sorted by ‘pep_scan_title’ (A to Z) then by ‘pep_score’ (largest to smallest) then by
‘pep_expect’ (smallest to largest). The data function ‘remove duplicates’ was applied to
column ‘pep_scan_title’ only; this had the effect of removing multiple instances of the same
MS/MS spectra, leaving those which had the best quality peptide sequence matches (indicated
by the highest pep_score and lowest pep_expect values). These data were further
consolidated by removing peptide sequences identified in only one sample, to produce the
final list reported in Table 2.
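The spreadsheet procedure described above (filter, sort, remove duplicates, then drop peptides seen in only one sample) could equally be scripted. The following is a minimal sketch of the same logic using the exported column names; the function name is hypothetical, and it assumes, as the Excel procedure implicitly does, that 'pep_scan_title' values are unique to a sample.

```python
from collections import defaultdict


def consolidate(rows, max_expect=0.05, min_samples=2):
    """Keep the best match per MS/MS spectrum, then peptides seen in >= 2 samples.

    rows: list of dicts with keys 'sample', 'd_base', 'pep_scan_title',
          'pep_score', 'pep_expect' and 'pep_seq'.
    """
    # 1. Remove matches that are not statistically significant.
    significant = [r for r in rows if r["pep_expect"] <= max_expect]
    # 2. Sort by scan title, then score (largest first), then expect (smallest first).
    significant.sort(key=lambda r: (r["pep_scan_title"], -r["pep_score"], r["pep_expect"]))
    # 3. 'Remove duplicates' on scan title: the first (best) row per spectrum wins.
    best = {}
    for r in significant:
        best.setdefault(r["pep_scan_title"], r)
    # 4. Drop peptide sequences identified in only one sample.
    samples_per_peptide = defaultdict(set)
    for r in best.values():
        samples_per_peptide[r["pep_seq"]].add(r["sample"])
    return [r for r in best.values()
            if len(samples_per_peptide[r["pep_seq"]]) >= min_samples]
```

Scripting the steps makes the precedence rule explicit: where two databases match the same spectrum, the row with the higher pep_score (and, for ties, the lower pep_expect) is retained.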
Each Swiss-Prot and alpaca Mascot search included an automatic decoy database search for
estimation of the false discovery rate (FDR). These decoy searches use random sequences
(one for every ‘real’ sequence, of matching length) having the same average amino acid
composition as the ‘real’ sequence database. The average FDR for peptide matches above the
identity threshold was 2.6%. The FDR in the final data set of 735 peptides may be much
lower because only those peptides detected with statistically significant (p < 0.05) scores in at
least two independent samples were included.
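The decoy approach can be illustrated with a short sketch. It is not Mascot's implementation: the function names are hypothetical, the example sequences and match counts are invented, and the FDR is taken here simply as the number of decoy matches divided by the number of target matches above the same threshold.

```python
import random
from collections import Counter


def make_decoys(sequences, seed=0):
    """One random sequence per real sequence, of matching length, drawn from
    the average amino acid composition of the whole database."""
    rng = random.Random(seed)
    counts = Counter("".join(sequences))
    residues = sorted(counts)
    weights = [counts[r] for r in residues]
    return ["".join(rng.choices(residues, weights=weights, k=len(seq)))
            for seq in sequences]


def false_discovery_rate(decoy_matches: int, target_matches: int) -> float:
    """Estimated FDR: decoy matches / target matches above the threshold."""
    if target_matches <= 0:
        raise ValueError("need at least one target match")
    return decoy_matches / target_matches


targets = ["MKTAYIAKQR", "GGSLVEK"]  # invented example sequences
print(make_decoys(targets))
print(false_discovery_rate(decoy_matches=13, target_matches=500))
```

With the invented counts shown, 13 decoy matches against 500 target matches would correspond to an FDR of 2.6 %.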
1.9 Assignment of protein descriptions by BLAST search homology
Alpaca and camel EST sequences had no annotations. In order to identify matches from these
databases, homology searches were carried out using the NCBI BLASTP 2.2 program
(http://blast.ncbi.nlm.nih.gov/Blast.cgi) and non-redundant protein sequence database.
Protein descriptions from the best statistically significant matches were assigned to the
camelid sequences. Additionally, peptides identified from camel EST sequences were
subjected to short BLASTP searches, to check that both peptide and EST sequence could be
matched to the same protein description. These descriptions should be treated with caution
because they rely on the assumption that protein function is conserved between species, and
this may not always be the case.
1.10 Gene ontology (GO) term analysis
Human orthologs were available for 137 out of 147 identified proteins in camel urine.
UniProtKB (Swiss-Prot) identifiers for these human orthologs were used as the input to GO
term analysis using the AmiGO online tool (version 1.8, GO database release 2012-05-26) [1]
at http://amigo.geneontology.org/cgi-bin/amigo/term_enrichment. A background set of 12407
human gene products, (http://www.uniprot.org/docs/humpvar.txt release 2012_05 of 16 May
2012, accessed 29 May 2012) was used. A maximum P-value of 0.01 and a minimum of five
gene products were set as thresholds. Electronically inferred data were not included. The
results are shown in Table 3 and Figure 2. Results of GO term analysis should be viewed as
speculative because they rely on the assumption that protein function and location are
conserved between species, and this may not always be the case.
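For readers unfamiliar with term enrichment, the kind of test involved can be sketched as below. The statistical model is an assumption: a one-sided hypergeometric test is a common choice for GO enrichment, but the exact test and any multiple-testing correction applied by the AmiGO tool are not described here. The function names and the example counts for a GO term are hypothetical; only the background size (12407), the input size (137) and the two thresholds come from the text.

```python
from math import comb


def hypergeom_enrichment_p(background: int, annotated: int,
                           sample: int, hits: int) -> float:
    """P(X >= hits) when drawing `sample` items from `background`,
    of which `annotated` carry the GO term (one-sided hypergeometric test)."""
    upper = min(annotated, sample)
    favourable = sum(comb(annotated, i) * comb(background - annotated, sample - i)
                     for i in range(hits, upper + 1))
    return favourable / comb(background, sample)


def is_enriched(background, annotated, sample, hits,
                max_p=0.01, min_products=5):
    """Apply the two thresholds: P <= 0.01 and at least five gene products."""
    if hits < min_products:
        return False
    return hypergeom_enrichment_p(background, annotated, sample, hits) <= max_p


# Hypothetical term: 500 of 12407 background proteins annotated,
# 20 of the 137 human orthologs annotated.
print(is_enriched(background=12407, annotated=500, sample=137, hits=20))
```

Exact integer arithmetic is used so that the very large binomial coefficients do not overflow before the final division.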
References
1. Carbon S, Ireland A, Mungall CJ, Shu S, Marshall B, Lewis S, AmiGO Hub, Web
Presence Working Group. AmiGO: online access to ontology and annotation data.
Bioinformatics. Jan 2009;25(2):288-9.
Legends to figures
Figure 2. Assignment of gene ontology (GO) terms to the proteins identified in camel urine:
(A) biological process and (B) molecular function.
Numbers of proteins (human orthologs) are shown for each GO category that was significantly
enriched compared with all human proteins. Only the 20 most highly populated GO
biological process terms are shown (out of 42).