Electrophoresis 2006, 27, 2799–2813
Joo-Ho Shin1, Kurt Krapfenbauer1,2, Gert Lubec1
1 Department of Pediatrics, Medical University of Vienna, Vienna, Austria
2 Genomics and Proteomics Department, CNS Preclinical Research, F. Hoffmann La Roche, Basel, Switzerland
Received October 27, 2005
Revised March 6, 2006
Accepted March 8, 2006
Research Article
Large-scale identification of cytosolic
mouse brain proteins by chromatographic
prefractionation
Proteomic studies on mouse brain protein expression still hold center stage, because a reference database of the brain proteome is needed for designing expression studies at the protein level. We therefore decided to extend the number of identified brain proteins by the use of prefractionation. To reduce the complexity of the mouse brain proteome, we applied chromatographic prefractionation by ion-exchange and hydrophobic interaction chromatography prior to 2-DE, followed by mass spectrometric identification (2-DE MALDI-MS). We analyzed about 17 000 protein spots in cytosolic fractions of mouse brain and identified about 10 000 spots. A total of 1841 proteins showing different pI or Mr, probably representing post-translational modifications or splice variants, were products of 789 different genes. Numerous proteins were clearly identified as metabolic, antioxidant, cytoskeleton, signaling, transcription/translation, nucleic acid-binding, and proteolysis-related proteins. We additionally provide evidence for the existence of hypothetical proteins predicted from nucleic acid sequences. Moreover, observed pIs of proteins are listed, enabling localization of proteins in a gel, information that cannot be obtained from theoretical pIs in databases. The results represent the largest database of mouse brain proteins so far and provide valuable information for the design of proteomic studies in the mouse.
Keywords: Hydrophobic interaction chromatography / Ion-exchange chromatography
/ Mouse brain / Proteome map / Two-dimensional gel electrophoresis
DOI 10.1002/elps.200500804
Correspondence: Professor Dr. Gert Lubec, Department of Pediatrics, Medical University of Vienna, Währinger Gürtel 18–20, A-1090 Vienna, Austria
E-mail: [email protected]
Fax: +43-1-40400-3194
Abbreviations: CDD, conserved domain database; COG, clusters of orthologous groups of proteins database; DIP, database of interacting proteins; DTE, dithioerythritol; HIC, hydrophobic interaction chromatography; HPs, hypothetical proteins; IEC, ion-exchange chromatography; PPI, protein–protein interaction; SMART, simple modular architecture research tool
© 2006 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
1 Introduction
Initial mouse genome sequences were released in 2002 [1] and subsequent comparative genomics and extensive experiments have been updating gene numbers and transcriptomes in specific tissues and at different developmental stages. The mouse genome contains about 30 000
genes, of which two-thirds are thought to be expressed in
the brain. To date 32 417 genes were submitted to Mouse
Genome Database (MGD, http://www.informatics.jax.org)
[2], 1930 genes are listed in the expression atlas (at the RNA
level) (http://brainatlas.org) and 1274 full-ORFs of transcripts expressed in the developing mouse nervous system
were identified by the National Institutes of Health (NIH)
Mouse Brain Molecular Anatomy Project (BMAP) [3]. The gene map is only the first step in mapping the mouse, and for technical reasons high-throughput mRNA approaches, rather than protein profiling, have been applied for expression profiling. First initiatives have been started to determine the mouse proteome using immunochemistry and protein chemistry. The proteome encompasses all proteins that are expressed in a biological system (cell, tissue, organ) at any given time. The emerging field of proteomics focuses on the large-scale, comprehensive characterization of the proteome.
www.electrophoresis-journal.com
The experimental procedure would include separating and
presenting all of the different proteins of an organism,
detecting protein functions by characterizing each protein
according to a broad spectrum of chemical and biological
parameters, and matching each protein with its gene [4].
Proteomics has been recognized as one of the main directions of science in the postgenome era, and ultimately protein expression in all mammalian organs and cells will be mapped. For this purpose the current analytical technologies and methods will have to be improved; indeed, a large series of proteomic publications identify the same proteins again and again. The many limitations of proteomic techniques, including separation and identification of hydrophobic and membrane proteins, very basic and very acidic proteins, and proteins of high molecular weight, hamper the construction of large maps [5, 6]. Therefore, prefractionation is mandatory to cope with the complexity of brain proteins. Prefractionation of cellular compartments is a useful principle [7] and chromatographic prefractionation is another option [8–14]. The principle of the chromatographic prefractionation step is to add a third dimension of protein separation to the two used in 2-DE, pI and Mr. A further objective is to allow loading of a larger amount of each protein fraction and to confine highly abundant proteins to one of the fractions [11].
It was the aim of this study to generate a mouse brain protein reference database by the use of DEAE-ion exchange
chromatography (IEC) and hydrophobic interaction chromatography (HIC) followed by 2-DE MALDI-MS analysis,
independent of antibody availability and specificity.
This database, which identifies 1841 proteins that were the products of 789 different genes covering several protein classes and cascades, may serve for the design of protein expression studies in mouse brain. The many described proteins can easily be found on the maps, and the neurochemist is given the analytical information rather than being left to establish experimental conditions for the proteomic analysis.
2 Materials and methods
2.1 Animals
Four FVB/N mice, about 3 months old, kept in a well-controlled (humidity, temperature, and light/dark cycle) environment with access to food and water ad libitum, were used for the experiments. Animals were sacrificed by decapitation and whole brains were dissected at −20°C and kept at −80°C until the time of analysis. The freezing chain was never interrupted until use. Each mouse brain was used independently for all the experiments and the procedures were performed four times in all.
2.2 Sample preparation
Four whole brain tissues were individually suspended in sucrose buffer consisting of 20 mM HEPES (pH 7.5), 320 mM sucrose, 1 mM EDTA, 5 mM dithioerythritol (DTE), and 1 mg/mL of a mixture of protease inhibitors (1 mM PMSF and 1 tablet Complete™ (Art. No 1697498, Boehringer Mannheim) per 50 mL of suspension buffer) and phosphatase inhibitors (0.2 mM Na3VO3 and 1 mM NaF). The suspension was homogenized with ten strokes using a glass/Teflon homogenizer after Potter. The suspension was centrifuged at 800 × g for 10 min at 4°C to sediment nonsuspended material. The supernatant was then centrifuged at 10 000 × g for 15 min to obtain the enriched mitochondrial fraction, and the resulting supernatant was centrifuged at 100 000 × g for 1 h to generate a pellet containing the enriched microsomal fraction and a supernatant representing the cytosolic fraction. The supernatants were centrifuged again at 100 000 × g for 1 h to sediment undissolved material. Cytosolic proteins from four mouse brains were desalted and concentrated with HiTrap™ Desalting (Amersham Biosciences). The protein concentration in the supernatant was determined by the Bradford reaction [15].
2.3 IEC
Proteins were applied onto an anion-exchange column (Art. Nr. 08802, DEAE, 5PW-Glass, 8.0 mm id × 7.5 cm L, Tosoh Biosep), equilibrated with 25 mM Tris buffer, pH 8.8. The column was washed twice with 20 mL of the same buffer and 8 mg of total proteins from the cytosol (dissolved in sample buffer consisting of 25 mM Tris, pH 8.8) were eluted with a linear gradient of increasing salt concentration from 0 to 2 M NaCl in 25 mM Tris-HCl buffer (pH 8.8). Fractions of 0.5 mL were collected and pooled according to the elution profile. Six pools (DFT, D1–4, and DLF) were formed. Proteins were concentrated and desalted by using RP chromatography (according to the manufacturer's protocol, Art. Nr. 1–1159–05, Poros50 R2, Applied Biosystems) and analyzed by IEF and 2-D PAGE.
2.4 HIC
Proteins were applied onto HIC columns (Tosoh Biosep: Analytical/Semi-Prep column, TSK-Gel Phenyl-5PW, Art. 07573, 7.5 cm × 7.5 mm id, 10 µm), equilibrated with 25 mM Na2HPO4 (pH 7.0; 1 mM EDTA; 0.5 mM DTE; 1× Complete). The column was washed twice with 20 mL of the same buffer (standard flow rate 0.5–1.0 mL/min, max flow rate 1.2 mL/min, pH 7.0) and 8 mg of total proteins from the cytosol (dissolved in a 25 mM Na2HPO4 sample buffer) were eluted with a linear gradient of decreasing salt concentration from 2 to 0 M ammonium sulfate dissolved in 25 mM Na2HPO4 (pH 7.0; 1 mM EDTA; 0.5 mM DTE; 1× Complete). Fractions of 0.5 mL were collected and pooled according to the elution profile. Five pools (HFT, H1–3, and HLF) were formed. The proteins were concentrated and desalted by using RP chromatography according to the manufacturer's protocol (Art. Nr. 1–1159–05, Poros50 R2, Applied Biosystems) and analyzed by IEF and 2-D PAGE as described above.
2.5 2-DE
Eleven fractions from IEC (six pools) and HIC (five pools) were desalted by using membrane filter tubes (Art. No. UFV4BGC25, 10 000 NMWL, Biomax-10 membrane, Millipore, Bedford, MA, USA) and 1.0 mg aliquots were applied on immobilized pH 3–10 nonlinear gradient strips (Amersham Pharmacia Biotechnology, Uppsala, Sweden) at both the basic and acidic ends of the strips. Proteins were focused at 200 V, after which the voltage was gradually increased to 5000 V at a rate of 2 V/min (approximately 180 000 kVh). Focusing was continued at 5000 V for 24 h. The second-dimensional separation was performed on 12% homogeneous polyacrylamide gels (Serva, Heidelberg, Germany). The gels (180 × 200 × 1.5 mm) were run at 40 mA per gel in an ISODALT apparatus (Hoefer Scientific Instruments, San Francisco, CA, USA) accommodating ten gels. After protein fixation with 50% v/v methanol containing 5% v/v phosphoric acid for 12 h, the gels were stained with colloidal CBB (Novex, San Diego, CA, USA) for 24 h. Molecular masses were determined by running standard protein markers (Gibco, Basel, Switzerland) covering the range of 10–200 kDa. pI values were used as given by the supplier of the IPG strips. Gels were destained with H2O and scanned with ImageScanner (Amersham Biosciences, Uppsala, Sweden). Images were processed using Photoshop (Adobe) and PowerPoint (Microsoft) software.
2.6 MALDI-TOF MS
MALDI-TOF MS analysis was performed as described in [16] with minor modifications. Briefly, spots were excised with a spot picker, placed into 96-well microtiter plates, destained with 100 µL of 30% v/v ACN in 0.1 M ammonium bicarbonate and dried in a Speedvac evaporator. The dried gel pieces were rehydrated with 5 µL of 3 mM Tris-HCl (pH 8.8) containing 50 ng of trypsin (Promega, Madison, WI, USA). After 16 h digestion at room temperature (RT), 5 µL of water was added and samples were kept at RT for 10 min. Four microliters of 50% ACN containing 0.3% TFA was added and the content was centrifuged for 1 min and vortexed for 20 min. The standard peptides des-Arg-bradykinin (Sigma, St. Louis, MO, USA; 904.4681 Da) and adrenocorticotropic hormone fragment 18–39 (Sigma; 2465.1989 Da) were added. The samples were applied by a CyBi™-Well sample processor (CyBio AG): 1.5 µL of the peptide mixture was applied simultaneously with 1 µL of matrix, consisting of a saturated solution of α-cyano-4-hydroxycinnamic acid in 50% ACN containing 0.1% TFA. The samples were analyzed by a TOF Reflex III™ mass spectrometer (Bruker Daltonics, Bremen, Germany) equipped with a reflector and delayed extraction. An accelerating voltage of 20 kV was used. The protein search was performed with in-house software, developed by Hoffmann LaRoche (Basel, Switzerland), which is similar to the Peptident software on the ExPASy server (http://expasy.hcuge.ch/sprot/peptident.html) [17]. Monoisotopic peptide masses were compared to the theoretical peptide masses of all available proteins from all species from the Swiss-Prot (http://www.expasy.ch) and PIR (http://pir.georgetown.edu/) databases. A mass tolerance of 0.0025% was allowed. Unmatched peptides or miscleavage sites were not considered. Spectra were analyzed and protein sequence databases were searched using the in-house programs Fragment 21 and MSROFIT, respectively (Roche, Basel, Switzerland). The algorithm used by the in-house software for determining the probability of an assignment (pMism, −10 log(P), where P is the probability that the match is a random event) with a given MS spectrum was published, and this scoring function resulted in ca. 0.02% false-positive hits [17]. The algorithm for detection and evaluation of peptide mass peaks in MS spectra is described below.
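The mass-matching criterion and the score form can be illustrated with a minimal Python sketch. It is not the in-house software: the function names are hypothetical, the tolerance is taken relative to the theoretical mass, and the logarithm in −10 log(P) is assumed to be base 10.

```python
import math

def within_tolerance(observed, theoretical, tol_percent=0.0025):
    """True if the observed mass lies within tol_percent of the theoretical mass."""
    return abs(observed - theoretical) <= theoretical * tol_percent / 100.0

def match_peptides(observed_masses, theoretical_masses, tol_percent=0.0025):
    """Pair each observed mass with the first theoretical mass inside the tolerance."""
    matches = []
    for obs in observed_masses:
        for theo in theoretical_masses:
            if within_tolerance(obs, theo, tol_percent):
                matches.append((obs, theo))
                break
    return matches

def mismatch_score(p_random):
    """Score of the form -10*log10(P); base 10 is an assumption of this sketch."""
    return -10.0 * math.log10(p_random)
```

For the 904.4681 Da standard, a tolerance of 0.0025% corresponds to a window of roughly ±0.023 Da.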
2.6.1 Baseline correction
The baseline of the MALDI-MS spectrum was found by
splitting the spectrum into sequential mass segments of a
0.05 mass range. For each of these segments we calculated a robust linear fit [18] and derived the slope and offset and their respective errors. The baseline was then
approximated by cubic spline interpolation in-between
the midpoints of the segments, and the baseline was
subtracted from the spectrum.
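The baseline procedure can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the robust fit is a Theil-Sen estimator (the paper only cites [18] for its robust fit), piecewise-linear interpolation is swapped in for the cubic spline to keep the example dependency-free, and `segment_width` is a hypothetical parameter.

```python
from statistics import median

def theil_sen(xs, ys):
    """Robust line fit: median of pairwise slopes, then median offset."""
    slopes = [(ys[j] - ys[i]) / (xs[j] - xs[i])
              for i in range(len(xs)) for j in range(i + 1, len(xs))
              if xs[j] != xs[i]]
    slope = median(slopes)
    offset = median(y - slope * x for x, y in zip(xs, ys))
    return slope, offset

def baseline_nodes(mz, intensity, segment_width):
    """Fit each sequential mass segment; return (midpoint, baseline value) nodes."""
    nodes = []
    start = mz[0]
    while start <= mz[-1]:
        end = start + segment_width
        idx = [i for i, m in enumerate(mz) if start <= m < end]
        if len(idx) >= 2:  # a line needs at least two points
            slope, offset = theil_sen([mz[i] for i in idx],
                                      [intensity[i] for i in idx])
            mid = start + segment_width / 2.0
            nodes.append((mid, slope * mid + offset))
        start = end
    return nodes

def interpolate(nodes, x):
    """Piecewise-linear interpolation between nodes, held flat beyond the ends."""
    if x <= nodes[0][0]:
        return nodes[0][1]
    if x >= nodes[-1][0]:
        return nodes[-1][1]
    for (x0, y0), (x1, y1) in zip(nodes, nodes[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def subtract_baseline(mz, intensity, segment_width):
    """Return the spectrum with the interpolated baseline removed."""
    nodes = baseline_nodes(mz, intensity, segment_width)
    return [y - interpolate(nodes, x) for x, y in zip(mz, intensity)]
```

Because the fit in each segment is robust, an isolated peptide peak inside a segment barely shifts the estimated baseline.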
2.6.2 Peak detection and isotope distribution fit
After baseline correction, the maximum y-coordinate was
taken as the starting point for peak fitting. A standard
implementation of the Levenberg–Marquardt algorithm
[18] was used to fit the isotope distribution of an average
peptide at a given mass (calculated using the algorithm of
Rockwood et al. [19]), which is parameterized by the
monoisotopic mass position, the instrument resolution,
and the peak height. The fit was characterized by the
usual fit quality estimates (chi-square statistics).
2.6.3 Subtraction of fit
From the fitted isotope distribution parameters we calculated the fitted isotope distribution, multiplied it by a safety margin of 1.2, and subtracted it from the spectrum. The procedure was then restarted and looped until the desired number of monoisotopic masses had been found. The algorithm was implemented on a standard personal computer.
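The detect, fit, subtract, and repeat loop of Sections 2.6.2 and 2.6.3 can be sketched as below. This is a simplified stand-in: a single Gaussian of fixed width, centered on the current maximum, replaces the Levenberg–Marquardt fit of a full isotope distribution, and the names `pick_peaks`, `sigma`, and `safety` are hypothetical.

```python
import math

def gaussian(x, center, height, sigma):
    """Gaussian peak shape used here in place of a fitted isotope distribution."""
    return height * math.exp(-0.5 * ((x - center) / sigma) ** 2)

def pick_peaks(mz, intensity, n_peaks, sigma=0.1, safety=1.2):
    """Repeatedly take the highest point, record it, and subtract safety * model."""
    residual = list(intensity)
    found = []
    for _ in range(n_peaks):
        i_max = max(range(len(residual)), key=residual.__getitem__)
        center, height = mz[i_max], residual[i_max]
        if height <= 0:  # nothing left above the baseline
            break
        found.append((center, height))
        residual = [y - safety * gaussian(x, center, height, sigma)
                    for x, y in zip(mz, residual)]
    return found
```

Subtracting 1.2 times the model, rather than the model itself, pushes the region of an already detected peak below zero, so residual intensity there is not picked up again as a new peak.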
2.7 Computational analysis of hypothetical
proteins
Based on MALDI-MS analysis, we identified hypothetical proteins (HPs) and their sequences were compared against the Swiss-Prot database (http://www.expasy.org/sprot) [20], the conserved domain database (CDD) (http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml) [21], the clusters of orthologous groups of proteins database (COG) (http://www.ncbi.nlm.nih.gov/COG) [22] using their respective search tools, the simple modular architecture research tool (SMART) (http://smart.embl-heidelberg.de) [23], the protein families database (Pfam) (http://pfam.cgb.ki.se) [24], and against the NCBI protein database (http://www.ncbi.nlm.nih.gov) [25]. STRING (http://www.bork.embl-heidelberg.de/STRING) [26] provided clues for functional prediction of HPs based on direct (physical) and indirect (functional) associations derived from genomic context, high-throughput experiments, conserved coexpression and previous knowledge. Protein–protein interaction (PPI) data were obtained from the Database of Interacting Proteins (DIP) (http://dip.doe-mbi.ucla.edu) [27]. Finally, the recently enabled global search of all NCBI databases (http://www.ncbi.nlm.nih.gov/Entrez) was performed to ensure the complete retrieval of all the available information. All findings were reassessed manually.
Figure 1. Scheme of procedure. Mouse brain cytosolic protein was extracted and fractionated by ultracentrifugation followed by chromatographic prefractionation. Each fraction was used for 2-DE as described in Section 2.
3 Results
3.1 Large-scale identification of mouse brain proteins
Total protein from mouse brain was separated into three enriched fractions (microsome, mitochondria, and cytosol) by ultracentrifugation (Fig. 1). Due to insufficient amounts of microsomal and mitochondrial proteins for further chromatographic prefractionation, these fractions were kept and only the cytosolic fraction of the brain was used for chromatographic prefractionation. Following chromatography (Fig. 2a and b), fractions were desalted and applied to wide-range pH 3–10 NL IPG strips. 2-DE separation was performed and protein spots were visualized by staining with colloidal CBB (Fig. 2c and d, see Supplementary Figure).
The spots in 2-DE gels were selected randomly with the goal of detecting as many new proteins as possible and PMF was used to identify visualized spots.
As summarized in Table 1, about 17 000 spots (IEC: 12 600 spots; HIC: 4400 spots) in 42 gels generated from individual experiments were analyzed and resulted in the identification of about 10 000 protein spots. Because numerous identified proteins overlapped between fractions, this amounted to 1841 proteins showing different pI or Mr, probably due to post-translational modifications or the presence of splice variants, which were the products of 789 different genes (see Supplementary Table 1).
The identification rate was about 57.4% (9613/16 741) of the analyzed spots. Nonidentification may have been due to protein modifications, miscleavages, missing entries in databases, errors in databases, unknown genes, or overlapping spots that did not allow confident identification.
Of the identified proteins in mouse brain, 31.3% (247 of 789) were classified as metabolic enzymes or their subunits. Chromatographic prefractionation enabled detection of low-abundant proteins (LAPs) such as signaling components and, therefore, signaling proteins (136 of 789, 17.2%) were observed as the second largest class. Other proteins identified were assigned as cytoskeleton proteins (74 of 789, 9.4%), chaperone proteins (41 of 789, 5.2%), nucleic acid-binding proteins (including transcriptional/translational factors) (46 of 789, 5.8%), antioxidant proteins (20 of 789, 2.5%), proteolysis components (47 of 789, 6.0%), neuronal proteins (13 of 789, 1.6%), and miscellaneous proteins (136 of 789, 17.2%) (Fig. 3a). In addition, representative proteins associated with neuronal or glial function were classified (see Supplementary Table 2).
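The class shares quoted above follow directly from the counts, as the short Python check below shows. The dictionary keys are shorthand labels chosen for this sketch; the counts are those given in the text.

```python
TOTAL_GENES = 789

# Counts per functional class as quoted in the text (labels are shorthand).
CLASS_COUNTS = {
    "metabolism": 247,
    "signaling": 136,
    "cytoskeleton": 74,
    "chaperone": 41,
    "nucleic acid-binding": 46,
    "antioxidant": 20,
    "proteolysis": 47,
    "neuronal": 13,
    "miscellaneous": 136,
}

def percentages(counts, total):
    """Share of each class in percent, rounded to one decimal place."""
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}

shares = percentages(CLASS_COUNTS, TOTAL_GENES)
# Gene products not covered by any of the classes listed in this paragraph.
unlisted = TOTAL_GENES - sum(CLASS_COUNTS.values())
```

The listed classes sum to 760 of the 789 gene products, so 29 are not covered by the classes named in this paragraph.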
In addition, we investigated the localization of genes
encoding the 789 different proteins. At least 19 genes
from X chromosome (2.4%, 19/789) were expressed in
mouse brain and genes on chromosome 11 were highly
expressed (9.6%, 76/789) (Fig. 3b).
Furthermore, theoretical values of pI and Mr were compared with observed values. Scatter-plot analysis showed a good correlation between theoretical and observed values (R2 = 0.72 for pI and 0.89 for Mr) (Figs. 3c and d).
The existence of proteins represented by several spots
Table 1. Summary of spot numbers analyzed and identified in this study

Contents                            Used mice a)
                                    F1       M1       M2       M3
Number of gels, IEC fraction        6        6        6        6
Number of gels, HIC fraction        3        5        5        5
Number of spots analyzed            4 777    4 621    4 498    2 845
Number of spots identified          2 714    2 662    2 581    1 656
Identification rate (%)             56.8     57.6     57.4     58.2

Total number of spots analyzed                                       16 741
Total number of spots identified                                      9 613
Identified spots per mouse                                            2 403
Total number of genes encoding proteins                                 789
Total number of spots showing different pI or Mr                      1 841
Number of putative isoform, post-translationally modified,
or fragmented spots per protein                                         2.3

Proteins fractionated by chromatography were separated by 2-DE and identified by MALDI-TOF MS, following in-gel digestion with trypsin. The spots representing the identified proteins are indicated in Supplementary Figure 1 and are designated with their accession numbers (http://us.expasy.org/). IEC, ion exchange chromatography; HIC, hydrophobic interaction chromatography.
a) F1, Female mouse number 1; M1–3, Male mouse number 1–3.
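The derived figures in Table 1 can be cross-checked from the per-mouse spot counts. The sketch below is only an arithmetic check; the variable and function names are chosen for illustration.

```python
# (spots analyzed, spots identified) per mouse, as listed in Table 1.
SPOTS = {
    "F1": (4777, 2714),
    "M1": (4621, 2662),
    "M2": (4498, 2581),
    "M3": (2845, 1656),
}

def identification_rate(analyzed, identified):
    """Identified spots as a percentage of analyzed spots, one decimal place."""
    return round(100.0 * identified / analyzed, 1)

rates = {mouse: identification_rate(a, i) for mouse, (a, i) in SPOTS.items()}
total_analyzed = sum(a for a, _ in SPOTS.values())
total_identified = sum(i for _, i in SPOTS.values())
overall_rate = identification_rate(total_analyzed, total_identified)
identified_per_mouse = round(total_identified / len(SPOTS))
spots_per_gene = round(1841 / 789, 1)  # distinct pI/Mr spots per gene product
```

The per-mouse rates, the totals of 16 741 and 9613 spots, the overall rate of 57.4%, and the 2.3 spots per protein all reproduce from these counts.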
Figure 2. Representative elution profiles of proteins eluted from the IEC (a) and HIC (b) columns. Soluble protein fractions were applied onto the IEC and HIC columns and proteins were eluted as described under “experimental protocol.” Parts of the elution profiles which include protein peaks are shown. Numbers below the peaks indicate the pools formed and subsequently analyzed by 2-DE. Under monitoring with a UV detector at 280 nm, fractions DFT–DLF (DEAE chromatography) (c) and HFT–HLF (HIC) (d) were eluted with an increasing and a decreasing salt concentration, respectively; FT, flow through; LF, last fraction. (e) All visualized spots were analyzed and 789 different gene products were identified (see Supplementary Table 1 and Supplementary Figures). Most proteins were predominantly identified in the IEC fractions and only 48 proteins were exclusively identified in HIC.
Table 2. Computational analysis and putative functions of HPs identified in mouse brain
Columns: Accession number and protein name (gene); UniGene Cluster a) (protein similarities %/number of amino acids); Putative domains b); PSI-BLAST c) search (% identity to functionally known protein)
Cytoskeleton-related proteins
Q8BRR3
A. thaliana ref:NP_567496.1 – LET1 like protein (41.4/476)
Unknown EST, full
H. sapiens ref:NP_071938.1 – HP FLJ21988 (87.4/476)
insert sequence (Narfl) S. cerevisiae sp:P23503 – Nuclear architecture related protein 1 (31.5/417)
Pfam02906,
151% identity to
Fe_hyd_lg_C,
NARF (nuclear
iron only hydro- prelamin A recognigenase large
tion factor) protein
subunit,
(H. sapiens)
C-terminal domain.
Q922H6
HP BC008103
(BC008103)
A. thaliana ref:NP_195436.1 – tubulin-like protein (28.7/548)
D. melanogaster ref:NP_523435.1 – misato; lethal (29.8/549)
H. sapiens ref:NP_060586.1 – misato (H. sapiens) (78.2/554)
S. cerevisiae pir:S55093 – HP YMR211w (24.4/242)
Q9WV92
Band 4.1-like
protein 3 (4.1b)
(Epb41l3)
CD00836,
C. elegans ref:NP_493600.1 – Band 4.1 protein like (36.34/418)
FERM_C
D. melanogaster pir:T13800 – T13800 coracle gene protein (59.4/356)
M. musculus ref:NP_038841.1 – erythrocyte protein band 4.1-like 3; DAL1P
(88.1/510)
R. norvegicus ref:NP_446379.1 – erythrocyte protein band 4.1-like 3 (85.7/1087)
CD00286,
Tubulin/FtsZ,
Tubulin/FtsZ /
Cytoplasm
28% identity to
tubulin-like protein
48% identity to
neuronal protein 4.1
(M. musculus)
Neuronal protein 4.1
is necessary for
nuclear assembly
Detoxification-related proteins
Q8BZA2
A. thaliana pir:S57611 – S57611 probable quin oxidoreductase (34.6/322)
Hypothetical zincC. elegans ref:NP_496334.1 – Zinc-binding dehydrogenases (45.3/346)
containing alcohol
E. coli pir:D64897 – D64897 probable quinone oxidoreductase (37.8/345)
dehydrogenase super- M. musculus ref:NP_080244.1 – RIKEN cDNA 2510002C21 (40.9/322)
family containing
S. cerevisiae pir:S58197 – S58197 probable membrane protein YML131w
protein, full insert
(26.0/324)
sequence
(9630043F13Rik)
Pfam00107,
ADH_zinc_N,
zinc-binding
dehydrogenase.
45% identity to
zinc-binding
dehydrogenase
(C. elegans)
Q8CHP8
RIKEN cDNA
1700012G19
(1700012G19Rik)
A. thaliana ref:NP_199587.1 – 4-nitrophenylphosphatase-like protein (35.9/223)
C. elegans ref:NP_504511.1 – C53A3.2.p (34.6/204)
E. coli ref:NP_286389.1 – N-acetylglucosamine metabolism (33.2/189)
H. sapiens ref:NP_064711.1 – HP dJ37E16.5 (46.2/223)
S. cerevisiae pir:S67800 – S67800 aryl phosphatase (28.9/218)
Pfam00702,
hydrolase,
haloacid dehalogenase-like
hydrolase
45% identity to
pyridoxal phosphate
phosphatase
(M. musculus)
Q9CYW4
http://ca.expasy.org/
sprot/userman.html –
DE_line Hypothetical
Haloacid dehalogenase/epoxide hydrolase family containing
protein, full insert
sequence
(bn189g18.4.1)
(2810435D12)
A. thaliana ref:NP_199286.1 – Dreg-2 like protein (33/206)
C. elegans pir:T23197 T23197 HP K01G5.1 (28/208)
D. melanogaster sp:Q94915 – Rhythmically expressed gene 2 protein (28/229)
H. sapiens ref:NP_112496.1 – HP MGC12904 (77/249)
S. cerevisiae pir:S53060 – HP YMR130w (27/216)
Pfam00702,
Hydrolase,
haloacid dehalogenase-like
hydrolase
27% identity to
2-haloalkanoic acid
dehalogenase
Q9DCF2
Hypothetical HAD-like
structure containing
protein, full insert
sequence
(0610039H12)
A. thaliana ref:NP_199587.1 – 4-nitrophenylphosphatase-like protein (25.8/216)
C. elegans ref:NP_504597.1 – K08B12.3.p (51.6/251)
E. coli ref:NP_286389.1 – N-acetylglucosamine metabolism (24.3/220)
H. sapiens ref:NP_115500.1 – HP DKFZp564D1378 (89.2/259)
Pfam00702,
hydrolase,
haloacid dehalogenase-like
hydrolase
38% identity to
phospholysine
phosphohistidine
inorganic
pyrophosphate
phosphatase
Metabolism-related proteins
Q6KAP8
MFLJ00216
protein [Fragment]
(mFLJ00216)
A. thaliana ref:NP_181047.1 – putative UDP-N-acetylglucosamine pyrophosphorylase (40.4/482)
C. elegans pir:T19764 – T19764 HP C36A4.4 (36.7/466)
H. sapiens sp:Q16222 – UDP-N-acetylhexosamine pyrophosphorylase (55.5/492)
S. cerevisiae pir:S50738 – QRI1 protein (38.0/477)
Pfam01704,
51% identity to UDPN-acteylglucosamine
UDPGP,
UTP-glucose-1- pyrophosphorylase 1
phosphate uridylyl- (H. sapiens)
transferase
Q8BGB7
9330118I12
product:E-1
ENZYME homolog
(9330118I12Rik)
C. elegans ref:NP_505997.1 – E-1 enzyme (38.9/267)
H. sapiens ref:NP_067027.1 – E-1 enzyme (100/261)
S. cerevisiae pir:S30843 – UTR4 protein (30.7/258)
COG4229,
39% identity to
COG4229, preenolase-phosphatase
dicted enolase(5M415) (C. elegans)
phosphatase
[Energy production
and conversion]
Q80TB3
MKIAA1612 protein
[Fragment]
(4930415J21Rik)
C. elegans ref:NP_495088.1 – C17G10.1.p (28.4/261)
H. sapiens ref:NP_060703.1 – HP FLJ10826 (73.3/362)
S. cerevisiae pir:S50552 – HP YER049w (32.0/150)
Smart00702,
P4Hc, prolyl
4-hydroxylase
alpha subunit
homologs
Q8BK10
Weakly similar to
Putative N-acetylglucosamine-6-phosphate deacetylase
(5730457F11Rik)
C. elegans ref:NP_498990.1 – N-acetyl-glucosamine-6-phosphate deacetylase
(52.4/395)
E. coli ref:NP_312042.1 – N-acetylgalactosamine-6-phosphate deacetylase
(36.6/381)
CD00854, NagA, 37% identity to
N-acetylglucosa- CaNAG2
mine-6-phosphate (Candida albicans)
deacetylase
Q8CAA7
PMMLP homolog
(Pgm2l1)
C. elegans ref:NP_499741.1 – Phosphoglucomutase and phosphomannomutase
phosphoserine (44.0/599)
H. sapiens ref:NP_060760.1 – HP FLJ10983 (60.2/592)
S. cerevisiae pir:S54585 – HP YMR278w (39.8/563)
Pfam02878 or
60% identity to
02879,
phosphoglucomutase
PGM_PMM, phos- 2 (H. sapiens)
phoglucomutase/
phosphomannomutase, alpha/
beta/alpha domain
Q8VCR7
CCG1-interacting
factor B (Cib)
C. elegans ref:NP_500027.1 – Y55F3AM.10.p (34.7/192)
H. sapiens ref:NP_116139.1 – HP MGC15429 (88.5/208)
COG0596, MhpC, 47% identity to Dorz1
predicted hydro- (M. musculus)
lases or acyltransferases
Q8VED9
RIKEN cDNA
1110067D22
(1110067D22Rik)
C. elegans ref:NP_501571.1 – galactoside-binding lectin like (27.7/134)
H. sapiens ref:NP_054900.1 – HSPC159 protein (98.8/172)
M. musculus sp:O54891 – Galectin-6 (37.1/132)
R. norvegicus pir:A55932 – Galectin-5 (35.5/137)
CD00070, GLECT, 35% identity to
galectin/galacto- galectin-8
se-binding lectin (H. sapiens)
Q9DB32
Unknown EST,
full insert sequence
(Haghl)
A. thaliana ref:NP_187696.1 – hydroxyacylglutathione hydrolase cytoplasmic
(41.5/256)
C. elegans ref:NP_496556.1 – Metallo-beta-lactamase superfamily (42.8/219)
E. coli ref:NP_285900.1 – probable hydroxyacylglutathione hydrolase (35.9/230)
H. sapiens ref:NP_115680.1 – HP MGC2605 (82.4/278)
M. musculus ref:NP_064383.1 – Brain protein 17 (42.3/226)
R. norvegicus ref:NP_203500.1 – Hydroxyacyl glutathione hydrolase;
glyoxalase II; round spermatid protein RSP29 (49.2/256)
S. cerevisiae pir:S70130 – HP YDR272w (29.4/226)
Pfam00753,
lactamase_B,
metallo-betalactamase superfamily COG0491,
GloB, Zn-dependent hydrolases,
including glyoxylases
30% identity to
Q822Y6,
glucose-6-phosphate
1-dehydrogenase
50% identity to
hydroxyacyl-glutathione hydrolase
(glyoxalase II)
(M. musculus)
Proteolysis-related proteins
M. musculus ref:NP_077795.1 – HP, MGC: 7513; HP MGC7513 (100/103)
Q8BGR9
Weakly similar to
Putative ubiquitin
carboxyl-terminal
hydrolase C6G9.08
(BC002236)
Q9CUT0
H. sapiens ref:NP_078859.1 – HP FLJ23142 (77/217)
HP, full insert
sequence. [Fragment]
(4833415E20Rik)
Transcription/translation-related proteins
A. thaliana ref:NP_176303.1 – HP (30.3/220)
Q8BNW9
C. elegans ref:NP_491322.1 – R12E2.1.p (33.7/160)
Protein KIAA0711
D. melanogaster pir:A45773 – A45773 kelch protein, long form (31.2/187)
(Kiaa0711)
M. musculus sp:P11087 – Collagen alpha 1 (26.4/670)
R. norvegicus ref:NP_037061.1 – Procollagen II alpha 1 (26.7/673)
S. cerevisiae pir:S48478 – Glucan 1,4-alpha –glucosidase (21.3/462)
H. sapiens ref:NP_065728.1 – HSCARG protein (82.7/306)
Q8BVF0
Similar to HSCARG
(M. musculus 10 days
neonate cerebellum
cDNA, RIKEN fulllength enriched library,
clone:6530440K01
product) (UN)
CD00196,
UBQ, ubiquitin
homologs
Smart00577,
CPDc, catalytic
domain of CTDlike phosphatases;
36% identity to
ubiquitin-specific
protease 14
(tRNA-guanine
transglycosylase)
[Xenopus tropicalis]
Pfam03577,
peptidase_U34,
peptidase family
U34.
52% identity to
secernin 1
(H. sapiens)
Smart00225, BTB, Broad-Complex, Tramtrack and Bric a brac; also known as POZ (poxvirus and zinc finger) domain. Smart00612, Kelch, Kelch domain.
32% identity to DRE1 protein (H. sapiens)
Pfam05368,
NmrA, NmrA-like
family
30% identity to
NADPH-dependent
reductase
Q8CDF9
Hypothetical NOL1/
NOP2/SUN family
containing protein,
full insert sequence
(4932443I04 Rik)
A. thaliana pir:T06106 – HP T5J1 7 170 (32.9/655)
C. elegans pir:T33803 – HP W07E6.1 (23.2/437)
E. coli sp:P76273 – HP yebU (27.2/291)
H. sapiens sp:P46087 – Proliferating-cell nucleolar antigen P120 (23.6/456)
M. musculus pir:A48998 – Nucleolar protein p120 (22.9/499)
S. cerevisiae sp:P38205 – Putative methyltransferase NCL1 (35.4/716)
Pfam01189,
Nol1_Nop2_Sun,
NOL1/NOP2/sun
family
24% identity to
Proliferating-cell
nucleolar antigen
P120 (H. sapiens)
Q9CQT1
Hypothetical Initiation
factor 2B containing
protein, full insert
sequence
(2410018C20 Rik)
A. thaliana ref:NP_027726.1 – Putative translation initiation factor eIF-2B alpha
subunit (58.3/345)
C. elegans ref:NP_506714.1 – Initiation factor 2 subunit (41.5/347)
D. melanogaster ref:NP_570020.1 – eIF2B-beta gene product (24/273)
H. sapiens pir:T08757 – Probable translation initiation factor eIF-2B delta chain –
(25.4/269)
Pfam01008,
IF-2B, initiation
factor 2 subunit
family
42% identity to
translation initiation
factor (40.9 kDa)
(5P563)
(C. elegans)
Signaling-related proteins
Q8C935
Inferred: TOM1 (M. musculus), full insert sequence (Tom1l2)
A. thaliana ref:NP_564138.1 – Expressed protein (32.5/288)
C. elegans ref:NP_508777.1 – C07A12.7a.p (47.1/355)
D. melanogaster pdb:1DVP – A Chain A, Crystal Structure Of The VHS And FYVE Tandem Domains Of Hrs, A Protein Involved In Membrane Trafficking And Signal Transduction (35.4/143)
H. sapiens ref:NP_005479.1 – Target of myb1 (61.7/507)
M. musculus ref:NP_035752.1 – Target of myb1 homolog (60.9/507)
R. norvegicus ref:NP_062260.1 – Hrs (36.0/136)
S. cerevisiae pir:S48950 – HP YHR108w (24.0/379)
CD00197, VHS, VHS domain has a superhelical structure similar to structure of ARM repeats
69% identity to TOM1 (H. sapiens)
Table 2. Continued
Accession number
and protein name
(gene)
UniGene Clustera)
(Protein similarities%/number of amino acid)
Q9D934
Similar to PEFLIN (2600002E23Rik)
A. thaliana ref:NP_180317.1 – putative calcium binding protein (31.0/166)
C. elegans ref:NP_491447.1 – Calcium binding protein (46.5/142)
D. melanogaster ref:NP_477047.1 – CalpA-P1 (27.1/129)
H. sapiens ref:NP_036524.1 – Peflin (84.2/275)
M. musculus pdb:1HQV – A Chain A, Structure Of Apoptosis-Linked Protein Alg-2 (39.6/183)
R. norvegicus pir:S38361 – Calpain (29.5/129)
S. cerevisiae sp:P53238 – Hypothetical 38.4 kDa protein in MUP1-SPR3 intergenic region (25.5/157)
CD00051, EFh, EF-hand, calcium binding motif
35% identity to sorcin (M. musculus)
No putative function
Q6LCE3, WF-3 (UN)
H. sapiens ref:NP_113638.1 – HP p5326 (96.8/251)
Q7TNC7
FKSG27
(AF322649)
C. elegans ref:NP_491232.1 – F53F10.5. (25.5/523)
E. coli ref:NP_415890.1 – putative membrane protein (22.07/434)
H. sapiens ref:NP_062558.1 – HP R30953_1 (28.1/398)
M. musculus sp:P05143 – proline-rich protein MP-3 (31.4/183)
Q8BHQ7
Similar to Hypothetical 34.0 kDa
protein
(1810007P19Rik)
C. elegans ref:NP_497640.1 – R10F2.5.p (39.4/190)
Q8BZE2
H. sapiens pir:T08798 – HP DKFZp586B0923.1 – (96.1/308)
http://us.expasy.org/
sprot/userman.html –
DE_line Hypothetical
Cysteine-rich region
containing protein,
full insert sequence
(2510003E04Rik)
Q8BWR2
AD039
(1110049F12Rik)
A. thaliana ref:NP_565614.1 – expressed protein (37.6/194)
C. elegans pir:S44654 – ZK353.1 protein (48.2/190)
H. sapiens sp:O43396 – thioredoxin-like protein (31.9/134)
R. norvegicus ref:NP_543163.1 – thioredoxin-like (33.3/134)
Q9DCS2
Hypothetical S-adenosyl-L-methionine-dependent methyltransferases structure containing protein, full insert sequence (0610011F06Rik)
H. sapiens ref:NP_115742.1 – HP MGC13114 (86.4/66)
Putative domainsb) PSI-BLASTc) search
(% identity to functionally known protein)
No putative conserved domains
No putative conserved domains
ND
No putative conserved domains
37% identity to, DNA
mismatch repair protein hexB (P14160)
No putative conserved domains
ND
28% identity to
interferon-inducible
GTPase 5 (H. sapiens)
Pfam06201, DUF1000, Domain of Unknown Function (DUF1000).
32% identity to thioredoxin-like protein (H. sapiens)
Pfam06080, DUF938, Protein of unknown function (DUF938)
49% identity to SAM-dependent methyltransferases
a) Comparison of sequences in UniGene (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=unigene) with proteins supported by a complete genome. The alignments can suggest function of a gene. The nucleotide sequences in UniGene
are matched with possible translational products through sequence comparison using BLASTX with structural databases Swiss-Prot, PIR, PDB, or PRF (ProtEST, Protein matches for ESTs).
b) Accession numbers from the conserved domain database (CDD, http://www.ncbi.nlm.nih.gov/Structure/cdd/), the clusters of orthologous groups of proteins database (COG, http://www.ncbi.nlm.nih.gov/COG), the simple modular architecture research tool (SMART, http://smart.embl-heidelberg.de/), or the Pfam database (http://www.sanger.ac.uk/pfam).
c) PSI-BLAST was used to assign putative functions to proteins that are not yet annotated in any database.
ND, not detected.
Figure 3. Distribution of the identified proteins by function (a) and chromosomal localization (b). Metabolic and signaling proteins accounted for the main portion (48.5%) and, interestingly, 76 proteins were encoded by genes localized on chromosome 11 (76/789, 9.6%). (c) Scatter plot of theoretical pI against observed pI for the identified proteins, with at least 1700 aligning values. The black line is the linear regression line (R² = 0.72). Linear regression was performed with SigmaPlot (Windows version 4.00, SPSS Inc). (d) Scatter plot of theoretical Mr against observed Mr for the identified proteins, with at least 1300 aligning values. The black line is the linear regression line (R² = 0.89).
showing different pI accounted for the lower coefficient
for pI than Mr. Here the observed pIs were measured, which makes it possible to localize the identified proteins within the gel, a prerequisite for using this proteomic approach as a valuable analytical tool.
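The coefficients of determination quoted for the pI and Mr plots come from an ordinary least-squares fit of one set of values against the other. The following is a minimal sketch of that calculation, not the SigmaPlot procedure used in the study; the function name `linear_fit` and the five data pairs are hypothetical and serve only as illustration. For a fit with a single predictor, R² equals the squared Pearson correlation, so it does not depend on which variable is placed on which axis.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, plus R^2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain more than one distinct value")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, R^2 is the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical (theoretical pI, observed pI) pairs, for illustration only.
theoretical_pi = [4.8, 5.5, 6.1, 7.0, 8.4]
observed_pi = [5.0, 5.4, 6.5, 6.8, 8.0]

slope, intercept, r2 = linear_fit(theoretical_pi, observed_pi)
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r2:.2f}")
```

An R² below 1 in such a fit reflects scatter around the line, which in the study is attributed to protein expression forms with differing pI.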
3.2 Chromatographic prefractionation
Protein mixtures were fractionated by IEC because the ion exchanger has a high protein-binding capacity and can discriminate between proteins with minor differences in their pI values. In addition, HIC was used to increase the opportunity for separating biological macromolecules on the basis of their surface hydrophobicity. The proteins were adsorbed, in the presence of salts, onto an uncharged matrix (synthetic polymer, TSK Phenyl 5-PW column) carrying hydrophobic groups. IEC and HIC procedures were previously evaluated as prefractionation steps for human fetal brain and Haemophilus influenzae, respectively [8, 28]. IEC allowed identification of 543 proteins in human fetal brain [8], and the HIC approach was shown to enrich low-copy-number gene products [28–30]. In this study, many proteins were repeatedly identified in both IEC and HIC fractions, and 420 and 47 proteins were exclusively identified in IEC and HIC, respectively (Fig. 2e). Few proteins in HIC were detected in the flow-through and wash fractions, whereas proteins in the IEC flow-through were shown in the elution profile and 2-DE gels.
High-abundance proteins such as actin or albumin were successfully confined to certain IEC fractions, thus allowing visualization of LAPs.
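The split into proteins found only after IEC, only after HIC, or after both can be derived from the two identification lists with plain set operations. The sketch below assumes each list holds one accession number per identified protein; the function name `partition_identifications` and the accession strings are hypothetical placeholders, not identifications from this study.

```python
def partition_identifications(iec_ids, hic_ids):
    """Split two identification lists into exclusive and shared groups."""
    iec = set(iec_ids)  # duplicates from repeated spots collapse here
    hic = set(hic_ids)
    return {
        "iec_only": iec - hic,  # identified exclusively after IEC
        "hic_only": hic - iec,  # identified exclusively after HIC
        "shared": iec & hic,    # identified after both prefractionations
    }


# Hypothetical accession lists, for illustration only.
iec_hits = ["ACC001", "ACC002", "ACC003", "ACC004", "ACC003"]
hic_hits = ["ACC003", "ACC004", "ACC005"]

groups = partition_identifications(iec_hits, hic_hits)
for name, members in groups.items():
    print(name, len(members), sorted(members))
```

Using sets rather than lists means a protein identified in several spots or fractions of the same method is counted once.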
3.3 Computational annotation of HPs
Here we show experimentally the presence of 29 HPs (29 of 789, 3.7%) that have not previously been reported at the protein level and had only been predicted from nucleic acid structure.
Conserved sequence motifs may provide a clue to putative function (sequence motif-to-function approach) [31], and HPs showing high similarity to functionally known proteins are possibly isoforms or homologs with known function.
A close look at the 29 HPs revealed that 23 contained at least some functional sequence motifs identified by SMART, CDD, COG, and Pfam, which are based primarily on experimental characterization of homologs in E. coli and other organisms, while the others were somewhat vague, reflecting only subtle sequence similarities to previously characterized proteins (Table 2).
According to gene ontology (GO) data and domains, eight HPs (8 of 29, 27.6%) could be assigned to enzymatic proteins, and HPs in other functional classes were annotated by computational analysis: signaling proteins (2 of 29, 6.9%), cytoskeleton proteins (3 of 29, 10.3%), transcriptional/translational factors (4 of 29, 13.8%), detoxification proteins (4 of 29, 13.8%), proteolysis components (2 of 29, 6.9%), and unpredictable proteins without conserved domain (6 of 29, 20.7%) (Fig. 4a).
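The share of each functional class follows directly from the class counts given above (n of 29). A minimal sketch that recomputes the shares from those counts; the dictionary keys are shortened labels chosen here for illustration, and `percent_shares` is a hypothetical helper name.

```python
# Class counts as stated in the text (n of 29 HPs).
hp_counts = {
    "enzymatic proteins": 8,
    "signaling proteins": 2,
    "cytoskeleton proteins": 3,
    "transcriptional/translational factors": 4,
    "detoxification proteins": 4,
    "proteolysis components": 2,
    "no conserved domain": 6,
}


def percent_shares(counts):
    """Return each class's share of the total, in percent, to one decimal."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must not sum to zero")
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


total_hps = sum(hp_counts.values())
shares = percent_shares(hp_counts)
for name, pct in shares.items():
    print(f"{name}: {hp_counts[name]} of {total_hps} ({pct}%)")
```

Summing the counts first is a quick consistency check: the seven classes should account for all 29 HPs.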
To verify the putative function of the six HPs harboring no putative conserved domains, we investigated their putative PPIs in the DIP database (http://dip.doe-mbi.ucla.edu/), which is built from experimental evidence. Of these, only the hypothetical cysteine-rich region containing protein, full insert sequence (Q8BZE2), was available in the DIP database, where it showed clear orthologs, e.g. Claret segregational protein (CG7831-PA) (GenBank, gi:17136354; Swiss-Prot, NCD_DROME (D. melanogaster)) (Table 3). On the basis of sequence similarity, we might predict that Kinesin family member C5A (Q6PG90) (Mus musculus) (37% identity to NCD_DROME, 204/541 amino acids) could be a binding partner of this HP in the mouse system. Unfortunately, no available database or program was able to predict the 3-D structure and binding sites, owing to the absence of a conserved sequence alignment. The learning, observing, and outputting protein patterns program (LOOPP, http://www.tc.cornell.edu/CBIO/loopp) showed only the predicted secondary structure of the hypothetical cysteine-rich region containing protein (Q8BZE2) by sequence-structure alignment. Seventeen helices and 21 coil structures were predicted with confidence by potential optimization (Fig. 4b).
For the remaining HPs, neither functional motif searches nor PPI studies gave definite clues to putative function. Searches against STRING, a database integrating interaction data from genomic context, high-throughput experiments, conserved coexpression, and previous knowledge, revealed that AD039 (Q8BWR2), FKSG27, hypothetical S-adenosyl-L-methionine-dependent methyltransferases structure containing protein, full insert sequence (Q9DCS2), and similar to hypothetical 34.0 kDa protein (Q8BHQ7) showed indirect functional, not physical, associations, based on conserved coexpression data, with sodium/potassium-transporting ATPase beta-2 chain (P14415, score = 0.414), monocarboxylate transporter 4 (P57787, score = 0.743), NADH-ubiquinone oxidoreductase MW (O35683, score = 0.433), and protein C21orf7 (P57077, score = 0.579), respectively (data not shown).
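STRING scores range from 0 to 1 and are usually read against confidence bands. The sketch below sorts the four reported associations into such bands. The cut-offs (0.15, 0.4, 0.7, 0.9) are the ones conventionally used with STRING scores and are an assumption here, since the text does not state which threshold was applied; `confidence_band` is a hypothetical helper name.

```python
# Conventional STRING confidence cut-offs (assumed, not stated in the text).
STRING_BANDS = [(0.9, "highest"), (0.7, "high"), (0.4, "medium"), (0.15, "low")]


def confidence_band(score):
    """Map a STRING score in [0, 1] to a conventional confidence band."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("STRING scores lie between 0 and 1")
    for cutoff, label in STRING_BANDS:
        if score >= cutoff:
            return label
    return "below low-confidence cut-off"


# (HP, associated protein, score) as reported in the text.
associations = [
    ("AD039 (Q8BWR2)", "P14415", 0.414),
    ("FKSG27", "P57787", 0.743),
    ("Q9DCS2", "O35683", 0.433),
    ("Q8BHQ7", "P57077", 0.579),
]

for hp, partner, score in associations:
    print(f"{hp} - {partner}: {score} ({confidence_band(score)})")
```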
Table 3. Putative interacting partners of HPs harboring no putative conserved domain
Protein name: Hypothetical cysteine-rich region containing protein, full insert sequence (Q8BZE2)
Orthologs of HP (species, E-value, GenBank): CG14043-PA ORF [Drosophila melanogaster] (2e-27, gi:24581791)
Putative interactors (DIP node a), NCBI): Claret segregational protein (CG7831-PA) (DIP:21302N, gi:17136354, NCD_DROME)
Mouse orthologs of putative interactors (identity b), Expect): Kinesin family member C5A [Kifc5a] (M. musculus (mouse)) Q6PG90; Identities = 204/541 (37%), Expect = 1e-82
Experiment c) (reference): Two hybrid test [37]
a) A unique identifier for each protein participating in a DIP interaction (DIP database, Database of Interacting Proteins, http://dip.doe-mbi.ucla.edu/).
b) Comparison with similarity to known function.
c) Detailed references are available in the DIP database under the DIP node.
Figure 4. (a) Classification of HPs with putative function annotated by computational analysis. (b) Predicted secondary structure of hypothetical cysteine-rich region containing protein (Q8BZE2) by searching the LOOPP program alignment. Each line indicates the query sequence, predicted secondary structure (H, helix; E, β-strand; C, coil), and confidence level (scale from 3 [low confidence] to 9 [high confidence]), respectively.
4 Discussion
A large series of proteins was identified in mouse brain, forming the largest mouse brain protein reference database published so far. Other major findings include a comprehensive computation of observed pIs, a valuable analytical basis for finding proteins, as the theoretical, predicted pIs cannot be used to determine the position of a protein in a 2-DE gel. This does not contradict the correlation of theoretical with observed pI for the individual gene products: the many protein expression forms that give rise to the molecular diversity of the proteome are represented with different pIs and molecular weights. In addition, we experimentally demonstrated the presence of 29 HPs that have never been reported at the protein level and were so far simply predicted from nucleic acid structure. These can now be studied experimentally, as they have been shown to exist at the protein level.
Klose et al. [32] recently resolved 8767 protein spots by large-gel 2-DE, mapped 665 proteins genetically and identified 466 proteins by MALDI-TOF MS. The current study extends this work by separating proteins from the cytosolic fraction by two different chromatographic prefractionations prior to gel-based separation. Whereas Klose et al. [32] used M. musculus strain C57BL/6 (B6) and Mus spretus (SPR), we used FVB/N mice, a strain widely used to generate transgenic mice, thus offering a system suitable for most transgenic experiments and subsequent genetic analyses. Herein, all 42 2-DE gels were used for identification and there is a strong overlap with data from Klose et al. [32].
To date, the Swiss-Prot and TrEMBL databases, for example, list 9575 and 39355 proteins, respectively, for M. musculus. Although this number reflects considerable redundancy due to inclusion of many fragments, splicing variants, precursors, and predicted gene products, 2-DE-based protein identification is still far from generating whole proteomes. The aim of generating a reference brain protein database, which has to be continuously upgraded through further improvements in proteomic technologies and which provides information for the analysis of proteins, was nevertheless met.
Recently, a number of studies have utilized the 2-DE-based proteomics approach to examine the mammalian brain proteome. Several investigations have focused on the identification of major proteins in normal brain tissue, and on the construction of reference databases that catalog proteins expressed in the whole brain or brain sections. Reference brain proteome databases have been established for the human fetal brain [8], whole rat brain [7], human/mouse/rat hippocampus [33–35], and mouse neurons and astrocytes [36].
Although numerous proteins can be analyzed using 2-DE-based methodology, there is still a series of limitations to its use. The proteomic technique used herein only allowed identification of highly soluble, hydrophilic proteins, and only a limited number of very acidic or basic proteins were revealed. Identification of 789 individual proteins, however, represents the largest brain protein reference database published so far and forms the basis for the design of brain protein expression studies. The molecular diversity is shown, and a protein-chemical analytical tool is provided for analysis independent of antibody availability and specificity. Results from this communication may represent the basis for studies on proteins identified herein and complement immunochemical studies in future protocols, warranting protein-chemical demonstration of proteins. We are continuing work on the proteome of the mouse brain in our laboratory by extending chemical prefractionation protocols and using advanced instrumentation to further increase this reference database.
We appreciate the contribution of the Verein zur Durchführung der Wissenschaftlichen Forschung auf dem Gebiet der Neonatologie und Kinderintensivmedizin “Unser Kind.”
5 References
[1] Mouse Genome Sequencing Consortium, Nature 2002, 420, 520–562.
[2] Bult, C. J., Blake, J. A., Richardson, J. E., Kadin, J. A., Eppig, J. T., Mouse Genome Database Group, Nucleic Acids Res. 2004, 32, D476–D481.
[3] Bonaldo, M. F., Bair, T. B., Scheetz, T. E., Snir, E. et al., Genome Res. 2004, 10B, 2053–2063.
[4] Fields, S., Science 2001, 291, 1221–1224.
[5] Lubec, G., Krapfenbauer, K., Fountoulakis, M., Prog. Neurobiol. 2003, 69, 193–211.
[6] Gorg, A., Weiss, W., Dunn, M. J., Proteomics 2004, 4, 3665–3685.
[7] Krapfenbauer, K., Fountoulakis, M., Lubec, G., Electrophoresis 2003, 24, 1847–1870.
[8] Shin, J. H., Krapfenbauer, K., Lubec, G., Electrophoresis 2005, 26, 2759–2778.
[9] Josic, D., Brown, M. K., Huang, F., Callanan, H. et al., Electrophoresis 2005, 26, 2809–2822.
[10] Righetti, P. G., Castagna, A., Antonioli, P., Boschetti, E., Electrophoresis 2005, 26, 297–319.
[11] Lescuyer, P., Hochstrasser, D. F., Sanchez, J. C., Electrophoresis 2004, 25, 1125–1135.
[12] Birch, R. M., O’Byrne, C., Booth, I. R., Cash, P., Proteomics 2003, 3, 764–776.
[13] van den Bergh, G., Clerens, S., Vandesande, F., Arckens, L., Electrophoresis 2003, 24, 1471–1481.
[14] Badock, V., Steinhusen, U., Bommert, K., Otto, A., Electrophoresis 2001, 22, 2856–2864.
[15] Bradford, M. M., Anal. Biochem. 1976, 72, 248–254.
[16] Fountoulakis, M., Langen, H., Anal. Biochem. 1997, 250, 153–156.
[17] Berndt, P., Hobohm, U., Langen, H., Electrophoresis 1999, 20, 3521–3526.
[18] Press, W. M., Teukolsky, S. A., Vetterling, W. T., Flannery, B. P., Scientist 1986, 1, 23.
[19] Rockwood, S. L., van Orden, R. D., Smith, G., Anal. Chem. 1995, 67, 2699–2704.
[20] Boeckmann, B., Bairoch, A., Apweiler, R., Blatter, M. C. et al., Nucleic Acids Res. 2003, 31, 365–370.
[21] Marchler-Bauer, A., Anderson, J. B., DeWeese-Scott, C., Fedorova, N. D. et al., Nucleic Acids Res. 2003, 31, 383–387.
[22] Tatusov, R. L., Galperin, M. Y., Natale, D. A., Koonin, E. V., Nucleic Acids Res. 2000, 28, 33–36.
[23] Schultz, J., Copley, R. R., Doerks, T., Ponting, C. P., Bork, P., Nucleic Acids Res. 2000, 28, 231–234.
[24] Bateman, A., Birney, E., Durbin, R., Eddy, S. R. et al., Nucleic Acids Res. 2000, 28, 263–266.
[25] Wheeler, D. L., Barrett, T., Benson, D. A., Bryant, S. H. et al., Nucleic Acids Res. 2005, 33, D39–D45.
[26] von Mering, C., Jensen, L. J., Snel, B., Hooper, S. D. et al., Nucleic Acids Res. 2005, 33, D433–D437.
[27] Xenarios, I., Salwinski, L., Duan, X. J., Higney, P. et al., Nucleic Acids Res. 2002, 30, 303–305.
[28] Fountoulakis, M., Takacs, M. F., Takacs, B., J. Chromatogr. 1999, A833, 157–168.
[29] Foyn Bruun, C., J. Chromatogr. B 2003, 790, 355–363.
[30] Langen, H., Takacs, B., Evers, S., Berndt, P. et al., Electrophoresis 2000, 21, 411–429.
[31] Koonin, E. V., Galperin, M. Y., Sequence-Evolution-Function. Computational Approaches in Comparative Genomics, Kluwer Academic, Boston, MA 2002.
[32] Klose, J., Nock, C., Herrmann, M., Stuhler, K. et al., Nat. Genet. 2002, 30, 385–393.
[33] Yang, J. W., Czech, T., Lubec, G., Electrophoresis 2004, 25, 1169–1174.
[34] Shin, J.-H., London, J., Le Pecheur, M., Weitzdoerfer, R. et al., Neurochem. Int. 2005, 46, 641–653.
[35] Fountoulakis, M., Tsangaris, G. T., Maris, A., Lubec, G., J. Chromatogr. B 2005, 819, 115–129.
[36] Yang, J. W., Rodrigo, R., Felipo, V., Lubec, G., J. Proteome Res. 2005, 4, 768–788.
[37] Giot, L., Bader, J. S., Brouwer, C., Chaudhuri, A. et al., Science 2003, 302, 1727–1736.