Download Different packing of external residues can explain differences in the

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Evolution of metal ions in biological systems wikipedia , lookup

Amino acid synthesis wikipedia , lookup

Biochemical cascade wikipedia , lookup

Biosynthesis wikipedia , lookup

Gene expression wikipedia , lookup

Point mutation wikipedia , lookup

Paracrine signalling wikipedia , lookup

Ribosomally synthesized and post-translationally modified peptides wikipedia , lookup

Signal transduction wikipedia , lookup

Expression vector wikipedia , lookup

Magnesium transporter wikipedia , lookup

G protein–coupled receptor wikipedia , lookup

Ancestral sequence reconstruction wikipedia , lookup

Genetic code wikipedia , lookup

Metabolism wikipedia , lookup

Homology modeling wikipedia , lookup

Bimolecular fluorescence complementation wikipedia , lookup

SR protein wikipedia , lookup

Protein purification wikipedia , lookup

Interactome wikipedia , lookup

Biochemistry wikipedia , lookup

QPNC-PAGE wikipedia , lookup

Metalloprotein wikipedia , lookup

Western blot wikipedia , lookup

Nuclear magnetic resonance spectroscopy of proteins wikipedia , lookup

Protein wikipedia , lookup

Two-hybrid screening wikipedia , lookup

Protein–protein interaction wikipedia , lookup

Proteolysis wikipedia , lookup

Transcript
BIOINFORMATICS
ORIGINAL PAPER
Vol. 23 no. 17 2007, pages 2231–2238
doi:10.1093/bioinformatics/btm345
Structural bioinformatics
Different packing of external residues can explain differences in
the thermostability of proteins from thermophilic and mesophilic
organisms
Anna V. Glyakina1, Sergiy O. Garbuzynskiy2, Michail Yu. Lobanov2 and
Oxana V. Galzitskaya2,*
1
Institute of Mathematical Problems of Biology and 2Institute of Protein Research, Russian Academy of Sciences,
Pushchino, Moscow Region 142290, Russia
Received on April 24, 2007; revised and accepted on June 25, 2007
Advance Access publication June 28, 2007
Associate Editor: Burkhard Rost
ABSTRACT
Motivation: Understanding the basis of protein stability in thermophilic organisms raises a general question: what structural properties
of proteins are responsible for the higher thermostability of proteins
from thermophilic organisms compared to proteins from mesophilic
organisms?
Results: A unique database of 373 structurally well-aligned protein
pairs from thermophilic and mesophilic organisms is constructed.
Comparison of proteins from thermophilic and mesophilic organisms
has shown that the external, water-accessible residues of the first
group are more closely packed than those of the second. Packing of
interior parts of proteins (residues inaccessible to water molecules) is
the same in both cases. The analysis of amino acid composition of
external residues of proteins from thermophilic organisms revealed
an increased fraction of such amino acids as Lys, Arg and Glu, and
a decreased fraction of Ala, Asp, Asn, Gln, Thr, Ser and His. Our
theoretical investigation of folding/unfolding behavior confirms the
experimental observations that the interactions that differ in
thermophilic and mesophilic proteins form only after the passing of
the transition state during folding. Thus, different packing of external
residues can explain differences in thermostability of proteins from
thermophilic and mesophilic organisms.
Availability: The database of 373 structurally well-aligned protein
pairs is available at http://phys.protres.ru/resources/termo_
meso_base.html
Contact: [email protected]
Supplementary information: Supplementary data are available at
Bioinformatics online.
1
INTRODUCTION
The importance of the various factors that contribute to a
protein’s thermostability is the subject of intense study.
Hydrogen bonding and the hydrophobic effect are the major
stabilizing forces, but their relative contributions are still
debated (Dill, 1990; Honig, 1999). In relation to this
*To whom correspondence should be addressed.
uncertainty, a question of great interest is how proteins from
thermophilic organisms retain their structure and function at
high temperature.
The study of protein stability can help explore the sequencestructure stability relationship (Kumar et al., 2001; Olofsson
et al., 2007; Perl et al., 1998; Razvi and Scholtz, 2006; Schuler
et al., 2002). It has been shown that no correlations exist
between a protein’s melting temperature and parameters, such
as change in heat capacity, change in accessible surface area
upon folding and number of residues in the protein (Kumar
et al., 2001). Experimental data to show this was obtained
from studies that included nine proteins from thermophiles
and 10 proteins from mesophiles, all of which show reversible
two-state folding/unfolding kinetics. However, the change in
heat capacity correlates both to the difference in surface
area accessible to the solvent between unfolded and native
states and to the number of residues in the protein (Kumar
et al., 2001). The authors drew the conclusion that higher
thermostability was achieved by specific interactions,
particularly electrostatic ones, which is supported by an
increased enthalpy change at the melting temperature (Kumar
et al., 2001).
In a recent review, thermodynamic data from 26 homologous
proteins obtained from thermophiles and mesophiles are
compared (Razvi and Scholtz, 2006). The authors made an
attempt to classify the proteins based on three different
(Nojima et al., 1977) approaches to modulation of the protein
stability curve in order to achieve higher thermostability. These
three approaches are: (1) increasing the value of H without
compensating for changes in S, which will result in a similar
stability curve, but with higher G values at all temperatures;
(2) reducing Cp, which results in a broadened stability curve
Cp and (3) lowering the change in entropy for the folding
transition, which increases the temperature of maximum
stability. All three ways of achieving higher thermostability
have been observed in nature, sometimes independently and in
other cases in combination. Moreover, most proteins use
different combinations of these three general approaches, and
so a simple classification of proteins into three separate groups
was not possible (Razvi and Scholtz, 2006).
ß The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]
2231
A.V.Glyakina et al.
It has been shown that the folding rates of cold shock
proteins from Thermotoga maritima (thermophilic organism)
and Bacillus subtilis (mesophilic organism) are similar, while the
unfolding rate of the thermophilic protein is two orders lower
than that of its mesophilic homologue (Perl et al., 1998; Schuler
et al., 2002). Thus, distinctions in stability arise from
distinctions in the unfolding rate constants of thermophilic
and mesophilic proteins. For thermophilic proteins, the
activation barrier of unfolding increases (Schuler et al., 2002).
Stabilization of the native state by the mutation of amino acid
residues situated on the protein surface is one possible
explanation for this phenomenon. These mutations lead to a
large number of enthalpic interactions that form only after
overcoming the free-energy barrier in the course of folding
(Schuler et al., 2002).
Recently, a similar result has been demonstrated for the
folding of S6 ribosomal proteins from the thermophilic
bacterium Thermus thermophilus and from the hyperthermophilic bacterium Aquifex aeolicus (Olofsson et al., 2007). The
folding rate constants for these proteins are identical, but the
unfolding rate for the protein from the hyperthermophilic
organism is by an order of magnitude slower than that for the
protein from the thermophilic organism (Olofsson et al., 2007).
Along with experimental works, theoretical studies of
sequences and 3D structures of proteins provide another way
to find correlations between structural characteristics and
stabilizing forces (Berezovsky and Shakhnovich, 2005;
Berezovsky et al., 2005; Fukuchi and Nishikawa, 2001;
Liang et al., 2005; Robinson-Rechavi et al., 2006;
Zeldovich et al., 2007).
Two physical mechanisms for the increase of protein
thermostability are suggested (Berezovsky and Shakhnovich,
2005). One of these mechanisms relates to structural factors
(homologous thermophilic and mesophilic proteins have
different structures, and the compactness of thermophilic
protein is greater), and the other is concerned with essential
modifications of amino acid sequences of proteins (homologous
thermophilic and mesophilic proteins have similar structures,
but different sequences). It was shown that the mechanism
chosen to increase stability depends on the evolutionary history
of an organism (Berezovsky and Shakhnovich, 2005). The
proteins from those organisms that originated in an extremely
hot environment are more compact, and their stability increase
is provided by structural factors. In contrast, those organisms
that evolved as mesophiles but later recolonized in a hot
environment have proteins that were thermostabilized by the
modification of amino acid sequences (Berezovsky and
Shakhnovich, 2005).
Since amino acid composition plays an important role in
thermostabilization, the sequences of thermophilic and mesophilic proteins have been studied (Berezovsky et al., 2005;
Fukuchi and Nishikawa, 2001; Zeldovich et al., 2007). In the
first case, the amino acid composition of the surface, interior
and entire amino acid chain of 279 proteins from thermophilic
and mesophilic bacteria with known spatial structures were
analyzed, which demonstrated that polar residues are scarce
and charged residues are abundant in thermophilic proteins
(Fukuchi and Nishikawa, 2001). This difference is most
apparent from the surface composition rather than that of the
2232
interior (Fukuchi and Nishikawa, 2001). Different sets of
scarce and abundant amino acid types have been obtained
for different databases (Kumar et al., 2000; Szilagyi and
Zavodszky, 2000).
A set of amino acid residues whose total fractions in the
proteomes are correlated with optimal growth temperatures has
been recently found (Ile, Val, Tyr, Trp, Arg, Glu and Leu)
(Zeldovich et al., 2007).
An interesting result was obtained upon a theoretical
investigation of the mesophilic and thermophilic hydrolase H
proteins from Escherichia coli (mesophilic organism) and
T.thermophilus (thermophilic) (Berezovsky et al., 2005).
Although Lys and Arg are chemically similar amino acid
residues, it was shown that Lys has a much greater number of
accessible rotamers in the folded state than Arg has. To
demonstrate the stabilizing role of lysine residues, the authors
replaced Arg with Lys (in silico) and analyzed the unfolding
simulations. These simulations have shown that the modified
structure was more stable. Consequently, Lys (in contrast to
Arg) stabilizes the native state of the protein, preferentially
entropically (Berezovsky et al., 2005).
Analysis of a database consisting of 94 homologous pairs of
protein structures from thermophilic and mesophilic organisms
demonstrates that thermophilic proteins are on average shorter
than mesophilic proteins, and that thermophilic proteins
contain a larger number of salt bridges per residue (0.0401
versus 0.0322, p ¼ 104), a larger number of hydrogen bonds
per residue (1.42 versus 1.40, p ¼ 102), a smaller fraction of
amino acid residues that are not involved in regular secondary
structure (0.370 versus 0.382, p ¼ 102), and a smaller ratio of
the accessible surface area to the surface of a sphere of the same
volume as the protein (2.38 versus 2.43, p ¼ 104) (RobinsonRechavi et al., 2006). (p is the probability that the two
distributions have the same average values, as calculated by
Student’s paired t-test. If this probability approaches zero, then
these two average values are different, and if it approaches
unity, they are identical.)
Although it has been shown that -cationic interactions (i.e.
interactions between an aromatic side chain (Phe, Tyr or Trp)
and a cationic side chain (Lys or Arg) (Gallivan and
Dougherty, 1999) play an important role in increasing the
thermostability of proteins (Chakravarty and Varadarajan,
2002), the difference in this parameter between thermophiles
and mesophiles is not reliable (0.00984 for thermophiles versus
0.00749 for mesophiles, p ¼ 102) (Robinson-Rechavi et al.,
2006). There also is not a high correlation between the number
of salt bridges and -cationic interaction (correlation coefficient: 0.21). The authors of the article (Robinson-Rechavi et al.,
2006) also demonstrated that -cationic interactions play a
smaller role (compared to salt bridges) in the stabilization of
thermophilic proteins (Robinson-Rechavi et al., 2006). On the
other hand, there is no correlation between changes in
properties that describe compactness and in electrostatic
interactions (Robinson-Rechavi et al., 2006).
Differences in residue packing between inner and outer
residues of 20 pairs of thermophilic and mesophilic proteins
were explored by Pack and Yoo (2005). They found that
exposed residues do not have a distinct difference on residue
packing (in the Results section we will discuss this observation).
Different packing of external residues
Without a doubt, the 3D structure of any protein is
determined by a delicate balance of different interactions.
Hydrogen bonds, salt bridges, the hydrophobic effect play roles
in the folding of a protein and the establishment of its final
structure as well as binding multi-valent ions and a large
number of prolines (Barlow and Thornton, 1983; Kumar et al.,
2000; Szilagyi and Zavodszky, 2000). However, on the basis of
the known facts, it is difficult to answer the question of which
type of interaction is dominant in the maintenance of the native
structure of proteins from thermophilic organisms. Despite the
fact that there are many works devoted to the search for
differences between thermophilic and mesophilic proteins, this
question is not solved. Moreover, in some of the works the
authors compare only one pair of proteins (one thermophilic
protein and one mesophilic protein) or pairs of proteins without
any definite criterion for homology or all proteins (both
homologous or non-homologous) in the proteomes. In this
work, we have created a unique database of proteins in order to
look for differences between structurally well-aligned proteins
from thermophilic and mesophilic organisms.
2
2.1
METHODS
Creation of the database
To search for structural differences between thermophilic and
mesophilic proteins, we constructed a database of pairs of homologous
proteins (one protein in each pair was from a thermophilic organism
while the other protein was from a mesophilic organism). This database
was created in the following way. In September 2005, all available 3D
structures of proteins were taken from Protein Data Bank (PDB)
(Berman et al., 2000). For each protein chain, the organism which was
the initial source of the protein (usually its name is in the
ORGANISM_SCIENTIFIC record corresponding to this chain in the
PDB file) was determined. From literature data, we determined which
organisms are thermophilic. Pairwise local alignments for all thermophilic chains with each other were made with the help of the program
BLAST (Basic Local Alignment Search Tool) (Altschul et al., 1990) in
order to find homologous sequences. The homologous sequences of
thermophiles (BLAST E-value5105) were joined into clusters. Then,
the sequences of mesophiles which were homologous to at least one
thermophilic sequence in a cluster were added to the corresponding
clusters. Then, all proteins in a cluster were structurally aligned by the
program AligProf (M.Y. Lobanov, unpublished data), and the MaxSub
value (this value determines the quality of 3D alignment of structures)
(Siew et al., 2000) was obtained for each pair of proteins. The MaxSub
value was calculated as:
P
1
MaxSub ¼
i
1þðdi =3:5Þ2
minðN1 ; N2 Þ
100%
ð1Þ
where N1 and N2 were numbers of residues in the proteins in the pair.
The summation was made for all pairs of amino acids for which the
distance between C-atoms di (after superposition) was 53.5 Å. If the
distance between C-atoms was more than 3.5 Å, the residues were
considered not to be aligned. If the aligned fragments were less than
four residues, these fragments were also ignored. The MaxSub value
changes within the limits of 0%5MaxSub5100%; if MaxSub ¼ 100%,
it means that the examined structures completely coincide. The
thermophile–mesophile pair with the largest MaxSub value was selected
from each cluster.
To these best aligned pairs, the following criteria were applied:
(1) multidomain proteins were divided into domains; (2) the length of
Fig. 1. Structural alignment (with the help of the program AligProf)
of a pair of proteins from thermophilic and mesophilic organisms:
1 (gray curve)—protein GroES from a thermophilic organism
T.thermophilus (PDB entry 1we3; chain O; residues 6 – 100); 2 (black
curve)—protein GroES from a mesophilic organism E.coli (PDB entry
1pcq; chain O; residues 1 – 96).
a domain was not more than 400 amino acid residues; (3) if the N- or
C-end of one of the proteins in the pair were longer than in another one,
they were cut; (4) the difference in the length between proteins in any
pair was not more than 10% from the length of the shortest protein;
(5) the number of residues without 3D coordinates did not exceed
10% of the total number of residues in the protein; (6) split domains
(i.e. domains consisting of two or more separate regions of the chain)
were excluded and (7) the MaxSub value for each thermophile–
mesophile pair was more than 70%. As a result, 373 pairs of
structurally well-aligned proteins (one of the pairs is presented in
Fig. 1) were selected.
2.2
Calculation of structural parameters
For each amino acid residue in the database, the type of secondary
structure, number of hydrogen bonds, and surface area accessible to
the solvent were obtained with the help of the program DSSP
(Definition of Secondary Structure of Proteins) (Kabsch and
Sander, 1983).
A residue was called internal if the area of its surface accessible to the
solvent was equal to zero, and external if its accessible surface area was
more than 25% of the maximal accessible surface area observed in PDB
for this type of residue (see Table 1). The other amino acid residues
were called intermediate residues (Fukuchi and Nishikawa, 2001).
The calculation of the number of atom–atom contacts per residue in
proteins was carried out in the following way: two atoms were
considered in contact with each other if their centers were at a distance
of 56 Å (or 8 Å). The atom–atom contacts between adjacent residues as
well as within one residue were not taken into account. Then, the total
number of atom–atom contacts in a protein was divided by the number
of residues in the protein.
The frequency of the occurrence of various types of amino acid
residues among interior and exterior residues was determined as the
ratio of the number of the residues of each type occurring among
exterior (interior), to the total number of the exterior (interior) residues.
The energy of hydrogen bonds was calculated with the help of the
program DSSP. According to Kabsch and Sander (1983), a hydrogen
bond exists if its energy is lower than 0.5 kcal/mol. A hydrogen bond
was considered exterior (interior), if its donor and acceptor both belong
to the exterior (interior) residues. If a donor belongs to the exterior
(interior) residue and an acceptor to the interior (exterior) residue, then
the hydrogen bond was considered neither interior nor exterior.
2233
A.V.Glyakina et al.
Table 1. Maximum accessible surface area (ASA) of 20 types of amino acid residues, as observed in PDB*
Type of residue
ASA, Å2
Type of residue
ASA, Å2
Gly
104
Arg
278
Ala
159
Thr
186
Pro
183
Val
164
Glu
215
Ile
190
Gln
237
Leu
212
Asp
189
Met
204
Asn
188
Phe
237
Ser
176
Tyr
250
His
209
Cys
144
Lys
231
Trp
160
For each amino acid in proteins from PDB, accessible surface area was calculated with the help of the program DSSP and maximal accessible surface area for each of 20
types of amino acid residues was chosen. We did not consider amino acid residues situated on the termini of a protein and at the ends of breaks in 3D structures.
We consider a salt bridge to exist if the distance between oppositely
charged side groups does not exceed 4 Å (Barlow and Thornton, 1983;
Kumar et al., 2000). We also calculated salt-bridge contacts for 3 Å and
the results were the same (data not shown).
Numerical values of all the structural features considered in this work
were calculated for all proteins in the database and compared
(thermophiles versus mesophiles) by Student’s pairwise t-test.
2.3
Analysis of protein folding
For investigation of protein folding and unfolding behavior, we used
the previously described (Galzitskaya and Finkelstein, 1999;
Galzitskaya et al., 2005; Garbuzynskiy et al., 2004, 2005) algorithm
of analysis of a network of protein folding/unfolding pathways.
The method is based on investigation of pathways of stepwise
reversible unfolding of a known 3D structure of the native state
(taken from PDB) of the protein. Each step on each pathway is
reversible and consists of a removal of one ‘chain link’ (a chain
fragment consisting of a few amino acid residues) from the 3D
structure. In this work, the size of the chain link was set to five residues
(except for the C-terminal link which was smaller if the given protein
size was non-divisible by five), which is the optimal value of this
parameter (Garbuzynskiy et al., 2004). The removed links are assumed
to form a coil: they lose all their native interactions but gain the entropy
of a coil. In addition, if an unfolded loop with both ends fixed by the
remaining structured part arises in the structure, some entropy is spent
on fixation of these ends.
Considering all possible pathways of protein reversible unfolding,
a network of protein folding/unfolding pathways is obtained. For
each of the obtained structures, the free energy is calculated as follows:
!
X
FI ¼ " nI T I þ
ð2Þ
Sloop
loops 2 I
where nI is the number of native contacts (these are atom–atom contacts
at a distance 6 Å) in the folded part of structure I, I is the number of
residues in the unfolded part of the structure, " is the energy of one
contact (all contacts are assumed to be equal), is the difference in
entropy between the coil and the native state of a residue [we take
¼ 2.3R (Privalov, 1979) where R is the gas constant], T is temperature
P
and loops 2 I Sloop is the entropy spent on fixation of the ends of all
closed unfolded loops existing in structure I:
5
3 Sloop ¼ R lnjk lj R r2kl a2 =ð2 Aajk ljÞ
2
2
ð3Þ
where rkl is the distance between the C atoms of residues k and l,
a ¼ 3.8 Å is the distance between the neighbor C atoms in the chain,
and A is the persistence length for a polypeptide (we take A ¼ 20 Å)
(Flory, 1969).
We model protein folding and unfolding processes in the midtransition point (i.e. at the melting temperature) of the given protein.
At the point of mid-transition, only two states are observed (native
state and totally unfolded state) while the other possible structures
2234
(partially folded and misfolded) are destabilized; thus, the folding
process is in the simplest form (Finkelstein and Badretdinov, 1997). The
free energies of the native state and the totally unfolded state are equal,
i.e. the average contact energy " is
" ¼ Tm N=n0
ð4Þ
where Tm is the melting temperature of the protein, n0 is the number of
contacts in the native structure and N is the total number of the protein
chain residues.
Thus, a network of protein folding/unfolding pathways on a freeenergy landscape is obtained. Further, on each of the pathways,
we search (using a dynamic programming method) (Aho et al.,
1976; Finkelstein and Roytberg, 1993) for a free-energy maximum
(i.e. the transition state of this pathway; the part of the molecule
which is structured in the transition state is a folding nucleus).
Then, the free energies of all pathways are compared and the pathway
with the minimum free energy of the transition state Fmin is
revealed (this is the optimal pathway since it has the lowest freeenergy barrier).
Considering all possible pathways, we thus obtain the ensemble of
transition states (each of which possesses its own free-energy) and the
ensemble of folding nuclei.
The effective height of the free energy barrier was calculated as
X
expðFI# =RTÞ
ð5Þ
Feff ¼ RT ln
I#
F #I
where
is the free energy of a single transition state, and the
summation is over all transition states.
The average number of residues in the folding nucleus
ensemble was also calculated according to the free energy of the
transition state. The value of the Boltzmann probability of a transition
state I # is
X
expðFI #=RTÞ
ð6Þ
PI # ¼ expðFI #=RTÞ=
I#
The effective size of the folding nucleus for a protein was calculated as
follows:
X
LI # PI #
ð7Þ
Leff ¼
I#
where L#I
and PI #
is the number of folded residues in a transition state I #,
is the Boltzmann probability of I# in the transition state
ensemble.
The degree of involvement of a residue in the folding nucleus is
measured by the F-value (Matouschek et al., 1989), which is the ratio of
destabilization of the transition state ensemble induced by point
mutation of the residue to destabilization of the native state induced by
this mutation; the point mutations are typically changes from a larger
residue to a smaller one. In our model, we calculated F-values (for each
residue) as the number of contacts which are formed by the side chain
of the residue in the transition state (averaged over the whole transition
state ensemble) divided by the number of contacts of this residue in the
Different packing of external residues
Table 2. Average number of atom–atom contacts per residue for all, interior and exterior amino acid residues of thermophilic and mesophilic
proteins
Type of residue
Fraction (%)
Number of contacts per residue (Contact distance cutoff 6 Å)
Thermophiles
Mesophiles
P
80.7 0.4
109.2 0.9
59.5 0.2
1.1 109
0.47
2.0 1011
All
Interior
Exterior
100
13
46
82.1 0.4
108.6 1.0
61.0 0.2
All
Interior
Exterior
100
13
46
Number of contacts per residue (Contact distance cutoff 8 Å)
219.0 1.1
215.1 1.1
289.5 2.8
290.6 2.5
163.6 0.6
159.4 0.6
native state (in other words, we made mutations to glycine)
(Galzitskaya et al., 2005):
r ðnI # Þ I #
¼
ð8Þ
r ðn0 Þ
In the cases when a residue was a glycine in the wild-type protein, we
took the probability of this residue to be folded in the transition state
ensemble instead of its F-value.
3
RESULTS
In our work, we have studied proteins from thermophilic and
mesophilic organisms. Here, we refer to proteins from
thermophilic organisms as thermophilic proteins and proteins
from mesophilic organisms as mesophilic proteins.
To carry out a comparative study of the properties of
thermophilic and mesophilic proteins, we constructed a
database which includes 373 thermophile–mesophile pairs of
structurally well-aligned proteins (one of the pairs is shown in
Fig. 1 structurally aligned).
The origin of the selected thermophilic proteins was the
following: 269 of 373 proteins (72.1%) from bacteria, 2 (0.5%)
from eukaryotes and 102 (27.3%) from archaea. The origin of
the mesophilic counterparts was as follows: 283 (75.9%) from
bacteria, 83 (22.3%) from eukaryotes and 7 (1.9%) from
archaea (this database is available at http://phys.protres.ru/
resources/termo_meso_base.html).
One of the advantages of our database is that it consists of
homologous pairs of proteins. It should be mentioned that the
fraction of amino acid residues involved in the regular
secondary structure was identical in thermophilic and mesophilic proteins (80%).
Literature data indicates that thermophilic proteins should
be more compact than mesophilic proteins (Robinson-Rechavi
et al., 2006). One of our explanations is a greater number of
contacts between amino acid residues in thermophilic proteins
than in mesophilic proteins. In order to test this hypothesis, the
number of atom–atom contacts per residue for the 373
thermophile–mesophile pairs of proteins was calculated with
a contact distance cutoff of 6 and 8 Å (see Materials and
Methods section). Indeed, a greater number of atom–atom
contacts per residue was observed in thermophilic proteins than
in mesophilic proteins (the difference: 1.4 0.4, P ¼ 1.1 109
4.3 1012
0.66
7.0 1015
for 6 Å contact distance cutoff and 3.9 1.1, P ¼ 4.3 1012
for 8 Å contact distance cutoff, see Table 2). To determine
which amino acid residues contribute to the increased number
of atom–atom contacts, amino acid residues in each protein
were divided into groups according to whether they were
internal, external or intermediate (see Methods section). It
should be mentioned that the thermophilic and mesophilic
proteins from our database do not differ from each other by the
fraction of interior, exterior and intermediate amino acid
residues: 13% of interior, 46% of exterior and 41% of
intermediate ones. This is an advantage for our analysis. We
calculated the number of atom–atom contacts per residue for
interior and exterior amino acid residues from thermophilic and
mesophilic proteins (Table 2).
From the obtained data, it is evident that interior amino acid
residues of thermophilic and mesophilic proteins do not differ
in the number of atom–atom contacts per residue. This
means that the interior parts of thermophilic and mesophilic
proteins are similarly packed. On the other hand, exterior
amino acid residues of thermophilic proteins have a
greater number of atom–atom contacts per residue than
exterior amino acid residues of mesophilic proteins (the
difference is 1.5 0.2, P ¼ 2.0 1011 for 6 Å and 4.2 0.6,
P ¼ 7.0 1015 for 8 Å).
The average number of hydrogen bonds per residue within
the main chain of the protein was calculated for interior
residues only, for exterior residues only and for all residues. In
all three cases (see Table 3), a greater number of hydrogen
bonds per residue was observed in thermophilic proteins than in
mesophilic (the difference: 0.03 0.01, P ¼ 0.02).
Further, the amino acid composition of the interior and
exterior residues from thermophilic and mesophilic proteins
was analyzed. From Figure 2a, it is evident that the interior
residues of thermophilic and mesophilic proteins do not differ
in amino acid composition. External residues of thermophilic
proteins are observed to contain a higher content of amino
acids such as Lys (a difference of 0.019 0.003), Arg
(a difference of 0.017 0.003) and Glu (a difference of
0.039 0.003) than is observed in their mesophilic homologues.
In contrast, a higher content of Ala (a difference of
0.020 0.002), Asp (a difference of 0.009 0.002), Asn (a
difference of 0.008 0.002), Gln (a difference of 0.022 0.002),
Thr (a difference of 0.011 0.002), Ser (a difference of
2235
A.V.Glyakina et al.
Table 3. Average number of hydrogen bonds per residue for all,
interior and exterior amino acid residues of thermophilic and
mesophilic proteins
Table 4. Average number of atom–atom contacts per residue for all,
interior and exterior amino acid residues of thermophilic and
mesophilic proteins at a contact distance of 4 Å
Type of residue
Thermophiles
Mesophiles
P
Type of residue
Thermophiles
Mesophiles
P
All
Interior
Exterior
0.12 0.01
0.06 0.01
0.052 0.006
0.09 0.01
0.04 0.01
0.036 0.005
0.02
0.02
0.01
All
Interior
Exterior
9.10 0.06
11.78 0.12
6.93 0.05
8.97 0.06
11.92 0.11
6.75 0.05
1.6 103
0.26
9.0 106
0.25
thermophiles
mesophiles
0.20
0.15
0.10
0.05
0.00
ala
cys
asp
glu
phe
gly
his
ile
lys
leu
met
asn
pro
gln
arg
ser
thr
val
trp
tyr
Fractions of amino acid residues
(a)
Type of amino acid residue
(b)
thermophiles
mesophiles
0.16
0.14
0.12
0.10
0.08
0.06
0.04
0.02
0.00
ala
cys
asp
glu
phe
gly
his
ile
lys
leu
met
asn
pro
gln
arg
ser
thr
val
trp
tyr
Fraction of amino acid residues
0.18
Type of amino acid residue
Fig. 2. Fraction of amino acid residues of each of 20 types in
thermophilic and mesophilic proteins: (a)—interior amino acid residues;
(b)—exterior amino acid residues. In the Figure, the error of average is
given.
0.012 0.002) and His (a difference 0.007 0.001) is observed
in the exterior residues of mesophiles than in their thermophilic
homologues (see Fig. 2b).
Since the number of Lys, Arg and Glu residues is greater in
thermophilic than mesophilic proteins, the number of salt
bridges was calculated. It turned out that thermophilic proteins
contain on average 10.94 0.41 salt bridges per protein, while
mesophilic proteins contain only 9.20 0.37 salt bridges per
protein. In addition, the number of atom–atom contacts per
residue was calculated at a contact distance of 4 Å (the distance
cutoff for salt bridge formation) for external, internal and all
amino acid residues (Table 4). One can see that thermophilic
and mesophilic proteins differ in the number of atom–atom
2236
contacts per residue at 4 Å. Additionally, salt bridges evidently
contribute to the stability of thermophilic proteins.
In the article (Pack and Yoo, 2005), authors explored 20
pairs of thermophilic and mesophilic proteins and said that
exposed residues do not have a distinct difference on residue
packing. We have calculated the number of contacts for their
database using our approach and the results are similar to those
which we obtained for our database: thermophilic proteins
show a greater degree of packing of exposed residues than
mesophilic proteins (see Supplementary Material).
The t-test presented in the paper (Pack and Yoo, 2005) is not
reliable. So the authors consider ti41.282 and ti51.282 then
this is a two-tailed distribution but not a one-tailed t-test, and
the authors really consider t0.2 ¼ 1.29155 but not t0.1. But such a
consideration is not reliable. It means that each fifth difference
will have such t-value (41.282) randomly (namely this situation
we can see in the article in Table 2). If the authors work on the
critical level t0.05 (this is a usually used critical level in the
papers but not t0.2) then the obtained results are not reliable for
all data presented in the article.
One of the ways to analyze whether proteins with similar
topologies fold in a similar way is to compare theoretically
predicted free-energy barriers and F-values for all residues in
the native structures. Therefore, we analyzed the folding and
unfolding processes of proteins from our database and
compared the predicted properties of these processes in
thermophilic and mesophilic proteins. We simulated folding
and unfolding behavior at the melting temperature of a protein
using a method (Galzitskaya and Finkelstein, 1999;
Garbuzynskiy et al., 2004) that represents protein folding and
unfolding processes as a network of folding/unfolding pathways on a free-energy landscape (see Materials and Methods
section). This method was previously (Galzitskaya and
Finkelstein, 1999; Garbuzynskiy et al., 2004, 2005) tested on
a set of proteins in which both folding rates and folding nuclei
(the structured parts of protein molecules in the transition
state) were already investigated experimentally, and the method
was shown to be able to predict both folding nuclei and folding
rates (Galzitskaya and Finkelstein, 1999; Galzitskaya et al.,
2005; Garbuzynskiy et al., 2004, 2005).
For our calculations, we selected (from the database
described above) those pairs in which both proteins had no
breaks in their 3D structures. To finish calculations within a
reasonable time, we took only those proteins pairs in which
both proteins were smaller than 150 amino acid residues. After
applying the two additional criteria, we had 154 pairs of
proteins.
Different packing of external residues
1400
1200
1000
800
(0.9;1.0)
(0.8;0.9)
In Table 5, the obtained data averaged over all proteins are
demonstrated. One can see that the height of the free-energy
barrier is virtually the same in thermophilic and in mesophilic
proteins. This means that the proteins do not differ in freeenergy barrier at their mid-transition point. The size of the
folding nucleus is also the same on average (59 2 amino acid
residues). Our results confirm the experimental observations
(Olofsson et al., 2007; Schuler et al., 2002) that those
interactions which are different in thermophilic proteins
compared to their mesophilic counterparts form only after
the passing of the transition state. Behavior of the proteins is
similar at the melting temperatures of these proteins (although
thermophiles and mesophiles are characterized by different
melting temperatures). These results are in agreement with the
experimental data (Kumar et al., 2001) on the absence of a
correlation between the living temperature of organisms and
the height of the free-energy barrier on the folding pathway of
their proteins at the living temperature of the source organism
(Kumar et al., 2001).
Average F-values for thermophilic and mesophilic proteins
are also virtually the same (see Table 5).
However, there is a difference in F-value distributions
(see Fig. 3): thermophilic proteins on average have a slightly
narrower distribution than mesophilic proteins have.
In other words, thermophilic proteins have fewer F-values
below 0.2 (for which 520% of contacts are formed; residues
that are virtually not involved in the folding nucleus) and
greater than 0.9 (for which more than 90% of contacts are
formed; residues that are inside the folding nucleus);
instead, thermophilic proteins have a larger fraction of
F-values in the range 0.2–0.4 (20–40% of contacts
formed; these residues are partially involved in the folding
nucleus). This difference is not large but is reliable (w2-test
gives a probability below 1018 that these distributions are
the same).
A database that includes 373 homologous pairs of proteins
from thermophilic and mesophilic organisms was created.
Using this database, we found that proteins from thermophilic
organisms contain more atom–atom contacts per residue than
their mesophilic homologues do. Exterior residues that are
accessible to the solvent make the main contribution to this
difference. The amino acid composition of interior (inaccessible
to the solvent) and exterior amino acid residues of proteins
from thermophilic and mesophilic organisms were analyzed.
We determined that the exterior residues of proteins from
thermophilic organisms contain more Lys, Arg and Glu and
(0.7;0.8)
400
(0.6;0.7)
600
(0.5;0.6)
0.24
0.11
0.47
0.33
(0.4;0.5)
19.6 0.4
16.2 0.4
59.5 1.7
0.494 0.002
1600
(0.3;0.4)
19.9 0.4
16.6 0.4
58.9 1.6
0.491 0.002
1800
(0.2;0.3)
P
2000
(0.1;0.2)
Mesophiles
thermophilic proteins
mesophilic proteins
2200
(0.0;0.1)
hFmin i
hFeff i
hLeff i
hFi
Thermophiles
2400
Number of amino acid residues
Table 5. Parameters of the folding/unfolding processes for thermophilic
and mesophilic proteins averaged over the database of 154 pairs of
proteins
Φ-value
Fig. 3. Distribution of F-values of amino acid residues in thermophilic
(solid curve) and mesophilic (dash curve) proteins. Errors are calculated
as n1/2 where n is the number of residues.
less Ala, Asp, Asn, Gln, Ser and His than do proteins
from mesophilic organisms. The amino acid compositions
of interior residues of the proteins considered are not different.
Modeling of the folding/unfolding behavior of the studied
proteins at the melting temperature of each protein has
demonstrated that there is virtually no difference between the
thermophilic and mesophilic free-energy barrier heights and
the average size of protein folding nuclei under these conditions. Thus, the proteins of thermophilic and mesophilic
organisms possess similar folding properties at the midtransition point.
ACKNOWLEDGEMENTS
We are grateful to D. Reifsnyder for assistance in
preparation of the article. This work was supported by
the programs ‘Molecular and cellular biology’ and
‘Fundamental sciences – medicine’, by the Russian
Foundation of Basic Research (05-04-48750-a), by the INTAS
grant („ 05-1000004-7747) and by the Howard Hughes
Medical Institute (55005607).
Conflict of Interest: none declared.
REFERENCES
Aho,A. et al. (1976) The Design and Analysis of Computer Algorithms. AddisonWesley, Reading, MA.
Altschul,S.F. et al. (1990) Basic local alignment search tool. J. Mol. Biol., 215,
403–410.
Barlow,D.J. and Thornton,J.M. (1983) Ion-pairs in proteins. J. Mol. Biol., 168,
867–885.
Berezovsky,I.N. and Shakhnovich,E.I. (2005) Physics and evolution
of thermophilic adaptation. Proc. Natl Acad. Sci. USA, 102,
12742–12747.
Berezovsky,I.N. et al. (2005) Entropic stabilization of proteins and its proteomic
consequences. PLoS Comput. Biol., 1, e47.
Berman,H.M. et al. (2000) The Protein Data Bank. Nucleic Acids Res., 28,
235–242.
Chakravarty,S. and Varadarajan,R. (2002) Elucidation of factors responsible for
enhanced thermal stability of proteins: structural genomics based study.
Biochemistry, 41, 8152–8161.
2237
A.V.Glyakina et al.
Dill,K.A. (1990) Dominant forces in protein folding. Biochemistry, 29,
7133–7155.
Finkelstein,A.V. and Badretdinov,A.Y. (1997) Rate of protein folding near the
point of thermodynamic equilibrium between the coil and the most stable
chain fold. Fold. Des., 2, 115–121.
Finkelstein,A.V. and Roytberg,M.A. (1993) Computation of biopolymers:
a general approach to different problems. Biosystems, 30, 1–19.
Flory,P.J. (1969) Statistical Mechanics of Chain Molecules. Interscience,
New York.
Fukuchi,S. and Nishikawa,K. (2001) Protein surface amino acid compositions
distinctively differ between thermophilic and mesophilic bacteria.
J. Mol. Biol., 309, 835–843.
Gallivan,J.P. and Dougherty,D.A. (1999) Cation-pi interactions in structural
biology. Proc. Natl Acad. Sci. USA, 96, 9459–9464.
Galzitskaya,O.V. and Finkelstein,A.V. (1999) A theoretical search for folding/
unfolding nuclei in three-dimensional protein structures. Proc. Natl Acad. Sci.
USA, 96, 11299–11304.
Galzitskaya,O.V. et al. (2005) Theoretical study of protein folding: outlining
folding nuclei and estimation of protein folding rates. J. Phys. Condens.
Matter, 17, S1539–S1551.
Garbuzynskiy,S.O. et al. (2004) Outlining folding nuclei in globular proteins.
J. Mol. Biol., 336, 509–525.
Garbuzynskiy,S.O. et al. (2005) On the prediction of folding nuclei in globular
proteins. Mol. Biol., 39, 1032–1041.
Honig,B. (1999) Protein folding: from the levinthal paradox to structure
prediction. J. Mol. Biol., 293, 283–293.
Kabsch,W. and Sander,C. (1983) Dictionary of protein secondary structure:
pattern recognition of hydrogen-bonded and geometrical features.
Biopolymers, 22, 2577–2637.
Kumar,S. et al. (2000) Factors enhancing protein thermostability. Protein Eng.,
13, 179–191.
Kumar,S. et al. (2001) Thermodynamic differences among homologous thermophilic and mesophilic proteins. Biochemistry, 40, 14152–14165.
2238
Liang,H.-K. et al. (2005) Amino acid coupling patterns in thermophilic proteins.
Proteins, 59, 58–63.
Matouschek,J.T. et al. (1989) Mapping the transition state and pathway of
protein folding by protein engineering. Nature, 340, 122–126.
Nojima,H. et al. (1977) Reversible thermal unfolding of thermostable phosphoglycerate kinase. Thermostability associated with mean zero enthalpy change.
J. Mol. Biol., 116, 429–442.
Olofsson,M. et al. (2007) Folding of S6 structures with divergent amino acid
composition: pathway flexibility within partly overlapping foldons. J. Mol.
Biol., 365, 237–248.
Pack,S.P. and Yoo,Y.J. (2005) Packing-based difference of structural features
between thermophilic and mesophilic proteins. Int. J. Biol. Macromol., 35,
169–174.
Perl,D. et al. (1998) Conservation of rapid two-state folding in mesophilic,
thermophilic and hyperthermophilic cold shock proteins. Nat. Struct. biol., 5,
229–235.
Privalov,P.L. (1979) Stability of proteins: small globular proteins. Adv. Protein
Chem., 33, 167–241.
Razvi,A. and Scholtz,J.M. (2006) Lessons in stability from thermophilic proteins.
Protein Sci., 15, 1569–1578.
Robinson-Rechavi,M. et al. (2006) Contribution of electrostatic interactions,
compactness and quaternary structure to protein thermostability: lessons from
structural genomics of Thermotoga maritima. J. Mol. Biol., 356, 547–557.
Schuler,B. et al. (2002) Role of entropy in protein thermostability: folding kinetics
of a hyperthermophilic cold shock protein at high temperatures using
19
F NMR. Biochemistry, 41, 11670–11680.
Siew,N. et al. (2000) MaxSub: an automated measure for the assessment of
protein structure prediction quality. Bioinformatics, 16, 776–785.
Szilagyi,A. and Zavodszky,P. (2000) Structural differences between mesophilic
moderately thermophilic and extremely thermophilic protein subunits: results
of a comprehensive survey. Structure, 8, 493–504.
Zeldovich,K.B. et al. (2007) Protein and DNA sequence determinants of
thermophilic adaptation. PLoS Comput. Biol., 3, e5.