* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Different packing of external residues can explain differences in the
Evolution of metal ions in biological systems wikipedia , lookup
Amino acid synthesis wikipedia , lookup
Biochemical cascade wikipedia , lookup
Biosynthesis wikipedia , lookup
Gene expression wikipedia , lookup
Point mutation wikipedia , lookup
Paracrine signalling wikipedia , lookup
Ribosomally synthesized and post-translationally modified peptides wikipedia , lookup
Signal transduction wikipedia , lookup
Expression vector wikipedia , lookup
Magnesium transporter wikipedia , lookup
G protein–coupled receptor wikipedia , lookup
Ancestral sequence reconstruction wikipedia , lookup
Genetic code wikipedia , lookup
Homology modeling wikipedia , lookup
Bimolecular fluorescence complementation wikipedia , lookup
Protein purification wikipedia , lookup
Interactome wikipedia , lookup
Biochemistry wikipedia , lookup
Metalloprotein wikipedia , lookup
Western blot wikipedia , lookup
Nuclear magnetic resonance spectroscopy of proteins wikipedia , lookup
Two-hybrid screening wikipedia , lookup
BIOINFORMATICS ORIGINAL PAPER Vol. 23 no. 17 2007, pages 2231–2238 doi:10.1093/bioinformatics/btm345 Structural bioinformatics Different packing of external residues can explain differences in the thermostability of proteins from thermophilic and mesophilic organisms Anna V. Glyakina1, Sergiy O. Garbuzynskiy2, Michail Yu. Lobanov2 and Oxana V. Galzitskaya2,* 1 Institute of Mathematical Problems of Biology and 2Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow Region 142290, Russia Received on April 24, 2007; revised and accepted on June 25, 2007 Advance Access publication June 28, 2007 Associate Editor: Burkhard Rost ABSTRACT Motivation: Understanding the basis of protein stability in thermophilic organisms raises a general question: what structural properties of proteins are responsible for the higher thermostability of proteins from thermophilic organisms compared to proteins from mesophilic organisms? Results: A unique database of 373 structurally well-aligned protein pairs from thermophilic and mesophilic organisms is constructed. Comparison of proteins from thermophilic and mesophilic organisms has shown that the external, water-accessible residues of the first group are more closely packed than those of the second. Packing of interior parts of proteins (residues inaccessible to water molecules) is the same in both cases. The analysis of amino acid composition of external residues of proteins from thermophilic organisms revealed an increased fraction of such amino acids as Lys, Arg and Glu, and a decreased fraction of Ala, Asp, Asn, Gln, Thr, Ser and His. Our theoretical investigation of folding/unfolding behavior confirms the experimental observations that the interactions that differ in thermophilic and mesophilic proteins form only after the passing of the transition state during folding. Thus, different packing of external residues can explain differences in thermostability of proteins from thermophilic and mesophilic organisms. Availability: The database of 373 structurally well-aligned protein pairs is available at http://phys.protres.ru/resources/termo_ meso_base.html Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online. 1 INTRODUCTION The importance of the various factors that contribute to a protein’s thermostability is the subject of intense study. Hydrogen bonding and the hydrophobic effect are the major stabilizing forces, but their relative contributions are still debated (Dill, 1990; Honig, 1999). In relation to this *To whom correspondence should be addressed. uncertainty, a question of great interest is how proteins from thermophilic organisms retain their structure and function at high temperature. The study of protein stability can help explore the sequencestructure stability relationship (Kumar et al., 2001; Olofsson et al., 2007; Perl et al., 1998; Razvi and Scholtz, 2006; Schuler et al., 2002). It has been shown that no correlations exist between a protein’s melting temperature and parameters, such as change in heat capacity, change in accessible surface area upon folding and number of residues in the protein (Kumar et al., 2001). Experimental data to show this was obtained from studies that included nine proteins from thermophiles and 10 proteins from mesophiles, all of which show reversible two-state folding/unfolding kinetics. However, the change in heat capacity correlates both to the difference in surface area accessible to the solvent between unfolded and native states and to the number of residues in the protein (Kumar et al., 2001). The authors drew the conclusion that higher thermostability was achieved by specific interactions, particularly electrostatic ones, which is supported by an increased enthalpy change at the melting temperature (Kumar et al., 2001). In a recent review, thermodynamic data from 26 homologous proteins obtained from thermophiles and mesophiles are compared (Razvi and Scholtz, 2006). The authors made an attempt to classify the proteins based on three different (Nojima et al., 1977) approaches to modulation of the protein stability curve in order to achieve higher thermostability. These three approaches are: (1) increasing the value of H without compensating for changes in S, which will result in a similar stability curve, but with higher G values at all temperatures; (2) reducing Cp, which results in a broadened stability curve Cp and (3) lowering the change in entropy for the folding transition, which increases the temperature of maximum stability. All three ways of achieving higher thermostability have been observed in nature, sometimes independently and in other cases in combination. Moreover, most proteins use different combinations of these three general approaches, and so a simple classification of proteins into three separate groups was not possible (Razvi and Scholtz, 2006). ß The Author 2007. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected] 2231 A.V.Glyakina et al. It has been shown that the folding rates of cold shock proteins from Thermotoga maritima (thermophilic organism) and Bacillus subtilis (mesophilic organism) are similar, while the unfolding rate of the thermophilic protein is two orders lower than that of its mesophilic homologue (Perl et al., 1998; Schuler et al., 2002). Thus, distinctions in stability arise from distinctions in the unfolding rate constants of thermophilic and mesophilic proteins. For thermophilic proteins, the activation barrier of unfolding increases (Schuler et al., 2002). Stabilization of the native state by the mutation of amino acid residues situated on the protein surface is one possible explanation for this phenomenon. These mutations lead to a large number of enthalpic interactions that form only after overcoming the free-energy barrier in the course of folding (Schuler et al., 2002). Recently, a similar result has been demonstrated for the folding of S6 ribosomal proteins from the thermophilic bacterium Thermus thermophilus and from the hyperthermophilic bacterium Aquifex aeolicus (Olofsson et al., 2007). The folding rate constants for these proteins are identical, but the unfolding rate for the protein from the hyperthermophilic organism is by an order of magnitude slower than that for the protein from the thermophilic organism (Olofsson et al., 2007). Along with experimental works, theoretical studies of sequences and 3D structures of proteins provide another way to find correlations between structural characteristics and stabilizing forces (Berezovsky and Shakhnovich, 2005; Berezovsky et al., 2005; Fukuchi and Nishikawa, 2001; Liang et al., 2005; Robinson-Rechavi et al., 2006; Zeldovich et al., 2007). Two physical mechanisms for the increase of protein thermostability are suggested (Berezovsky and Shakhnovich, 2005). One of these mechanisms relates to structural factors (homologous thermophilic and mesophilic proteins have different structures, and the compactness of thermophilic protein is greater), and the other is concerned with essential modifications of amino acid sequences of proteins (homologous thermophilic and mesophilic proteins have similar structures, but different sequences). It was shown that the mechanism chosen to increase stability depends on the evolutionary history of an organism (Berezovsky and Shakhnovich, 2005). The proteins from those organisms that originated in an extremely hot environment are more compact, and their stability increase is provided by structural factors. In contrast, those organisms that evolved as mesophiles but later recolonized in a hot environment have proteins that were thermostabilized by the modification of amino acid sequences (Berezovsky and Shakhnovich, 2005). Since amino acid composition plays an important role in thermostabilization, the sequences of thermophilic and mesophilic proteins have been studied (Berezovsky et al., 2005; Fukuchi and Nishikawa, 2001; Zeldovich et al., 2007). In the first case, the amino acid composition of the surface, interior and entire amino acid chain of 279 proteins from thermophilic and mesophilic bacteria with known spatial structures were analyzed, which demonstrated that polar residues are scarce and charged residues are abundant in thermophilic proteins (Fukuchi and Nishikawa, 2001). This difference is most apparent from the surface composition rather than that of the 2232 interior (Fukuchi and Nishikawa, 2001). Different sets of scarce and abundant amino acid types have been obtained for different databases (Kumar et al., 2000; Szilagyi and Zavodszky, 2000). A set of amino acid residues whose total fractions in the proteomes are correlated with optimal growth temperatures has been recently found (Ile, Val, Tyr, Trp, Arg, Glu and Leu) (Zeldovich et al., 2007). An interesting result was obtained upon a theoretical investigation of the mesophilic and thermophilic hydrolase H proteins from Escherichia coli (mesophilic organism) and T.thermophilus (thermophilic) (Berezovsky et al., 2005). Although Lys and Arg are chemically similar amino acid residues, it was shown that Lys has a much greater number of accessible rotamers in the folded state than Arg has. To demonstrate the stabilizing role of lysine residues, the authors replaced Arg with Lys (in silico) and analyzed the unfolding simulations. These simulations have shown that the modified structure was more stable. Consequently, Lys (in contrast to Arg) stabilizes the native state of the protein, preferentially entropically (Berezovsky et al., 2005). Analysis of a database consisting of 94 homologous pairs of protein structures from thermophilic and mesophilic organisms demonstrates that thermophilic proteins are on average shorter than mesophilic proteins, and that thermophilic proteins contain a larger number of salt bridges per residue (0.0401 versus 0.0322, p ¼ 104), a larger number of hydrogen bonds per residue (1.42 versus 1.40, p ¼ 102), a smaller fraction of amino acid residues that are not involved in regular secondary structure (0.370 versus 0.382, p ¼ 102), and a smaller ratio of the accessible surface area to the surface of a sphere of the same volume as the protein (2.38 versus 2.43, p ¼ 104) (RobinsonRechavi et al., 2006). (p is the probability that the two distributions have the same average values, as calculated by Student’s paired t-test. If this probability approaches zero, then these two average values are different, and if it approaches unity, they are identical.) Although it has been shown that -cationic interactions (i.e. interactions between an aromatic side chain (Phe, Tyr or Trp) and a cationic side chain (Lys or Arg) (Gallivan and Dougherty, 1999) play an important role in increasing the thermostability of proteins (Chakravarty and Varadarajan, 2002), the difference in this parameter between thermophiles and mesophiles is not reliable (0.00984 for thermophiles versus 0.00749 for mesophiles, p ¼ 102) (Robinson-Rechavi et al., 2006). There also is not a high correlation between the number of salt bridges and -cationic interaction (correlation coefficient: 0.21). The authors of the article (Robinson-Rechavi et al., 2006) also demonstrated that -cationic interactions play a smaller role (compared to salt bridges) in the stabilization of thermophilic proteins (Robinson-Rechavi et al., 2006). On the other hand, there is no correlation between changes in properties that describe compactness and in electrostatic interactions (Robinson-Rechavi et al., 2006). Differences in residue packing between inner and outer residues of 20 pairs of thermophilic and mesophilic proteins were explored by Pack and Yoo (2005). They found that exposed residues do not have a distinct difference on residue packing (in the Results section we will discuss this observation). Different packing of external residues Without a doubt, the 3D structure of any protein is determined by a delicate balance of different interactions. Hydrogen bonds, salt bridges, the hydrophobic effect play roles in the folding of a protein and the establishment of its final structure as well as binding multi-valent ions and a large number of prolines (Barlow and Thornton, 1983; Kumar et al., 2000; Szilagyi and Zavodszky, 2000). However, on the basis of the known facts, it is difficult to answer the question of which type of interaction is dominant in the maintenance of the native structure of proteins from thermophilic organisms. Despite the fact that there are many works devoted to the search for differences between thermophilic and mesophilic proteins, this question is not solved. Moreover, in some of the works the authors compare only one pair of proteins (one thermophilic protein and one mesophilic protein) or pairs of proteins without any definite criterion for homology or all proteins (both homologous or non-homologous) in the proteomes. In this work, we have created a unique database of proteins in order to look for differences between structurally well-aligned proteins from thermophilic and mesophilic organisms. 2 2.1 METHODS Creation of the database To search for structural differences between thermophilic and mesophilic proteins, we constructed a database of pairs of homologous proteins (one protein in each pair was from a thermophilic organism while the other protein was from a mesophilic organism). This database was created in the following way. In September 2005, all available 3D structures of proteins were taken from Protein Data Bank (PDB) (Berman et al., 2000). For each protein chain, the organism which was the initial source of the protein (usually its name is in the ORGANISM_SCIENTIFIC record corresponding to this chain in the PDB file) was determined. From literature data, we determined which organisms are thermophilic. Pairwise local alignments for all thermophilic chains with each other were made with the help of the program BLAST (Basic Local Alignment Search Tool) (Altschul et al., 1990) in order to find homologous sequences. The homologous sequences of thermophiles (BLAST E-value5105) were joined into clusters. Then, the sequences of mesophiles which were homologous to at least one thermophilic sequence in a cluster were added to the corresponding clusters. Then, all proteins in a cluster were structurally aligned by the program AligProf (M.Y. Lobanov, unpublished data), and the MaxSub value (this value determines the quality of 3D alignment of structures) (Siew et al., 2000) was obtained for each pair of proteins. The MaxSub value was calculated as: P 1 MaxSub ¼ i 1þðdi =3:5Þ2 minðN1 ; N2 Þ 100% ð1Þ where N1 and N2 were numbers of residues in the proteins in the pair. The summation was made for all pairs of amino acids for which the distance between C-atoms di (after superposition) was 53.5 Å. If the distance between C-atoms was more than 3.5 Å, the residues were considered not to be aligned. If the aligned fragments were less than four residues, these fragments were also ignored. The MaxSub value changes within the limits of 0%5MaxSub5100%; if MaxSub ¼ 100%, it means that the examined structures completely coincide. The thermophile–mesophile pair with the largest MaxSub value was selected from each cluster. To these best aligned pairs, the following criteria were applied: (1) multidomain proteins were divided into domains; (2) the length of Fig. 1. Structural alignment (with the help of the program AligProf) of a pair of proteins from thermophilic and mesophilic organisms: 1 (gray curve)—protein GroES from a thermophilic organism T.thermophilus (PDB entry 1we3; chain O; residues 6 – 100); 2 (black curve)—protein GroES from a mesophilic organism E.coli (PDB entry 1pcq; chain O; residues 1 – 96). a domain was not more than 400 amino acid residues; (3) if the N- or C-end of one of the proteins in the pair were longer than in another one, they were cut; (4) the difference in the length between proteins in any pair was not more than 10% from the length of the shortest protein; (5) the number of residues without 3D coordinates did not exceed 10% of the total number of residues in the protein; (6) split domains (i.e. domains consisting of two or more separate regions of the chain) were excluded and (7) the MaxSub value for each thermophile– mesophile pair was more than 70%. As a result, 373 pairs of structurally well-aligned proteins (one of the pairs is presented in Fig. 1) were selected. 2.2 Calculation of structural parameters For each amino acid residue in the database, the type of secondary structure, number of hydrogen bonds, and surface area accessible to the solvent were obtained with the help of the program DSSP (Definition of Secondary Structure of Proteins) (Kabsch and Sander, 1983). A residue was called internal if the area of its surface accessible to the solvent was equal to zero, and external if its accessible surface area was more than 25% of the maximal accessible surface area observed in PDB for this type of residue (see Table 1). The other amino acid residues were called intermediate residues (Fukuchi and Nishikawa, 2001). The calculation of the number of atom–atom contacts per residue in proteins was carried out in the following way: two atoms were considered in contact with each other if their centers were at a distance of 56 Å (or 8 Å). The atom–atom contacts between adjacent residues as well as within one residue were not taken into account. Then, the total number of atom–atom contacts in a protein was divided by the number of residues in the protein. The frequency of the occurrence of various types of amino acid residues among interior and exterior residues was determined as the ratio of the number of the residues of each type occurring among exterior (interior), to the total number of the exterior (interior) residues. The energy of hydrogen bonds was calculated with the help of the program DSSP. According to Kabsch and Sander (1983), a hydrogen bond exists if its energy is lower than 0.5 kcal/mol. A hydrogen bond was considered exterior (interior), if its donor and acceptor both belong to the exterior (interior) residues. If a donor belongs to the exterior (interior) residue and an acceptor to the interior (exterior) residue, then the hydrogen bond was considered neither interior nor exterior. 2233 A.V.Glyakina et al. Table 1. Maximum accessible surface area (ASA) of 20 types of amino acid residues, as observed in PDB* Type of residue ASA, Å2 Type of residue ASA, Å2 Gly 104 Arg 278 Ala 159 Thr 186 Pro 183 Val 164 Glu 215 Ile 190 Gln 237 Leu 212 Asp 189 Met 204 Asn 188 Phe 237 Ser 176 Tyr 250 His 209 Cys 144 Lys 231 Trp 160 For each amino acid in proteins from PDB, accessible surface area was calculated with the help of the program DSSP and maximal accessible surface area for each of 20 types of amino acid residues was chosen. We did not consider amino acid residues situated on the termini of a protein and at the ends of breaks in 3D structures. We consider a salt bridge to exist if the distance between oppositely charged side groups does not exceed 4 Å (Barlow and Thornton, 1983; Kumar et al., 2000). We also calculated salt-bridge contacts for 3 Å and the results were the same (data not shown). Numerical values of all the structural features considered in this work were calculated for all proteins in the database and compared (thermophiles versus mesophiles) by Student’s pairwise t-test. 2.3 Analysis of protein folding For investigation of protein folding and unfolding behavior, we used the previously described (Galzitskaya and Finkelstein, 1999; Galzitskaya et al., 2005; Garbuzynskiy et al., 2004, 2005) algorithm of analysis of a network of protein folding/unfolding pathways. The method is based on investigation of pathways of stepwise reversible unfolding of a known 3D structure of the native state (taken from PDB) of the protein. Each step on each pathway is reversible and consists of a removal of one ‘chain link’ (a chain fragment consisting of a few amino acid residues) from the 3D structure. In this work, the size of the chain link was set to five residues (except for the C-terminal link which was smaller if the given protein size was non-divisible by five), which is the optimal value of this parameter (Garbuzynskiy et al., 2004). The removed links are assumed to form a coil: they lose all their native interactions but gain the entropy of a coil. In addition, if an unfolded loop with both ends fixed by the remaining structured part arises in the structure, some entropy is spent on fixation of these ends. Considering all possible pathways of protein reversible unfolding, a network of protein folding/unfolding pathways is obtained. For each of the obtained structures, the free energy is calculated as follows: ! X FI ¼ " nI T I þ ð2Þ Sloop loops 2 I where nI is the number of native contacts (these are atom–atom contacts at a distance 6 Å) in the folded part of structure I, I is the number of residues in the unfolded part of the structure, " is the energy of one contact (all contacts are assumed to be equal), is the difference in entropy between the coil and the native state of a residue [we take ¼ 2.3R (Privalov, 1979) where R is the gas constant], T is temperature P and loops 2 I Sloop is the entropy spent on fixation of the ends of all closed unfolded loops existing in structure I: 5 3 Sloop ¼ R lnjk lj R r2kl a2 =ð2 Aajk ljÞ 2 2 ð3Þ where rkl is the distance between the C atoms of residues k and l, a ¼ 3.8 Å is the distance between the neighbor C atoms in the chain, and A is the persistence length for a polypeptide (we take A ¼ 20 Å) (Flory, 1969). We model protein folding and unfolding processes in the midtransition point (i.e. at the melting temperature) of the given protein. At the point of mid-transition, only two states are observed (native state and totally unfolded state) while the other possible structures 2234 (partially folded and misfolded) are destabilized; thus, the folding process is in the simplest form (Finkelstein and Badretdinov, 1997). The free energies of the native state and the totally unfolded state are equal, i.e. the average contact energy " is " ¼ Tm N=n0 ð4Þ where Tm is the melting temperature of the protein, n0 is the number of contacts in the native structure and N is the total number of the protein chain residues. Thus, a network of protein folding/unfolding pathways on a freeenergy landscape is obtained. Further, on each of the pathways, we search (using a dynamic programming method) (Aho et al., 1976; Finkelstein and Roytberg, 1993) for a free-energy maximum (i.e. the transition state of this pathway; the part of the molecule which is structured in the transition state is a folding nucleus). Then, the free energies of all pathways are compared and the pathway with the minimum free energy of the transition state Fmin is revealed (this is the optimal pathway since it has the lowest freeenergy barrier). Considering all possible pathways, we thus obtain the ensemble of transition states (each of which possesses its own free-energy) and the ensemble of folding nuclei. The effective height of the free energy barrier was calculated as X expðFI# =RTÞ ð5Þ Feff ¼ RT ln I# F #I where is the free energy of a single transition state, and the summation is over all transition states. The average number of residues in the folding nucleus ensemble was also calculated according to the free energy of the transition state. The value of the Boltzmann probability of a transition state I # is X expðFI #=RTÞ ð6Þ PI # ¼ expðFI #=RTÞ= I# The effective size of the folding nucleus for a protein was calculated as follows: X LI # PI # ð7Þ Leff ¼ I# where L#I and PI # is the number of folded residues in a transition state I #, is the Boltzmann probability of I# in the transition state ensemble. The degree of involvement of a residue in the folding nucleus is measured by the F-value (Matouschek et al., 1989), which is the ratio of destabilization of the transition state ensemble induced by point mutation of the residue to destabilization of the native state induced by this mutation; the point mutations are typically changes from a larger residue to a smaller one. In our model, we calculated F-values (for each residue) as the number of contacts which are formed by the side chain of the residue in the transition state (averaged over the whole transition state ensemble) divided by the number of contacts of this residue in the Different packing of external residues Table 2. Average number of atom–atom contacts per residue for all, interior and exterior amino acid residues of thermophilic and mesophilic proteins Type of residue Fraction (%) Number of contacts per residue (Contact distance cutoff 6 Å) Thermophiles Mesophiles P 80.7 0.4 109.2 0.9 59.5 0.2 1.1 109 0.47 2.0 1011 All Interior Exterior 100 13 46 82.1 0.4 108.6 1.0 61.0 0.2 All Interior Exterior 100 13 46 Number of contacts per residue (Contact distance cutoff 8 Å) 219.0 1.1 215.1 1.1 289.5 2.8 290.6 2.5 163.6 0.6 159.4 0.6 native state (in other words, we made mutations to glycine) (Galzitskaya et al., 2005): r ðnI # Þ I # ¼ ð8Þ r ðn0 Þ In the cases when a residue was a glycine in the wild-type protein, we took the probability of this residue to be folded in the transition state ensemble instead of its F-value. 3 RESULTS In our work, we have studied proteins from thermophilic and mesophilic organisms. Here, we refer to proteins from thermophilic organisms as thermophilic proteins and proteins from mesophilic organisms as mesophilic proteins. To carry out a comparative study of the properties of thermophilic and mesophilic proteins, we constructed a database which includes 373 thermophile–mesophile pairs of structurally well-aligned proteins (one of the pairs is shown in Fig. 1 structurally aligned). The origin of the selected thermophilic proteins was the following: 269 of 373 proteins (72.1%) from bacteria, 2 (0.5%) from eukaryotes and 102 (27.3%) from archaea. The origin of the mesophilic counterparts was as follows: 283 (75.9%) from bacteria, 83 (22.3%) from eukaryotes and 7 (1.9%) from archaea (this database is available at http://phys.protres.ru/ resources/termo_meso_base.html). One of the advantages of our database is that it consists of homologous pairs of proteins. It should be mentioned that the fraction of amino acid residues involved in the regular secondary structure was identical in thermophilic and mesophilic proteins (80%). Literature data indicates that thermophilic proteins should be more compact than mesophilic proteins (Robinson-Rechavi et al., 2006). One of our explanations is a greater number of contacts between amino acid residues in thermophilic proteins than in mesophilic proteins. In order to test this hypothesis, the number of atom–atom contacts per residue for the 373 thermophile–mesophile pairs of proteins was calculated with a contact distance cutoff of 6 and 8 Å (see Materials and Methods section). Indeed, a greater number of atom–atom contacts per residue was observed in thermophilic proteins than in mesophilic proteins (the difference: 1.4 0.4, P ¼ 1.1 109 4.3 1012 0.66 7.0 1015 for 6 Å contact distance cutoff and 3.9 1.1, P ¼ 4.3 1012 for 8 Å contact distance cutoff, see Table 2). To determine which amino acid residues contribute to the increased number of atom–atom contacts, amino acid residues in each protein were divided into groups according to whether they were internal, external or intermediate (see Methods section). It should be mentioned that the thermophilic and mesophilic proteins from our database do not differ from each other by the fraction of interior, exterior and intermediate amino acid residues: 13% of interior, 46% of exterior and 41% of intermediate ones. This is an advantage for our analysis. We calculated the number of atom–atom contacts per residue for interior and exterior amino acid residues from thermophilic and mesophilic proteins (Table 2). From the obtained data, it is evident that interior amino acid residues of thermophilic and mesophilic proteins do not differ in the number of atom–atom contacts per residue. This means that the interior parts of thermophilic and mesophilic proteins are similarly packed. On the other hand, exterior amino acid residues of thermophilic proteins have a greater number of atom–atom contacts per residue than exterior amino acid residues of mesophilic proteins (the difference is 1.5 0.2, P ¼ 2.0 1011 for 6 Å and 4.2 0.6, P ¼ 7.0 1015 for 8 Å). The average number of hydrogen bonds per residue within the main chain of the protein was calculated for interior residues only, for exterior residues only and for all residues. In all three cases (see Table 3), a greater number of hydrogen bonds per residue was observed in thermophilic proteins than in mesophilic (the difference: 0.03 0.01, P ¼ 0.02). Further, the amino acid composition of the interior and exterior residues from thermophilic and mesophilic proteins was analyzed. From Figure 2a, it is evident that the interior residues of thermophilic and mesophilic proteins do not differ in amino acid composition. External residues of thermophilic proteins are observed to contain a higher content of amino acids such as Lys (a difference of 0.019 0.003), Arg (a difference of 0.017 0.003) and Glu (a difference of 0.039 0.003) than is observed in their mesophilic homologues. In contrast, a higher content of Ala (a difference of 0.020 0.002), Asp (a difference of 0.009 0.002), Asn (a difference of 0.008 0.002), Gln (a difference of 0.022 0.002), Thr (a difference of 0.011 0.002), Ser (a difference of 2235 A.V.Glyakina et al. Table 3. Average number of hydrogen bonds per residue for all, interior and exterior amino acid residues of thermophilic and mesophilic proteins Table 4. Average number of atom–atom contacts per residue for all, interior and exterior amino acid residues of thermophilic and mesophilic proteins at a contact distance of 4 Å Type of residue Thermophiles Mesophiles P Type of residue Thermophiles Mesophiles P All Interior Exterior 0.12 0.01 0.06 0.01 0.052 0.006 0.09 0.01 0.04 0.01 0.036 0.005 0.02 0.02 0.01 All Interior Exterior 9.10 0.06 11.78 0.12 6.93 0.05 8.97 0.06 11.92 0.11 6.75 0.05 1.6 103 0.26 9.0 106 0.25 thermophiles mesophiles 0.20 0.15 0.10 0.05 0.00 ala cys asp glu phe gly his ile lys leu met asn pro gln arg ser thr val trp tyr Fractions of amino acid residues (a) Type of amino acid residue (b) thermophiles mesophiles 0.16 0.14 0.12 0.10 0.08 0.06 0.04 0.02 0.00 ala cys asp glu phe gly his ile lys leu met asn pro gln arg ser thr val trp tyr Fraction of amino acid residues 0.18 Type of amino acid residue Fig. 2. Fraction of amino acid residues of each of 20 types in thermophilic and mesophilic proteins: (a)—interior amino acid residues; (b)—exterior amino acid residues. In the Figure, the error of average is given. 0.012 0.002) and His (a difference 0.007 0.001) is observed in the exterior residues of mesophiles than in their thermophilic homologues (see Fig. 2b). Since the number of Lys, Arg and Glu residues is greater in thermophilic than mesophilic proteins, the number of salt bridges was calculated. It turned out that thermophilic proteins contain on average 10.94 0.41 salt bridges per protein, while mesophilic proteins contain only 9.20 0.37 salt bridges per protein. In addition, the number of atom–atom contacts per residue was calculated at a contact distance of 4 Å (the distance cutoff for salt bridge formation) for external, internal and all amino acid residues (Table 4). One can see that thermophilic and mesophilic proteins differ in the number of atom–atom 2236 contacts per residue at 4 Å. Additionally, salt bridges evidently contribute to the stability of thermophilic proteins. In the article (Pack and Yoo, 2005), authors explored 20 pairs of thermophilic and mesophilic proteins and said that exposed residues do not have a distinct difference on residue packing. We have calculated the number of contacts for their database using our approach and the results are similar to those which we obtained for our database: thermophilic proteins show a greater degree of packing of exposed residues than mesophilic proteins (see Supplementary Material). The t-test presented in the paper (Pack and Yoo, 2005) is not reliable. So the authors consider ti41.282 and ti51.282 then this is a two-tailed distribution but not a one-tailed t-test, and the authors really consider t0.2 ¼ 1.29155 but not t0.1. But such a consideration is not reliable. It means that each fifth difference will have such t-value (41.282) randomly (namely this situation we can see in the article in Table 2). If the authors work on the critical level t0.05 (this is a usually used critical level in the papers but not t0.2) then the obtained results are not reliable for all data presented in the article. One of the ways to analyze whether proteins with similar topologies fold in a similar way is to compare theoretically predicted free-energy barriers and F-values for all residues in the native structures. Therefore, we analyzed the folding and unfolding processes of proteins from our database and compared the predicted properties of these processes in thermophilic and mesophilic proteins. We simulated folding and unfolding behavior at the melting temperature of a protein using a method (Galzitskaya and Finkelstein, 1999; Garbuzynskiy et al., 2004) that represents protein folding and unfolding processes as a network of folding/unfolding pathways on a free-energy landscape (see Materials and Methods section). This method was previously (Galzitskaya and Finkelstein, 1999; Garbuzynskiy et al., 2004, 2005) tested on a set of proteins in which both folding rates and folding nuclei (the structured parts of protein molecules in the transition state) were already investigated experimentally, and the method was shown to be able to predict both folding nuclei and folding rates (Galzitskaya and Finkelstein, 1999; Galzitskaya et al., 2005; Garbuzynskiy et al., 2004, 2005). For our calculations, we selected (from the database described above) those pairs in which both proteins had no breaks in their 3D structures. To finish calculations within a reasonable time, we took only those proteins pairs in which both proteins were smaller than 150 amino acid residues. After applying the two additional criteria, we had 154 pairs of proteins. Different packing of external residues 1400 1200 1000 800 (0.9;1.0) (0.8;0.9) In Table 5, the obtained data averaged over all proteins are demonstrated. One can see that the height of the free-energy barrier is virtually the same in thermophilic and in mesophilic proteins. This means that the proteins do not differ in freeenergy barrier at their mid-transition point. The size of the folding nucleus is also the same on average (59 2 amino acid residues). Our results confirm the experimental observations (Olofsson et al., 2007; Schuler et al., 2002) that those interactions which are different in thermophilic proteins compared to their mesophilic counterparts form only after the passing of the transition state. Behavior of the proteins is similar at the melting temperatures of these proteins (although thermophiles and mesophiles are characterized by different melting temperatures). These results are in agreement with the experimental data (Kumar et al., 2001) on the absence of a correlation between the living temperature of organisms and the height of the free-energy barrier on the folding pathway of their proteins at the living temperature of the source organism (Kumar et al., 2001). Average F-values for thermophilic and mesophilic proteins are also virtually the same (see Table 5). However, there is a difference in F-value distributions (see Fig. 3): thermophilic proteins on average have a slightly narrower distribution than mesophilic proteins have. In other words, thermophilic proteins have fewer F-values below 0.2 (for which 520% of contacts are formed; residues that are virtually not involved in the folding nucleus) and greater than 0.9 (for which more than 90% of contacts are formed; residues that are inside the folding nucleus); instead, thermophilic proteins have a larger fraction of F-values in the range 0.2–0.4 (20–40% of contacts formed; these residues are partially involved in the folding nucleus). This difference is not large but is reliable (w2-test gives a probability below 1018 that these distributions are the same). A database that includes 373 homologous pairs of proteins from thermophilic and mesophilic organisms was created. Using this database, we found that proteins from thermophilic organisms contain more atom–atom contacts per residue than their mesophilic homologues do. Exterior residues that are accessible to the solvent make the main contribution to this difference. The amino acid composition of interior (inaccessible to the solvent) and exterior amino acid residues of proteins from thermophilic and mesophilic organisms were analyzed. We determined that the exterior residues of proteins from thermophilic organisms contain more Lys, Arg and Glu and (0.7;0.8) 400 (0.6;0.7) 600 (0.5;0.6) 0.24 0.11 0.47 0.33 (0.4;0.5) 19.6 0.4 16.2 0.4 59.5 1.7 0.494 0.002 1600 (0.3;0.4) 19.9 0.4 16.6 0.4 58.9 1.6 0.491 0.002 1800 (0.2;0.3) P 2000 (0.1;0.2) Mesophiles thermophilic proteins mesophilic proteins 2200 (0.0;0.1) hFmin i hFeff i hLeff i hFi Thermophiles 2400 Number of amino acid residues Table 5. Parameters of the folding/unfolding processes for thermophilic and mesophilic proteins averaged over the database of 154 pairs of proteins Φ-value Fig. 3. Distribution of F-values of amino acid residues in thermophilic (solid curve) and mesophilic (dash curve) proteins. Errors are calculated as n1/2 where n is the number of residues. less Ala, Asp, Asn, Gln, Ser and His than do proteins from mesophilic organisms. The amino acid compositions of interior residues of the proteins considered are not different. Modeling of the folding/unfolding behavior of the studied proteins at the melting temperature of each protein has demonstrated that there is virtually no difference between the thermophilic and mesophilic free-energy barrier heights and the average size of protein folding nuclei under these conditions. Thus, the proteins of thermophilic and mesophilic organisms possess similar folding properties at the midtransition point. ACKNOWLEDGEMENTS We are grateful to D. Reifsnyder for assistance in preparation of the article. This work was supported by the programs ‘Molecular and cellular biology’ and ‘Fundamental sciences – medicine’, by the Russian Foundation of Basic Research (05-04-48750-a), by the INTAS grant („ 05-1000004-7747) and by the Howard Hughes Medical Institute (55005607). Conflict of Interest: none declared. REFERENCES Aho,A. et al. (1976) The Design and Analysis of Computer Algorithms. AddisonWesley, Reading, MA. Altschul,S.F. et al. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410. Barlow,D.J. and Thornton,J.M. (1983) Ion-pairs in proteins. J. Mol. Biol., 168, 867–885. Berezovsky,I.N. and Shakhnovich,E.I. (2005) Physics and evolution of thermophilic adaptation. Proc. Natl Acad. Sci. USA, 102, 12742–12747. Berezovsky,I.N. et al. (2005) Entropic stabilization of proteins and its proteomic consequences. PLoS Comput. Biol., 1, e47. Berman,H.M. et al. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242. Chakravarty,S. and Varadarajan,R. (2002) Elucidation of factors responsible for enhanced thermal stability of proteins: structural genomics based study. Biochemistry, 41, 8152–8161. 2237 A.V.Glyakina et al. Dill,K.A. (1990) Dominant forces in protein folding. Biochemistry, 29, 7133–7155. Finkelstein,A.V. and Badretdinov,A.Y. (1997) Rate of protein folding near the point of thermodynamic equilibrium between the coil and the most stable chain fold. Fold. Des., 2, 115–121. Finkelstein,A.V. and Roytberg,M.A. (1993) Computation of biopolymers: a general approach to different problems. Biosystems, 30, 1–19. Flory,P.J. (1969) Statistical Mechanics of Chain Molecules. Interscience, New York. Fukuchi,S. and Nishikawa,K. (2001) Protein surface amino acid compositions distinctively differ between thermophilic and mesophilic bacteria. J. Mol. Biol., 309, 835–843. Gallivan,J.P. and Dougherty,D.A. (1999) Cation-pi interactions in structural biology. Proc. Natl Acad. Sci. USA, 96, 9459–9464. Galzitskaya,O.V. and Finkelstein,A.V. (1999) A theoretical search for folding/ unfolding nuclei in three-dimensional protein structures. Proc. Natl Acad. Sci. USA, 96, 11299–11304. Galzitskaya,O.V. et al. (2005) Theoretical study of protein folding: outlining folding nuclei and estimation of protein folding rates. J. Phys. Condens. Matter, 17, S1539–S1551. Garbuzynskiy,S.O. et al. (2004) Outlining folding nuclei in globular proteins. J. Mol. Biol., 336, 509–525. Garbuzynskiy,S.O. et al. (2005) On the prediction of folding nuclei in globular proteins. Mol. Biol., 39, 1032–1041. Honig,B. (1999) Protein folding: from the levinthal paradox to structure prediction. J. Mol. Biol., 293, 283–293. Kabsch,W. and Sander,C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577–2637. Kumar,S. et al. (2000) Factors enhancing protein thermostability. Protein Eng., 13, 179–191. Kumar,S. et al. (2001) Thermodynamic differences among homologous thermophilic and mesophilic proteins. Biochemistry, 40, 14152–14165. 2238 Liang,H.-K. et al. (2005) Amino acid coupling patterns in thermophilic proteins. Proteins, 59, 58–63. Matouschek,J.T. et al. (1989) Mapping the transition state and pathway of protein folding by protein engineering. Nature, 340, 122–126. Nojima,H. et al. (1977) Reversible thermal unfolding of thermostable phosphoglycerate kinase. Thermostability associated with mean zero enthalpy change. J. Mol. Biol., 116, 429–442. Olofsson,M. et al. (2007) Folding of S6 structures with divergent amino acid composition: pathway flexibility within partly overlapping foldons. J. Mol. Biol., 365, 237–248. Pack,S.P. and Yoo,Y.J. (2005) Packing-based difference of structural features between thermophilic and mesophilic proteins. Int. J. Biol. Macromol., 35, 169–174. Perl,D. et al. (1998) Conservation of rapid two-state folding in mesophilic, thermophilic and hyperthermophilic cold shock proteins. Nat. Struct. biol., 5, 229–235. Privalov,P.L. (1979) Stability of proteins: small globular proteins. Adv. Protein Chem., 33, 167–241. Razvi,A. and Scholtz,J.M. (2006) Lessons in stability from thermophilic proteins. Protein Sci., 15, 1569–1578. Robinson-Rechavi,M. et al. (2006) Contribution of electrostatic interactions, compactness and quaternary structure to protein thermostability: lessons from structural genomics of Thermotoga maritima. J. Mol. Biol., 356, 547–557. Schuler,B. et al. (2002) Role of entropy in protein thermostability: folding kinetics of a hyperthermophilic cold shock protein at high temperatures using 19 F NMR. Biochemistry, 41, 11670–11680. Siew,N. et al. (2000) MaxSub: an automated measure for the assessment of protein structure prediction quality. Bioinformatics, 16, 776–785. Szilagyi,A. and Zavodszky,P. (2000) Structural differences between mesophilic moderately thermophilic and extremely thermophilic protein subunits: results of a comprehensive survey. Structure, 8, 493–504. Zeldovich,K.B. et al. (2007) Protein and DNA sequence determinants of thermophilic adaptation. PLoS Comput. Biol., 3, e5.