Research Paper

Cell dynamics of folding in two-dimensional model proteins

Marek Cieplak¹ and Jayanth R Banavar²

Addresses: ¹Institute of Physics, Polish Academy of Sciences, 02-668 Warsaw, Poland. ²Department of Physics and Center for Materials Physics, 104 Davey Laboratory, The Pennsylvania State University, University Park, PA 16802, USA.
Correspondence: Marek Cieplak
E-mail: [email protected]
Key words: folding funnel, lattice models, protein design, protein folding
Received: 07 Apr 1997
Revisions requested: 09 May 1997
Revisions received: 19 May 1997
Accepted: 29 May 1997
Published: 08 Jun 1997
Electronic identifier: 1359-0278-002-00235
Folding & Design 08 Jun 1997, 2:235–245
© Current Biology Ltd ISSN 1359-0278

Background: Functionally useful proteins are sequences of amino acids that fold rapidly under appropriate conditions into their native states. It is believed that rapid folders are sequences for which the folding dynamics entail the exploration of restricted conformations: the phase space can be thought of as a folding funnel. While there are many experimentally accessible predictions pertaining to the existence of such funnels and a coherent picture of the kinetics of folding has begun to emerge, there have been relatively few simple studies in the controlled setting of well-characterized lattice models.

Results: We design rapidly folding sequences by assigning the strongest couplings to the contacts present in a target native state in a two-dimensional model of heteropolymers. Such sequences have large folding transition temperatures and low glass transition temperatures. The dependence of median folding times on temperature is investigated. The pathways to folding and their dependence on the temperature are illustrated via a study of the cell dynamics, a mapping of the dynamics into motion within the space of the maximally compact cells.

Conclusions: Folding funnels can be defined operationally in a coarse-grained sense by mapping the states of the system into maximally compact conformations and then by identifying significant connectivities between them.

Introduction

Globular proteins fold to their biologically active native state in a proper range of temperature and acidity of a water solution [1]. The native state is assumed to be the ground state of the system [1] and this identification will be adopted in this paper. The dynamics of folding is akin to motion in a rugged free energy landscape [2]. The relative rapidity of the process, of the order of milliseconds to seconds, rules out an exhaustive search of conformations, which would have to take longer than the age of the universe [3] even in the case of short proteins. The notion of folding funnels [4] has been invoked to describe the apparent emergence of a restricted set of conformations that are actually involved in folding. Based on simulations of a lattice model, Onuchic et al. [5] have identified the crucial landmarks in the folding funnel for a fast-folding protein, which passes through the molten globule states and low-energy bottlenecks.

There are several features that have been proposed to characterize sequences that are good folders: maximal compatibility ('minimum frustration') [6,7], large thermodynamic stability [2,8,9], and certain properties of thermodynamic parameters such as the specific heat and the structural overlap function [10–12]. The purpose of this paper is to study what makes a good folder and how one may identify the funnel conformations in model two-dimensional lattice proteins.

The thermodynamic stability of a protein can be characterized by a folding transition temperature Tf below which the probability of occupancy of the native state, P0, becomes substantial. This may be assessed by several measures that we discuss later.
The most transparent definition is to take Tf as the temperature at which P0 is ½ [13]. Note that Tf depends on the energy spectrum of a sequence but it does not depend on the dynamics. A quantity that is related to the dynamics of folding is the temperature of the glass transition, Tg, below which the protein is trapped in local energy minima and the native state is typically not accessible, within measurement timescales, from a randomly chosen conformation even when the temperature is less than Tf [2,13]. It follows that functionally useful proteins have a large Tf and a small Tg, and the temperature range relevant to folding is Tg < T < Tf. Good folders have a large Tf : Tg ratio whereas bad folders are characterized by this ratio being less than 1.

We have proposed recently [14,15] a method to design champion folders: for a given set of interaction contact energies the prescription is to rank order these energies and then to assign the strongest attractive energies to contacts that are present in the target native conformation. This method is a generalization of the approach taken by Go and Abe [16] in which the native contacts are set to be uniformly attractive whereas the non-native contacts do not interact. The procedure of contact assignment through rank ordering defines a champion folder, or a candidate for being the best member within the family of sequences that have a desired ground state and fold well. Note that there are a large number of such fully rank-ordered sequences for a given set of interaction energies and a target maximally compact native state. We do not attempt to distinguish between these degenerate sequences in this paper. The consequences of rank ordering have been demonstrated within the framework of simple yet illuminating three-dimensional [14] and two-dimensional [15] lattice models.
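The rank-ordering prescription described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `rank_order_design`, the toy contact list and the random assignment within each group are assumptions made here for the example.

```python
import random

def rank_order_design(couplings, contacts, native_contacts, seed=0):
    """Assign the most attractive couplings to the native contacts.

    couplings       -- list of contact energies (negative = attractive)
    contacts        -- all (i, j) pairs that can form a contact in the model
    native_contacts -- the subset of `contacts` present in the target state
    Returns a dict mapping each (i, j) pair to its assigned energy.
    """
    native_set = set(native_contacts)
    native = [c for c in contacts if c in native_set]
    non_native = [c for c in contacts if c not in native_set]
    if len(couplings) != len(contacts) or len(native) != len(native_set):
        raise ValueError("couplings/contacts mismatch or unknown native contact")

    ranked = sorted(couplings)              # most attractive (most negative) first
    strongest = ranked[:len(native)]        # reserved for the native contacts
    rest = ranked[len(native):]             # everything else

    # Assumption: any permutation within each group yields a member of the
    # rank-ordered family, so both groups are shuffled with a fixed seed.
    rng = random.Random(seed)
    rng.shuffle(strongest)
    rng.shuffle(rest)

    design = dict(zip(native, strongest))
    design.update(zip(non_native, rest))
    return design

# Hypothetical toy input: four possible contacts, two of them native.
toy_contacts = [(1, 4), (1, 6), (2, 5), (3, 6)]
toy_couplings = [0.5, -1.2, 0.1, -0.3]
toy_native = [(1, 4), (3, 6)]
toy_design = rank_order_design(toy_couplings, toy_contacts, toy_native)
```

Whatever the seed, every native contact ends up at least as attractive as every non-native one, which is the defining property of a rank-ordered sequence.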
In this paper, we focus on the two-dimensional model of a protein consisting of a 16-monomer self-avoiding chain on a square lattice [17,18]. The total number of conformations is only 802 075, of which 69 are maximally compact, i.e. fit on a 4 × 4 lattice (henceforth we refer to a maximally compact conformation as a cell), so the model is amenable to exact enumeration. This allows for an exact evaluation of equilibrium parameters, such as Tf or the radius of gyration. We compare folding characteristics for three sequences: the champion folder generated through the rank ordering, a good folder and a bad folder. Equilibrium properties of the latter two sequences were obtained in [18].

We start by presenting the equilibrium phase diagrams for the three sequences. We then proceed to studies of dynamics as defined through a Monte Carlo process. We determine the behavior of folding times as a function of temperature, T, and determine Tg. We then go on to characterize the dynamics of folding within a coarse-grained scheme [15] and compare it for the three sequences. Each conformation of the heteropolymer is dynamically mapped onto one of the 69 cells. This is done by a steepest-descent quenching followed by determining the cell with the largest energetic overlap through common contacts. Thus, the true dynamics is represented in a coarse-grained manner as motion within the space of the 69 cells. The motivation for the steepest-descent part of the mapping is supplied by the pioneering work of Stillinger and Weber [19,20] in the context of glasses; see also a related work on spin glasses [21]. Strikingly different patterns in the cell dynamic space are found as the temperature is varied, illustrating the pathways to folding.

Results and discussion

The heteropolymer sequences and their equilibrium properties

Since proteins comprise combinations of up to 20 amino acids, they may be conveniently and simply modelled as heteropolymers (for reviews, see [22,23]). Each monomer represents an amino acid. Two monomers, i and j, are in contact with a contact energy Bij if they are not successive in sequence and yet are next to each other on the lattice (a negative Bij denotes attraction) [22,23]. This restriction on monomers i and j is schematically denoted by Δ(i – j) and the energy of the system is written as:

$$E = \sum_{i<j} B_{ij}\,\Delta(i-j) \qquad (1)$$

where the Bij are the values of the contact interactions. We take:

$$B_{ij} = \tilde{B}_{ij} + B_0 \qquad (2)$$

The contact interactions are effective couplings which result from the hydrophobic interactions of the amino acids with solvent [24]. Their values can be estimated from microscopic principles [25], and $\tilde{B}_{ij}$ can be regarded as being governed, approximately, by a Gaussian probability distribution [8,9,18]. B0 is an overall shift (many of our results will focus on B0 = –1). For a two-dimensional model consisting of 16 monomers, there are 49 possible contact interactions. Each sequence is defined by a particular choice of the values of these interactions.

As the first set we consider the 49 interaction energies $\tilde{B}_{ij}$ of Dinner et al. [18] (see their Table I, upper right); these were obtained as independent random values with a Gaussian distribution of zero mean and unit dispersion (thus defining the unit of temperature). These interaction energies are also shown in Table 1 of this paper and are merely an approximation to more realistic interaction energies between the amino acids of a real protein. We denote this sequence DSKS (after Dinner, Šali, Karplus and Shakhnovich [18], who demonstrated that this sequence had a very stable ground state). The ground state of the heteropolymer chain for any B0 < 0 is the conformation in Figure 1.

In order to define the second sequence, R, we rank order the 49 $\tilde{B}_{ij}$ values in decreasing order of attraction. The numbers shown in Figure 1 indicate the rank of the interaction energy (a smaller number denotes a stronger attraction). The family of champion folders with the same native state is designed by assigning the nine most attractive interactions to the nine native contacts (see [14]; their Figure 2 shows, inadvertently, the number of native contacts and not the number of all contacts). One way (of the nine) of doing this is shown by sequence R; its native state, shown in Figure 1, differs from DSKS in only three contacts. The remaining 40 interactions are assigned randomly to the non-native contacts. The whole list of these interactions, together with their ranking, is given in Table 1.

The third sequence, DSKS′, also comes from Dinner et al. [18] (see their Table I, lower left) and is also shown in Table 1 of this paper. Its native conformation depends on B0. Figure 2 shows two of these: one, maximally compact, for large negative B0 (B0 < –2.9111) and the other, with eight contacts, for –2.9111 < B0 < 0. For B0 = 0 the native state has seven contacts, and for larger B0 the number of contacts is still smaller.

Table 1
Values of the $\tilde{B}_{ij}$ couplings for the three sequences DSKS, R, and DSKS′.
 i   j     DSKS  Rank        R  Rank    DSKS′  Rank
 1   4  –0.4923    14   1.8313    49   1.9063    49
 1   6   1.0474    42  –0.1705    22  –0.8975    10
 1   8  –0.5496    12  –1.4820     5   0.0770    28
 1  10   1.1895    46   0.0326    28   1.6883    48
 1  12  –2.1095     1  –2.1095     1  –0.6494    18
 1  14  –0.7512    10  –1.7203     3   0.8742    40
 1  16  –0.0469    26   1.1651    45   0.2781    33
 2   5   1.3830    47   0.3624    34  –0.9914     6
 2   7  –1.2977     7  –1.2977     7  –0.6782    16
 2   9  –0.1408    23   0.6888    40   1.1379    43
 2  11   0.0940    30  –0.7512    10   0.0821    29
 2  13   0.9985    41   1.7539    48   0.5539    36
 2  15  –1.8114     2  –1.8114     2  –0.6158    19
 3   6  –0.1705    22  –0.2958    17  –0.9801     7
 3   8   1.1223    43   0.3252    32  –0.0625    27
 3  10  –0.3847    15   1.1253    44  –0.6560    17
 3  12  –0.2865    18   0.9985    41  –0.3041    22
 3  14   0.3293    33   1.1223    43  –0.7576    15
 3  16  –1.0435     9  –1.0435     9  –0.2991    24
 4   7  –1.0519     8  –1.0519     8  –0.3037    23
 4   9   0.5629    39   0.3899    36   0.2299    32
 4  11   0.3252    32  –0.2118    21  –0.9717     8
 4  13  –0.5998    11  –0.5015    13  –0.8744    11
 4  15   1.7539    48  –0.1171    24  –0.2275    25
 5   8  –0.2118    21  –0.5998    11  –0.0651    26
 5  10   1.1253    44   0.5166    38   1.5177    47
 5  12        …     …        …     …        …     …
 5  14   0.0326    28   0.0228    27  –0.9366     9
 5  16   0.3624    34  –0.2125    20  –1.5168     3
 6   9  –1.6053     4  –1.6053     4  –1.0613     5
 6  11   0.4711    37   0.3293    33  –0.8259    12
 6  13   0.0228    27   0.2471    31  –1.7790     1
 6  15   0.3899    36  –0.0469    26   0.7045    39
 7  10   0.2471    31  –0.2472    19  –0.4063    20
 7  12  –1.3267     6   1.1895    46   0.8901    41
 7  14  –1.4820     5  –0.1408    23   0.3860    34
 7  16  –0.3501    16   0.5629    39  –0.8012    13
 8  11  –0.1171    24  –1.3267     6   1.3644    45
 8  13  –0.2125    20   1.3830    47   1.3071    44
 8  15  –0.5015    13  –0.0592    25   0.3967    35
 9  12   0.0594    29   0.4711    37   0.6439    38
 9  14   0.6888    40  –0.4923    14   0.8944    42
 9  16  –0.2472    19  –0.3847    15   0.1926    30
10  13   0.5166    38   1.0474    42  –1.5915     2
10  15   1.1651    45  –0.2865    18  –0.7950    14
11  14  –0.2958    17   0.0940    30   0.2272    31
11  16  –1.7203     3  –0.5496    12  –1.1363     4
12  15  –0.0592    25   0.3877    35   0.6108    37
13  16   0.3877    35  –0.3501    16  –0.3228    21

Values of the $\tilde{B}_{ij}$ couplings (see equations 1 and 2) for the three sequences DSKS, R, and DSKS′ studied in this paper. The rank of a coupling is indicated immediately to the right of its value.
For B0 = 0, rank 1 corresponds to the strongest attraction and rank 49 to the strongest repulsion. The bead hidden inside the conformation shown in Figure 1 is counted as the first bead.

In order to assess whether DSKS and DSKS′ are typical samples, we have generated 200 independent Gaussian interaction sets $\tilde{B}_{ij}$ randomly and assigned them randomly to the contacts. We calculated the Tf values for each of them for B0 = –1. The values of Tf ranged between 0.005 and 0.789 with an average of 0.289 and dispersion 0.153. The sequence DSKS′ has a Tf of 0.195 and is thus typical. We shall demonstrate later on that this sequence is a bad folder. We find that DSKS with the same value of B0 has a Tf of 0.890, suggesting that it is an extremely rare sequence in terms of its stability. Nevertheless, the full rank ordering creates an end member, R, of the set that is an even better folder, with a Tf of 1.15.

Figure 1. Model native conformations for DSKS and R. The numbers indicate the relative strengths of the contacts.

Figure 2. Native conformations for sequence DSKS′ (one panel is labelled B0 < –2.9111). The numbers indicate the relative strengths of the contacts.

In order to characterize the equilibrium properties of the sequences, we define (following [8,13,18]) the following parameters:

$$P_0 = \frac{p_0}{Z} \qquad (3)$$

$$X = 1 - \sum_m \frac{p_m^2}{Z^2} \qquad (4)$$

$$Q = \sum_{n,m} Q_{n,m}\,\frac{p_n p_m}{Z^2} \qquad (5)$$

$$N_c = \sum_m C_m\,\frac{p_m}{Z} \qquad (6)$$

where

$$p_m = e^{-E_m/k_B T}, \qquad Z = \sum_m p_m \qquad (7)$$

with m = 0 corresponding to the native state. Cm is the number of contacts in conformation m and Qn,m is the number of contacts that are common to conformations n and m divided by the number of contacts in the more compact state of the two. The first three parameters are all basically equivalent measures of the thermodynamic stability and they all have sigmoidal temperature dependencies (data shown for sequences DSKS and R in Figure S1 of the Supplementary material). The rank ordering is seen to increase stability.

The first three parameters define characteristic temperatures of stability by the following, quite arbitrary, criteria used in the literature: Tf is the temperature at which P0 = ½ [13], TX is the temperature at which X = 0.8 [8], and TQ is the one at which Q = 0.7 [18]. The fourth parameter is a measure of compactness. Dinner et al. [18] have introduced equilibrium phase diagrams in which lines of TQ and TN are shown. The latter quantity is defined as the temperature at which Nc = 6.

Figure 3 compares the phase diagrams obtained for sequences DSKS and R. The phase diagram for the former was obtained in [18] (the fuzzy character of the lines shown there was probably meant to indicate the arbitrariness of the definitions of the 'phases'). The phase diagram for R is similar in topology to that for DSKS except that all of the transitions are shifted to higher temperatures, especially the one from the 'frozen' to the 'globule' state. Note that the frozen state defined in [18] is a bit of a misnomer since it was defined merely in terms of the energy spectrum and without invoking any dynamics. This transition line is in fact a measure of stability which is roughly proportional to Tf and not to Tg.

Using an exact enumeration of all conformations, we obtain the value of Tf. It is important to note that an apparent Tf obtained using only the maximally compact conformations (as is usually done in three dimensions [8,9,14], with the exception of a Monte Carlo procedure used in [13]) is independent of B0 and leads to a value larger than the true Tf. This is illustrated in Figure 3 by using TQ as a measure of the stability. The plots of Tf versus B0 will be shown in the next section, together with those of Tg.
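The stability measures P0, X and Nc, and the difference between the true Tf and the apparent Tf computed from compact states only, can be illustrated with a short sketch. The toy spectrum below is invented for the example (it is not data from the paper), k_B is set to 1, state 0 is assumed to be the native state, and the function names are hypothetical.

```python
import math

def boltzmann_weights(energies, T):
    """Unnormalised weights p_m = exp(-E_m / T), shifted by the lowest energy
    so that all exponents are <= 0 (the shift cancels in every ratio)."""
    e0 = min(energies)
    return [math.exp(-(e - e0) / T) for e in energies]

def stability_measures(energies, n_contacts, T):
    """Return (P0, X, Nc) for a list of states; state 0 is the native state."""
    w = boltzmann_weights(energies, T)
    z = sum(w)
    p0 = w[0] / z
    x = 1.0 - sum((wi / z) ** 2 for wi in w)
    nc = sum(c * wi for c, wi in zip(n_contacts, w)) / z
    return p0, x, nc

def folding_temperature(energies, lo=1e-3, hi=100.0, tol=1e-9):
    """Bisection for the temperature at which P0 = 1/2.
    P0 decreases monotonically with T when state 0 is the unique ground state."""
    def p0(T):
        w = boltzmann_weights(energies, T)
        return w[0] / sum(w)
    if not (p0(lo) > 0.5 > p0(hi)):
        raise ValueError("P0 = 1/2 is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if p0(mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical toy spectrum: (sum of B-tilde over contacts, number of contacts).
compact = [(-1.0, 9), (1.0, 9), (1.0, 9), (2.0, 9)]   # nine contacts each
non_compact = [(0.0, 5)] * 50                          # five contacts each

def spectrum(states, b0):
    """E_m = sum of B-tilde + B0 * (number of contacts), following equation 2."""
    return [e + b0 * c for e, c in states]

tf_true = folding_temperature(spectrum(compact + non_compact, -1.0))
tf_apparent = folding_temperature(spectrum(compact, -1.0))
tf_apparent_b2 = folding_temperature(spectrum(compact, -2.0))
```

Because every compact state has the same number of contacts, B0 shifts all compact energies equally, so the apparent Tf does not change with B0 in this toy; including the many non-compact states lowers Tf.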
The folding times

In order to probe the dynamics, we have carried out Monte Carlo simulations with a Metropolis algorithm starting from a random initial conformation (our Monte Carlo simulations used the move sets of Figure 1a and case i of Figure 1b of [26]). Several hundred (a minimum of 201) independent runs were carried out to determine the median folding time. (Note, e.g., that in [8,9] a temperature somewhat above the apparent Tf was used for speeding up the folding kinetics. Since noncompact conformations were not considered, and these are very large in number compared to the compact ones, the true Tf could be significantly smaller than the apparent value, especially for small B0.)

Figure 3. The phase diagram for sequences DSKS (solid lines) and R (broken lines) in the T–B0 plane, with regions labelled Coil, Globule and Frozen. The more vertical lines show the transition to compactness as measured by the temperature TN. The other lines are determined by TQ and are measures of the folding temperatures. The horizontal line is for sequence DSKS and indicates TQ determined only from the conformations that are maximally compact.

Plots of the median time of folding versus the temperature of the simulations are shown in Figure 4 and Figure S2 of the Supplementary material. Figure 4 compares DSKS and R at two indicated values of B0. It is seen that the rank ordering decreases the folding times at any temperature. Increasing the attraction in the contact interactions, by reducing B0, reduces the folding times at higher T but increases them at lower T. A comparison of the median folding times for R, DSKS, and DSKS′ at B0 = –1 is available in Figure S2 of the Supplementary material.
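The Monte Carlo protocol of this section (Metropolis acceptance, random starts, median over at least 201 runs) can be sketched generically. The lattice move set of [26] is not reproduced here; instead the sketch runs on a hypothetical one-dimensional toy landscape, and all function names are illustrative assumptions.

```python
import math
import random
import statistics

def metropolis_accept(delta_e, T, rng):
    """Standard Metropolis rule: always accept downhill moves,
    accept uphill moves with probability exp(-delta_e / T)."""
    return delta_e <= 0 or rng.random() < math.exp(-delta_e / T)

def folding_time(start, native, energy, propose, T, max_steps, rng):
    """Number of Monte Carlo steps until `native` is first reached
    (capped at max_steps if the run does not fold)."""
    state = start
    for step in range(max_steps):
        if state == native:
            return step
        trial = propose(state, rng)
        if metropolis_accept(energy(trial) - energy(state), T, rng):
            state = trial
    return max_steps

def median_folding_time(random_start, native, energy, propose, T,
                        runs=201, max_steps=100_000, seed=0):
    """Median first-passage time to the native state over independent runs."""
    rng = random.Random(seed)
    times = [folding_time(random_start(rng), native, energy, propose,
                          T, max_steps, rng) for _ in range(runs)]
    return statistics.median(times)

# Hypothetical toy landscape (NOT the lattice protein): integer states in
# [-20, 20], energy |s|, native state 0, moves of +/-1.
def toy_energy(s):
    return abs(s)

def toy_propose(s, rng):
    t = s + rng.choice((-1, 1))
    return t if -20 <= t <= 20 else s   # reject moves that leave the range

def toy_start(rng):
    return rng.randint(-20, 20)

toy_median = median_folding_time(toy_start, 0, toy_energy, toy_propose, T=0.5)
```

Using the median rather than the mean keeps the estimate well defined even when some runs hit the step cap, which matters near and below Tg where folding times grow steeply.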
At B0 = –1, the native state of DSKS′ is not maximally compact; this does not have any dramatic impact on the dependence of the median folding times on T, but the times are longer than for DSKS and even longer than for R.

Figure 4. The smoothed (to within two sizes of the data points) plots of the median folding times, tfold/10⁴, versus T for R and DSKS at the two values of B0 indicated in the figure (–1 and –2).

A plot of the median folding time versus T has a U-shape (as in [13]) characterized by a very steep increase in median times at some characteristic lower temperature. Tg was operationally defined as the value of the temperature at which the median time is 300 000 steps. Around this cutoff median time, the curve is almost vertical, which makes the definition of Tg essentially independent of the value of the cutoff.

Figure 5 shows a summary of our results for Tf and Tg for DSKS and R. The results indicate that Tf is larger and Tg is smaller for R than for DSKS, but the impact on Tf is more significant than on Tg. The ratio Tf : Tg, plotted as an inset to Figure 5, is larger for R than for DSKS, especially for B0 approaching 0. This shows R to be a champion folder. At B0 = –1, Tg = 0.49 and Tf = 1.152 for R, defining the good folding temperature range for this sequence. For DSKS, the range is [0.55, 0.890]. For both sequences, then, T = 0.8 is within the proper folding range and will be used in further studies of the dynamics reported in this paper. Figure 6 shows Tf and Tg for sequence DSKS′. Tf has a dip to zero at the transition between the ground states shown in Figure 2, and the ratio Tf : Tg is much less than 1, which shows that DSKS′ is a bad folder.

Figure 5. The plots of Tf and Tg as a function of B0 corresponding to the sequences DSKS and R as indicated (the solid lines correspond to DSKS). The inset shows the ratio Tf : Tg as a function of B0.

Figure 6. The plots of Tf and Tg as a function of B0 corresponding to the sequence DSKS′. The inset shows the ratio Tf : Tg as a function of B0.

We have also considered the geometric characteristics of the three sequences. The simplest of these is probably the radius of gyration, Rg. The mean square Rg versus T, as obtained through an exact enumeration of the energy spectra of the three sequences, is shown in Figure S3 of the Supplementary material. It is seen that, on lowering T, the radius of gyration enters a plateau region at Tf. Related information is supplied by the average number of contacts of Figure S1, but Figure S3 relates more directly to the average shape of the polymer. Monte Carlo calculations show a low-T upturn in ⟨Rg²⟩ which starts at Tg and indicates a loss of ergodicity within the timescale of the simulation.

An interesting way to describe the departures of the sequence geometry from its native conformation is through the structural overlap function, defined by Camacho and Thirumalai [10] as:

$$\chi_s = 1 - \frac{1}{N^2 - 3N + 2} \sum_{i \neq j,\, j \pm 1} \delta\!\left(r_{ij} - r_{ij}^{N}\right) \qquad (8)$$

where $r_{ij}^{N}$ refers to the coordinates of the native state, δ(x) is the Kronecker delta function, and N is the number of monomers. A susceptibility-like parameter is defined as:

$$\chi = \langle \chi_s^2 \rangle - \langle \chi_s \rangle^2 \qquad (9)$$

where the angular brackets indicate a thermodynamic average, i.e. over the Monte Carlo trajectory. Figure 7 shows the temperature dependence of χ together with that of the specific heat, c (the fluctuation in energy divided by T² and by the number of monomers). Both quantities are calculated by an exact enumeration. As a signature of a good folder, Klimov and Thirumalai [11,12] have proposed small values of a parameter σ = (Tc – Tχ)/Tc, where Tc is the temperature at which c reaches its maximum and Tχ is a similar temperature for χ. Thus, for good folders, the peaks in c and χ ought to almost coincide.
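The structural overlap function, its fluctuation χ, the specific heat c, and the parameter σ = (Tc – Tχ)/Tc can be sketched as follows. This is an illustrative sketch under stated assumptions: k_B = 1, averages are exact sums over a supplied list of states, peak positions are located on a user-supplied temperature grid, and the two-level toy system at the end is invented for the example.

```python
import math

def structural_overlap(coords, native):
    """chi_s = 1 - (number of ordered pairs i != j, j +/- 1 whose distance
    equals the native distance) / (N^2 - 3N + 2). Squared lattice distances
    are compared, which is equivalent to comparing distances."""
    n = len(coords)
    same = 0
    for i in range(n):
        for j in range(n):
            if abs(i - j) >= 2:
                d2 = (coords[i][0] - coords[j][0]) ** 2 + (coords[i][1] - coords[j][1]) ** 2
                d2n = (native[i][0] - native[j][0]) ** 2 + (native[i][1] - native[j][1]) ** 2
                same += d2 == d2n
    return 1.0 - same / (n * n - 3 * n + 2)

def thermal_average(values, energies, T):
    """Boltzmann average of `values` over states with the given energies."""
    e0 = min(energies)
    w = [math.exp(-(e - e0) / T) for e in energies]
    return sum(v * wi for v, wi in zip(values, w)) / sum(w)

def fluctuation(values, energies, T):
    """<v^2> - <v>^2 at temperature T."""
    mean = thermal_average(values, energies, T)
    mean_sq = thermal_average([v * v for v in values], energies, T)
    return mean_sq - mean * mean

def sigma_parameter(energies, chi_s, n_monomers, temps):
    """Locate the peaks of c(T) and chi(T) on the grid `temps` and
    return (T_c, T_chi, sigma) with sigma = (T_c - T_chi) / T_c."""
    c = [fluctuation(energies, energies, T) / (T * T * n_monomers) for T in temps]
    chi = [fluctuation(chi_s, energies, T) for T in temps]
    t_c = temps[c.index(max(c))]
    t_chi = temps[chi.index(max(chi))]
    return t_c, t_chi, (t_c - t_chi) / t_c

# Hypothetical two-level toy: one native state (E = -1, chi_s = 0) and ten
# unfolded states (E = 0, chi_s = 1), treated as a 4-monomer chain.
toy_energies = [-1.0] + [0.0] * 10
toy_chi_s = [0.0] + [1.0] * 10
toy_temps = [0.05 + 0.001 * k for k in range(2000)]
toy_tc, toy_tchi, toy_sigma = sigma_parameter(toy_energies, toy_chi_s, 4, toy_temps)
```

In a real calculation the per-state χs values would come from `structural_overlap` applied to every enumerated conformation; the toy only exercises the averaging and peak-finding steps.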
The plots of c and χ in Figure 7 indeed have the peaks coinciding in the case of sequences R and DSKS but not for DSKS′. Thus good folders behave like ferromagnets, in which peaks in the magnetic susceptibility and specific heat coincide. Bad folders, on the other hand, behave like spin glasses, in which a cusp in the susceptibility lies below a typically broad maximum in the specific heat (e.g. see [27]).

Figure 7. The specific heat and fluctuation in the parameter of structural stability (multiplied by 10) for the three sequences as indicated, plotted against T. The solid lines refer to χ and the dotted lines to c. The positions of Tg and Tf are marked by arrows.

The cell dynamics

We now proceed to describe our new way to represent the dynamics of the model protein in a coarse-grained fashion. The simplest characterization of the dynamics is given by Tg. This information, however, is too sparse to discuss mechanisms of the emerging folding funnels. Information about all of the states that turn up in a Monte Carlo trajectory, on the other hand, is too bulky and too amorphous to see patterns. A compromise is offered by a coarse-grained mapping of the Monte Carlo states to a class of important states that are much fewer in number [19,20]. Since the native state in the good folding sequences is maximally compact, the physically relevant choice of the class is that consisting of the maximally compact cells. In the case of the bad folder DSKS′, the native cell need not coincide with the native conformation (it does for B0 < –2.9111), but it is defined as the maximally compact state that is closest, in terms of contacts, to the native state (note that when the native state is the one shown on the right-hand side of Figure 2, the corresponding native cell is not the conformation shown on the left-hand side of this figure).

Figure 8 shows the two-step scheme that we adopted to map a given conformation to a cell. The conformation on the left takes on the form in the middle on using a steepest-descent quenching procedure to a local minimum. Of the 69 maximally compact conformations, the conformation on the right is found to have maximum energy overlap in common contacts with the middle picture and the mapping to the cell is complete. It should be noted that the first step of the transformation necessarily leads to an intermediate state which is lower in energy than the original state. The second step, however, may map to a maximally compact state of energy higher than that of the intermediate state. Overall, the transformation maps the Monte Carlo states primarily into lower-energy segments of the phase space. All states that map to a given cell can be considered to be excitations (like phonons in a solid) belonging to the cell, but some of these 'excitations' might be of a negative energy. In all of the cases we tested, the cells were kinetically accessible from their excitations in a small number of steps.

Figure 8. The mapping of a conformation to a cell as described in the text. The numbers indicate the ranking of the interaction energies in the contacts. The larger numbers correspond to contacts that are common between the quenched and the cell states. The energies indicated (E = –2.85, E = –9.18 and E = –12.51) are for sequence R at B0 = –1.

The energies of the conformations visited by sequence R at T = 0.8 during a 10 000-step segment of a Monte Carlo run starting from a random configuration are shown in Figure 9.
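The two-step mapping to a cell can be sketched as below. This is a hedged illustration rather than the authors' procedure: the move set needed for a real steepest-descent quench is not reproduced, so `steepest_descent` takes a generic, user-supplied `neighbours` function, and "maximum energy overlap in common contacts" is interpreted here, as an assumption, as the most negative sum of couplings over the contacts shared with a cell.

```python
def contacts(coords):
    """Contacts of a lattice conformation: pairs (i, j), i < j, of monomers
    that sit on adjacent sites but are not successive along the chain.
    Monomers are numbered from 0 in this sketch."""
    site = {p: k for k, p in enumerate(coords)}
    found = set()
    for i, (x, y) in enumerate(coords):
        for nb in ((x + 1, y), (x, y + 1)):   # each adjacent pair seen once
            j = site.get(nb)
            if j is not None and abs(i - j) > 1:
                found.add((min(i, j), max(i, j)))
    return found

def steepest_descent(state, energy, neighbours):
    """Step 1: repeatedly move to the lowest-energy neighbour until no
    neighbour lowers the energy (a local minimum)."""
    current, e = state, energy(state)
    while True:
        best = min(neighbours(current), key=energy, default=None)
        if best is None or energy(best) >= e:
            return current
        current, e = best, energy(best)

def map_to_cell(quenched_contacts, cell_contacts, coupling):
    """Step 2: index of the cell whose common contacts with the quenched
    state carry the lowest (most attractive) total coupling.
    Ties are resolved in favour of the first such cell."""
    def overlap_energy(k):
        return sum(coupling[c] for c in quenched_contacts & cell_contacts[k])
    return min(range(len(cell_contacts)), key=overlap_energy)

# Hypothetical toy data to exercise the functions.
toy_coupling = {(0, 3): -1.0, (0, 5): -2.0, (2, 5): 0.5}
toy_quenched = {(0, 3), (0, 5)}
toy_cells = [{(0, 3), (2, 5)}, {(0, 5), (2, 5)}, {(0, 3), (0, 5)}]
toy_cell_index = map_to_cell(toy_quenched, toy_cells, toy_coupling)
```

For the 16-mer, `cell_contacts` would hold the contact sets of the 69 maximally compact conformations, and `neighbours` would generate the conformations reachable by one allowed move.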
The energies in Figure 9 display considerable variation and rapid changes. This figure also shows the corresponding energies in cell space. The system is mapped to states in which it 'stays' for a considerable amount of time, especially in the native cell. Figure S4 of the Supplementary material shows a segment of 10 000 Monte Carlo steps in cell space at two temperatures (0.5 and 0.8), one at Tg and the other between Tg and Tf, for sequence R. The native cell has less occupancy at T = 0.5, with fewer dynamic contacts with the other cells than at T = 0.8, showing the lack of equilibration, a breakdown of ergodicity, and the difficulty of rapid folding at the lower temperature.

Figure 9. Motion in the conformation space (the top panel) and in the coarse-grained cell space (the bottom panel) of a typical 10 000 step long Monte Carlo trajectory for sequence R at T = 0.8, shown as energy E versus time t. The energy levels of the 69 cells are shown on the right.

The most striking characteristic of the folding propensity is captured by the average occupancy of the native cell, C0. Figure 10 shows C0 as a function of temperature for the three sequences R, DSKS, and DSKS′ at B0 = –1. The dotted and broken lines indicate results obtained at a fixed number of Monte Carlo steps indicated in the figure. The solid line, denoted by F, corresponds to the data points collected from only the portion of the run until the native state was reached. Below Tg, indicated by the arrow, only the first 2 million steps are taken into account. For sequence R, there is a broad and elevated plateau in C0. For DSKS, the region of significant values of C0 is reduced, and for the bad folder, it is absent. After folding, the system stays predominantly in the native cell, accounting for the (indefinite) sharpening of the curves as the number of steps considered increases.
What is defined uniquely, and is thus physical, is C0 determined only from the folding stage of the evolution (curves F). If one restricts oneself further to a substantial portion of the folding stage, C0 is changed very little, which indicates the robustness of the definition.

The cell-to-cell connections for sequence R are shown in Figure 11 at temperatures 2.0, 0.8, and 0.3, i.e. above Tf, between Tg and Tf, and below Tg, respectively. The data refer to the folding stage of the evolution (the runs proceeded until folding) and are based on 200 starting configurations. The columns and rows denote the cell number arranged according to the energy, with 1 corresponding to the native cell. The diagonal entries show the average occupancy of the cells during the runs. These entries, when summed over all 69 cells, are normalized to 1000. For the off-diagonal terms, only successful transitions to a different cell are considered. They show the interlinking between different cells. The sum of all off-diagonal entries is normalized to 1000 and the most significant entries, those bigger than or equal to 10, are displayed. Only those diagonal entries are shown that are related to substantial off-diagonal matrix elements. The matrix is found to be approximately symmetric (when more digits of the entries than are shown in Figure 11 are considered).

The situation corresponding to the best folding is illustrated by the matrix calculated for T = 0.8: the low-energy cells are directly connected to the native cell and the corresponding transition rates are high. The pattern of direct connectedness to the native cell is also visible in a bigger matrix corresponding to T = 0.8 when the off-diagonal cutoff is reduced to 5, as shown in Figure 12. At T = 2.0, the most significant transitions are from higher lying cells directly to the native cell, but most of the other cells are also present in the dynamics and the native state has low occupancy.
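The bookkeeping behind the connectivity matrices described above (diagonal occupancies normalized to 1000, off-diagonal transition counts normalized to 1000, and a display cutoff) can be sketched as follows. The function name and the toy trajectories are hypothetical; a trajectory here is simply the sequence of cell labels visited during the folding stage of one run.

```python
from collections import Counter

def connectivity(trajectories, cutoff=10.0):
    """Build the cell-to-cell connectivity summary from cell-label trajectories.

    Returns (diagonal, significant):
      diagonal    -- cell -> occupancy, scaled so all occupancies sum to 1000
      significant -- (from_cell, to_cell) -> transition weight, scaled so all
                     off-diagonal weights sum to 1000, keeping only entries
                     greater than or equal to `cutoff`
    """
    occupancy = Counter()
    transitions = Counter()
    for traj in trajectories:
        occupancy.update(traj)
        for a, b in zip(traj, traj[1:]):
            if a != b:                      # only transitions to a different cell
                transitions[(a, b)] += 1

    total_occ = sum(occupancy.values())
    total_tr = sum(transitions.values())
    diagonal = {c: 1000.0 * n / total_occ for c, n in occupancy.items()}
    off_diagonal = ({k: 1000.0 * n / total_tr for k, n in transitions.items()}
                    if total_tr else {})
    significant = {k: v for k, v in off_diagonal.items() if v >= cutoff}
    return diagonal, significant

# Hypothetical toy trajectories; cell 1 plays the role of the native cell.
toy_runs = [[3, 3, 1, 1, 1], [2, 1, 1]]
toy_diag, toy_off = connectivity(toy_runs)
```

In this toy, cell 1 accounts for 5 of the 8 recorded states, so its diagonal entry is 625, and each of the two transitions into it carries a weight of 500.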
Below Tg, the pathways to folding are primarily through intermediate stages, starting from cells of high energy; in addition, there is a significant weight in transitions between cells that do not have any substantial links to the native cell.

Figure 10. Occupancy of the native cell, C0, as a function of T, as explained in the text. The results for 100 000 steps and for the folding stage (F) are based on 200 starting conformations. For the case with 10 million steps, we used one starting conformation. The values of Tg and Tf are indicated by the arrows.

This pattern of behavior is generic: it is also observed for the DSKS sequence, with less direct connectivity to the native cell. This is shown in Figure S5 of the Supplementary material. The bad folder of Figure 13, on the other hand, is characterized by a predominance of the hierarchical steps and/or transitions between cells of high energy at any temperature.

The cell-to-cell connectivity is observed to depend primarily on the value of the temperature relative to Tg and Tf. Our description based on 69 cells suggests that, at least in the two-dimensional systems, the pathways to folding are through a direct linking of a significant portion of viable cells to the native cell. This cluster of well-connected cells can be considered to be a funnel for folding. Similar patterns of behavior are expected to be found in three-dimensional systems, for which the set of relevant cell states needs to be determined in an approximate fashion.
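In two dimensions the set of cells can be generated exactly. The sketch below counts self-avoiding chains on the square lattice up to rotations and reflections, and separately counts those that fill a square box. It is an illustrative brute-force enumeration written for this purpose (the function name and the symmetry-fixing trick are assumptions, not the authors' code); for n = 16 it should reproduce the counts of 802 075 conformations and 69 cells quoted in the text, at the cost of a few seconds of run time.

```python
import math

def count_conformations(n):
    """Count n-monomer self-avoiding chains on the square lattice, modulo
    rotations and reflections, and those filling a sqrt(n) x sqrt(n) square.

    Symmetry is removed by fixing the first bond along +x (rotations) and
    requiring the first turn, if any, to go in the +y direction (reflection).
    Returns (number of conformations, number of maximally compact ones).
    """
    if n < 2:
        raise ValueError("need at least two monomers")
    side = math.isqrt(n)
    is_square = side * side == n
    occupied = {(0, 0), (1, 0)}
    counts = [0, 0]                     # [all, maximally compact]

    def extend(x, y, length, turned, xmin, xmax, ymin, ymax):
        if length == n:
            counts[0] += 1
            # n distinct sites inside a side x side box necessarily fill it
            if is_square and xmax - xmin < side and ymax - ymin < side:
                counts[1] += 1
            return
        for dx, dy in ((1, 0), (0, 1), (-1, 0), (0, -1)):
            if not turned and dy < 0:
                continue                # mirror image of an allowed chain
            nx, ny = x + dx, y + dy
            if (nx, ny) in occupied:
                continue
            occupied.add((nx, ny))
            extend(nx, ny, length + 1, turned or dy != 0,
                   min(xmin, nx), max(xmax, nx), min(ymin, ny), max(ymax, ny))
            occupied.discard((nx, ny))

    extend(1, 0, 2, False, 0, 1, 0, 0)
    return tuple(counts)

small = count_conformations(9)          # 9-monomer chain, 3 x 3 cells
```

Storing the coordinates of each compact chain instead of merely counting it would give the list of cells needed for the mapping; in three dimensions such a full enumeration quickly becomes impractical, which is why an approximate determination is mentioned above.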
A study related to ours, on folding pathways in a three-dimensional lattice model, has been carried out by Leopold et al. [4]. They considered two sequences, one of which was a good folder and the other not, and explored the mean first passage time for diffusion between maximally compact states. Their principal finding was that the good folder had a folding funnel through which kinetic access to the native state was facilitated. In contrast, in our two-dimensional study, we map each conformation of the heteropolymer to one of the maximally compact cells and study the dynamics of evolution in this coarse-grained cell space.

Figure 11. Cell-to-cell connectivity of sequence R at three temperatures (T = 2.0, 0.8 and 0.3) as described in the text. The cutoff value for the off-diagonal entries is 10. (Matrix entries not reproduced.)

Figure 12. Cell-to-cell connectivity of sequence R at T = 0.8 with the cutoff value for the off-diagonal entries equal to 5. (Matrix entries not reproduced.)

Conclusions

Our studies suggest that champion folders have a large Tf : Tg ratio; that the cell motion is qualitatively different at different temperatures and for different sequences; that the folding pathways in cell space involve substantial direct links to the native cell; and that the parameter σ is small for good folders, owing to a dominance of the native cell in the averages of the structural stability.
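The notion of substantial direct links to the native cell can be made concrete as a small graph query, sketched below in Python under stated assumptions. The sketch takes a table of normalized off-diagonal entries, keeps those at or above a cutoff, and reports (i) the cells linked directly to the native cell and (ii) all cells that reach it through chains of significant links. Treating links as undirected is an assumption motivated by the approximate symmetry of the matrix; the function name and the example weights are hypothetical and are not taken from the figures.

```python
from collections import deque


def funnel_cells(links, cutoff=10.0, native=1):
    """Return (direct, reachable) sets of cells connected to the native cell.

    links: dict mapping (cell_a, cell_b) -> normalized transition weight
           (hypothetical input format, assumed for this sketch).
    direct: cells sharing a significant link with the native cell.
    reachable: cells joined to the native cell by a chain of significant links.
    """
    # Build an undirected adjacency map from links at or above the cutoff.
    adjacency = {}
    for (a, b), weight in links.items():
        if a != b and weight >= cutoff:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)

    direct = set(adjacency.get(native, set()))

    # Breadth-first search outward from the native cell.
    seen = {native}
    queue = deque([native])
    while queue:
        cell = queue.popleft()
        for neighbour in adjacency.get(cell, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)

    return direct, seen - {native}


# Illustrative weights only; these are not values read from the paper's figures.
example_links = {(2, 1): 21.0, (3, 1): 24.0, (7, 3): 12.0, (19, 20): 30.0, (4, 1): 5.0}
direct, reachable = funnel_cells(example_links)
print(sorted(direct), sorted(reachable))
```

In this toy input, cells 19 and 20 are strongly linked to each other but not to the native cell, which mirrors the kind of transitions between cells "that do not have any substantial links to the native cell" described for low temperatures.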
The champion folders are obtained by rank ordering of the contact energies and by assigning the most attractive couplings to the contacts present in a target protein. The extent to which biologically relevant proteins actually follow the rank-ordering prescription remains to be elucidated. Nevertheless, rank-ordered sequences form end members of the class of rapidly folding model proteins. The cell mapping strategy proposed here can be viewed as an extension of the idea of Leopold et al. [4]. In our approach, a folding funnel is identified by the existence of significantly connected paths leading to the native cell.

Figure 13. Cell-to-cell connectivity of sequence DSKS′ at three temperatures (T = 2.0, 0.8 and 0.3) as described in the text. The cutoff value for the off-diagonal entries is 10. (Matrix entries not reproduced.)

Acknowledgements

We are grateful to Saraswathi Vishveshwara for collaboration and discussions. This work was supported by Komitet Badan Naukowych (Poland; Grant Number 2P302 127), National Aeronautics and Space Administration, National Science Foundation, the Petroleum Research Fund administered by the American Chemical Society, and the Center for Academic Computing at Penn State.

References

1. Anfinsen, C.B., Haber, E., Sela, M. & White, F.H. (1961). The kinetics of formation of native ribonuclease during oxidation of the reduced polypeptide chain. Proc. Natl Acad. Sci. USA 47, 1309–1314.
2. Bryngelson, J.D., Onuchic, J.N., Socci, N.D. & Wolynes, P.G. (1995). Funnels, pathways and the energy landscape of protein folding: a synthesis. Proteins 21, 167–195.
3. Levinthal, C. (1968). Are there pathways for protein folding? J. Chim. Phys. 65, 44–45.
4. Leopold, P.E., Montal, M. & Onuchic, J.N. (1992). Protein folding funnels: a kinetic approach to the sequence–structure relationship. Proc. Natl Acad. Sci. USA 89, 8721–8725.
5. Onuchic, J.N., Wolynes, P.G., Luthey-Schulten, Z. & Socci, N.D. (1995). Toward an outline of the topography of a realistic protein folding funnel. Proc. Natl Acad. Sci. USA 92, 3626–3630.
6. Bryngelson, J.D. & Wolynes, P.G. (1989). Intermediates and barrier crossing in a random energy model (with applications to protein folding). J. Phys. Chem. 93, 6902–6915.
7. Bryngelson, J.D. & Wolynes, P.G. (1987). Spin glasses and the statistical mechanics of protein folding. Proc. Natl Acad. Sci. USA 84, 7524–7528.
8. Šali, A., Shakhnovich, E. & Karplus, M. (1994). How does a protein fold? Nature 369, 248–251.
9. Karplus, M., Šali, A. & Shakhnovich, E. (1995). Kinetics of protein-folding — reply. Nature 373, 665–665.
10. Camacho, C.J. & Thirumalai, D. (1993). Kinetics and thermodynamics of folding in model proteins. Proc. Natl Acad. Sci. USA 90, 6369–6372.
11. Klimov, D.K. & Thirumalai, D. (1996). Criterion that determines the foldability of proteins. Phys. Rev. Lett. 76, 4070–4073.
12. Veitshans, T., Klimov, D. & Thirumalai, D. (1996). Protein folding kinetics: timescales, pathways and energy landscapes in terms of sequence-dependent properties. Fold. Des. 2, 1–22.
13. Socci, N.D. & Onuchic, J.N. (1994). Folding kinetics of proteinlike heteropolymers. J. Chem. Phys. 101, 1519–1528.
14. Shrivastava, I., Vishveshwara, S., Cieplak, M., Maritan, A. & Banavar, J.R. (1995). Lattice model for rapidly folding protein-like heteropolymers. Proc. Natl Acad. Sci. USA 92, 9206–9209.
15. Cieplak, M., Vishveshwara, S. & Banavar, J.R. (1996). Cell dynamics of model proteins. Phys. Rev. Lett. 77, 3681–3684.
16. Go, N. & Abe, H. (1981). Non-interacting local-structure model of folding and unfolding transition in globular proteins. II. Application to two-dimensional lattice proteins. Biopolymers 20, 1013–1031.
17. Lau, K.F. & Dill, K.A. (1989). A lattice statistical mechanics model of the conformational and sequence spaces of proteins. Macromolecules 22, 3986–3997.
18. Dinner, A., Šali, A., Karplus, M. & Shakhnovich, E. (1994). Phase diagram of a model protein derived by exhaustive enumeration of the conformations. J. Chem. Phys. 101, 1444–1451.
19. Stillinger, F.H. & Weber, T.A. (1983). Dynamics of structural transitions in liquids. Phys. Rev. A 28, 2408–2416.
20. Stillinger, F.H. & Weber, T.A. (1984). Packing structures and transitions in liquids and solids. Science 225, 983–989.
21. Cieplak, M. & Jaeckle, J. (1987). Hidden valley structure of Ising spin glasses. Z. Phys. 66, 325–332.
22. Dill, K.A., et al., & Chan, H.S. (1995). Principles of protein folding—a perspective from simple exact models. Protein Sci. 4, 561–602.
23. Chan, H.S. & Dill, K.A. (1993). The protein folding problem. Phys. Today 46, 24–32.
24. Kauzmann, W. (1959). Some factors in the interpretation of protein denaturation. Adv. Protein Chem. 14, 1–63.
25. Miyazawa, S. & Jernigan, R.L. (1985). Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation. Macromolecules 18, 534–552.
26. Chan, H.S. & Dill, K.A. (1994). Transition states and folding dynamics of proteins and heteropolymers. J. Chem. Phys. 100, 9239–9257.
27. Binder, K. & Young, A.P. (1986). Spin-glasses: experimental facts, theoretical concepts, and open questions. Rev. Mod. Phys. 58, 801–976.

Because Folding & Design operates a ‘Continuous Publication System’ for Research Papers, this paper has been published via the internet before being printed. The paper can be accessed from http://biomednet.com/cbiology/fad.htm; for further information, see the explanation on the contents pages.