Download Cell dynamics of folding in two

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Protein moonlighting wikipedia , lookup

Protein wikipedia , lookup

Biochemical switches in the cell cycle wikipedia , lookup

Cytosol wikipedia , lookup

Cell encapsulation wikipedia , lookup

Extracellular matrix wikipedia , lookup

Cellular differentiation wikipedia , lookup

Cell culture wikipedia , lookup

Endomembrane system wikipedia , lookup

Cell cycle wikipedia , lookup

Cell growth wikipedia , lookup

Mitosis wikipedia , lookup

Organ-on-a-chip wikipedia , lookup

Signal transduction wikipedia , lookup

Amitosis wikipedia , lookup

Cytokinesis wikipedia , lookup

Protein domain wikipedia , lookup

Folding@home wikipedia , lookup

List of types of proteins wikipedia , lookup

Protein folding wikipedia , lookup

Transcript
Research Paper
235
Cell dynamics of folding in two-dimensional model proteins
Marek Cieplak1 and Jayanth R Banavar2
Background: Functionally useful proteins are sequences of amino acids that fold
rapidly under appropriate conditions into their native states. It is believed that
rapid folders are sequences for which the folding dynamics entail the exploration
of restricted conformations — the phase space can be thought of as a folding
funnel. While there are many experimentally accessible predictions pertaining to
the existence of such funnels and a coherent picture of the kinetics of folding has
begun to emerge, there have been relatively few simple studies in the controlled
setting of well-characterized lattice models.
Results: We design rapidly folding sequences by assigning the strongest
couplings to the contacts present in a target native state in a two-dimensional
model of heteropolymers. Such sequences have large folding transition
temperatures and low glass transition temperatures. The dependence of median
folding times on temperature is investigated. The pathways to folding and their
dependence on the temperature are illustrated via a study of the cell
dynamics — a mapping of the dynamics into motion within the space of the
maximally compact cells.
Conclusions: Folding funnels can be defined operationally in a coarse-grained
sense by mapping the states of the system into maximally compact
conformations and then by identifying significant connectivities between them.
Introduction
Globular proteins fold to their biologically active native
state in a proper range of temperature and acidity of a
water solution [1]. The native state is assumed to be the
ground state of the system [1] and this identification will
be adopted in this paper. The dynamics of folding is akin
to motion in a rugged free energy landscape [2]. The relative rapidity of the process, of the order of milliseconds to
seconds, rules out an exhaustive search of conformations,
which would have to take longer than the age of the universe [3] even in the case of short proteins. The notion of
folding funnels [4] has been invoked to describe the
apparent emergence of a restricted set of conformations
that are actually involved in folding. Based on simulations
of a lattice model, Onuchic et al. [5] have identified the
crucial landmarks in the folding funnel for a fast-folding
protein — through the molten globule states and lowenergy bottlenecks. There are several features that have
been proposed to characterize sequences that are good
folders: maximal compatibility (‘minimum frustration’)
[6,7], large thermodynamic stability [2,8,9], and certain
properties of thermodynamic parameters such as the specific heat and the structural overlap function [10–12]. The
purpose of this paper is to study what makes a good folder
and how one may identify the funnel conformations in
model two-dimensional lattice proteins.
The thermodynamic stability of a protein can be characterized by a folding transition temperature Tf below which
Addresses: 1Institute of Physics, Polish Academy
of Sciences, 02-668 Warsaw, Poland.
2Department of Physics and Center for Materials
Physics, 104 Davey Laboratory, The Pennsylvania
State University, University Park, PA 16802, USA.
Correspondence: Marek Cieplak
E-mail: [email protected]
Key words: folding funnel, lattice models, protein
design, protein folding
Received: 07 Apr 1997
Revisions requested: 09 May 1997
Revisions received: 19 May 1997
Accepted: 29 May 1997
Published: 08 Jun 1997
Electronic identifier: 1359-0278-002-00235
Folding & Design 08 Jun 1997, 2:235–245
© Current Biology Ltd ISSN 1359-0278
the probability of occupancy of the native state, P0,
becomes substantial. This may be assessed by several
measures that we discuss later. The most transparent definition is to take Tf as a temperature at which P0 is ½ [13].
Note that Tf depends on the energy spectrum of a
sequence but it does not depend on the dynamics. A
quantity that is related to the dynamics of folding is the
temperature of the glass transition, Tg, below which the
protein is trapped in local energy minima and there is typically no accessibility, within measurement timescales, of
the native state from a randomly chosen conformation
even when the temperature is less than Tf [2,13]. Thus it
follows that proteins that are functionally useful have a
large Tf and a small Tg and the temperature range relevant
to folding is Tg < T < Tf. Good folders have a large Tf : Tg
ratio whereas bad folders are characterized by this ratio
being less than 1.
We have proposed recently [14,15] a method to design
champion folders: for a given set of interaction contact
energies the prescription is to rank order these energies
and then to assign the strongest attractive energies to contacts that are present in the target native conformation.
This method is a generalization of the approach taken by
Go and Abe [16] in which the native contacts are set to be
uniformly attractive whereas the non-native contacts do
not interact. The procedure of contact assignment through
rank ordering defines a champion folder or a candidate for
being the best member within the family of sequences
236
Folding & Design Vol 2 No 4
that have a desired ground state and fold well. Note that
there are a large number of such fully rank-ordered
sequences for a given set of interaction energies and a
target maximally compact native state. We do not attempt
to distinguish between these degenerate sequences in this
paper. The consequences of rank ordering have been
demonstrated within the framework of simple yet illuminating three-dimensional [14] and two-dimensional [15]
lattice models.
In this paper, we focus on the two-dimensional model of a
protein consisting of a 16 monomer self-avoiding chain on
a square lattice [17,18]. The total number of conformations is only 802 075, of which 69 are maximally compact
(henceforth we denote the maximally compact conformation as a cell), i.e. fit on a 4 × 4 lattice, and is amenable to
exact enumeration. This allows for an exact evaluation of
equilibrium parameters, such as Tf or the radius of gyration. We compare folding characteristics for three
sequences: the champion folder generated through the
rank ordering, a good folder and a bad folder. Equilibrium
properties of the latter two sequences were obtained in
[18]. We start by presenting the equilibrium phase diagrams for the three sequences. We then proceed to studies
of dynamics as defined through a Monte Carlo process.
We determine the behavior of folding times as a function
of temperature, T, and determine Tg.
We then go on to characterize the dynamics of folding
within a coarse-grained scheme [15] and compare it for
the three sequences. Each conformation of the heteropolymer is dynamically mapped onto one of the 69
cells. This is done by a steepest-descent quenching followed by determining the cell with the largest energetic
overlap through common contacts. Thus, the true dynamics is represented in a coarse-grained manner as motion
within the space of the 69 cells. The motivation for the
steepest-descent part of the mapping is supplied by the
pioneering work of Stillinger and Weber [19,20] in the
context of glasses; see also a related work on spin glasses
[21]. Strikingly different patterns in the cell dynamic
space are found as the temperature is varied, illustrating
the pathways to folding.
E=
∑B
i, jD(i -
j)
(1)
i< j
where Bijs are the values of the contact interactions. We
take:
~
Bij = Bij + B0
(2)
The contact interactions are effective couplings which
result from the hydrophobic interactions of the amino
acids with solvent [24]. Their values
~ can be estimated
from microscopic principles [25] and Bij can be regarded as
being governed, approximately, by a Gaussian probability
distribution [8,9,18]. B0 is an overall shift (many of our
results will focus on B0 = –1).
For a two-dimensional model consisting of 16 monomers,
there are 49 possible contact interactions. Each sequence
is defined by a particular choice of the values of these
interactions.
~ As the first set we consider the 49 interaction
energies Bij of Dinner et al. [18] (see their Table I, upper
right) — these were obtained as independent random
values with a Gaussian distribution of zero mean and unit
dispersion (thus defining the unit of temperature). These
interaction energies are also shown in Table 1 of this
paper and are merely an approximation to more realistic
interaction energies between the amino acids of a real
protein. We denote this sequence DSKS (after Dinner,
Šali, Karplus and Shakhnovich [18], who demonstrated
that this sequence had a very stable ground state). The
ground state of the heteropolymer chain for any B0 < 0 is
the conformation in Figure 1.
The heteropolymer sequences and their equilibrium
properties
In order to define the second sequence, R, we rank order
∼
the 49 B ij values in a decreasing order of attraction. The
numbers shown in Figure 1 indicate the rank of the interaction energy (a smaller number denotes a higher attraction). The family of champion folders with the same
native state is designed by assigning the nine most attractive interactions to the nine native contacts (see [14]; their
Figure 2 shows, inadvertently, the number of native contacts and not the number of all contacts). One way (of the
nine) of doing this is shown by sequence R with the native
state, as shown in Figure 1, which differs from DSKS in
only three contacts. The remaining 40 interactions are
assigned randomly to the non-native contacts. The whole
list of these interactions, together with their ranking, is
given in Table 1.
Since proteins comprise combinations of up to 20 amino
acids, they may be conveniently and simply modelled as
heteropolymers (for reviews, see [22,23]). Each monomer
represents an amino acid. Two monomers, i and j, are in
contact with a contact energy Bij if they are not successive
in sequence and yet are next to each other (a negative Bij
denotes attraction) [22,23]. This restriction on monomers i
and j is schematically denoted by D(i–j) and the energy of
the system is written as:
The third sequence, DSKS′, also comes from Dinner et
al. [18] (see their Table I, lower left) and is also shown in
Table 1 of this paper. Its native conformation depends
on B0. Figure 2 shows two of these: one, maximally
compact, for large negative B0s and the other, with eight
contacts, for –2.9111 < B0 < 0. For B0 = 0 the native state
has seven contacts, and for larger B0 the number of contacts is still smaller.
Results and discussion
Research Paper Cell dynamics of folding Cieplak and Banavar
237
Table 1
~
Values of the Bij couplings for the three sequences DSKS, R, and DSKS¢.
i
j
DSKS
Rank
R
Rank
DSKS′
Rank
1
1
1
1
1
1
1
2
2
2
2
2
2
3
3
3
3
3
3
4
4
4
4
4
5
5
5
5
6
6
6
6
7
7
7
7
8
8
8
9
9
9
10
10
11
11
12
13
4
6
8
10
12
14
16
5
7
9
11
13
15
6
8
10
12
14
16
7
9
11
13
15
8
10
14
16
9
11
13
15
10
12
14
16
11
13
15
12
14
6
13
15
14
16
15
16
–0.4923
1.0474
–0.5496
1.1895
–2.1095
–0.7512
–0.0469
1.3830
–1.2977
–0.1408
0.0940
0.9985
–1.8114
–0.1705
1.1223
–0.3847
–0.2865
0.3293
–1.0435
–1.0519
0.5629
0.3252
–0.5998
1.7539
–0.2118
1.1253
0.0326
0.3624
–1.6053
0.4711
0.0228
0.3899
0.2471
–1.3267
–1.4820
–0.3501
–0.1171
–0.2125
–0.5015
0.0594
0.6888
–0.2472
0.5166
1.1651
–0.2958
–1.7203
–0.0592
0.3877
14
42
12
46
1
10
26
47
7
23
30
41
2
22
43
15
18
33
9
8
39
32
11
48
21
44
28
34
4
37
27
36
31
6
5
16
24
20
13
29
40
19
38
45
17
3
25
35
1.8313
–0.1705
–1.4820
0.0326
–2.1095
–1.7203
1.1651
0.3624
–1.2977
0.6888
–0.7512
1.7539
–1.8114
–0.2958
0.3252
1.1253
0.9985
1.1223
–1.0435
–1.0519
0.3899
–0.2118
–0.5015
–0.1171
–0.5998
0.5166
0.0228
–0.2125
–1.6053
0.3293
0.2471
–0.0469
–0.2472
1.1895
–0.1408
0.5629
–1.3267
1.3830
–0.0592
0.4711
–0.4923
–0.3847
1.0474
–0.2865
0.0940
–0.5496
0.3877
–0.3501
49
22
5
28
1
3
45
34
7
40
10
48
2
17
32
44
41
43
9
8
36
21
13
24
11
38
27
20
4
33
31
26
19
46
23
39
6
47
25
37
14
15
42
18
30
12
35
16
1.9063
–0.8975
0.0770
1.6883
–0.6494
0.8742
0.2781
–0.9914
–0.6782
1.1379
0.0821
0.5539
–0.6158
–0.9801
–0.0625
–0.6560
–0.3041
–0.7576
–0.2991
–0.3037
0.2299
–0.9717
–0.8744
–0.2275
–0.0651
1.5177
–0.9366
–1.5168
–1.0613
–0.8259
–1.7790
0.7045
–0.4063
0.8901
0.3860
–0.8012
1.3644
1.3071
0.3967
0.6439
0.8944
0.1926
–1.5915
–0.7950
0.2272
–1.1363
0.6108
–0.3228
49
10
28
48
18
40
33
6
16
43
29
36
19
7
27
17
22
15
24
23
32
8
11
25
26
47
9
3
5
12
1
39
20
41
34
13
45
44
35
38
42
30
2
14
31
4
37
21
~
Values of the Bij couplings (see equations 1 and 2), for the three sequences DSKS, R, and DSKS′ studied in this paper. The rank of a coupling is
indicated immediately to the right of its value. For B0 = 0, rank 1 corresponds to the strongest attraction and rank 49 to the strongest repulsion.
The bead hidden inside the conformation shown in Figure 1 is counted as the first bead.
In order to assess whether DSKS and DSKS′ are
~ typical
samples, we have generated 200 independent Bij Gaussian interaction sets randomly and assigned them randomly to the contacts. We calculated the Tf values for
each of them for B0 = –1. The values of Tf ranged
between 0.005 and 0.789 with an average of 0.289 and
dispersion 0.153. The sequence DSKS′ has a Tf of 0.195
and is thus typical. We shall demonstrate later on that
this sequence is a bad folder. We find that DSKS with
the same value of B0 has a Tf of 0.890, suggesting that it
is an extremely rare sequence in terms of its stability.
Nevertheless, the full rank ordering creates an end
member, R, of the set that is an even better folder with a
Tf of 1.15.
238
Folding & Design Vol 2 No 4
Figure 1
Figure 2
DSKS
DSKS'
R
4
4
24
8
12
2
6
7
8
5
1
37
7
2
9
3
2
2
23
20
1
16
7
9
40 6
33
B0<–2.9111
9
3
9
1
10
5
1
28
Model native conformations for DSKS and R. The numbers indicate the
relative strengths of the contacts.
Native conformations for sequence DSKS′. The numbers indicate the
relative strengths of the contacts.
In order to characterize the equilibrium properties of the
sequences, we define (following [8,13,18]) the following
parameters:
quantity is defined as the temperature at which Nc = 6.
Figure 3 compares the phase diagrams obtained for
sequences DSKS and R. The phase diagram for the
former was obtained in [18] (a fuzzy character of the lines
shown there was probably meant to indicate arbitrariness
of the definitions of the ‘phases’). The phase diagram for
R is similar in topology to that for DSKS except that all of
the transitions are shifted to higher temperatures, especially from the ‘frozen’ to the ‘globule’ state. Note that the
frozen state defined in [18] is a bit of a misnomer since it
was defined merely in terms of the energy spectrum and
without invoking any dynamics. This transition line is in
fact a measure of stability which is roughly proportional to
Tf and not to Tg.
p0
Z
P0 =
X =1−
(3)
∑p
2
m
(4)
pn pm
Z2
(5)
m
Q=
∑Q
n, m
n,m
Nc =
∑C
m
m
pm
Z
(6)
where
pm = e
− E m / kBT
Z
(7)
with m = 0 corresponding to the native state. Cm is the
number of contacts in conformation m and Qn,m is the
number of contacts that are common to conformations n
and m divided by the number of contacts in the more
compact state of the two.
The first three parameters are all basically equivalent
measures of the thermodynamic stability and they all have
sigmoidal temperature dependencies (data shown for
sequences DSKS and R in Figure S1 of the Supplementary material). The rank ordering is seen to increase stability. The first three parameters define characteristic
temperatures of stability by the following, quite arbitrary,
criteria used in the literature: Tf is the temperature at
which P0 = ½ [13], Tx is the temperature at which X = 0.8
[8], and TQ is one at which Q = 0.7 [18]. The fourth parameter is a measure of compactness.
Dinner et al. [18] have introduced equilibrium phase diagrams in which lines of TQ and TN are shown. The latter
Using an exact enumeration of all conformations, we
obtain the value of Tf. It is important to note that an apparent Tf obtained using only the maximally compact conformations (as is usually done in three dimensions [8,9,14]
with the exception of a Monte Carlo procedure used in
[13]) is independent of B0 and leads to a value larger than
the true Tf. This is illustrated in Figure 3 by using TQ as a
measure of the stability. The plots of Tf versus B0 will be
shown in the next section, together with those of Tg.
The folding times
In order to probe the dynamics, we have carried out
Monte Carlo simulations with a Metropolis algorithm
starting from a random initial conformation (our Monte
Carlo simulations had move sets of Figure 1a and case i of
Figure 1b of [26]). Several hundred (a minimum of 201)
independent runs were carried out to determine the
median folding time. (Note, e.g., that in [8,9] a temperature somewhat above the apparent Tf was used for speeding up the folding kinetics. Since noncompact
conformations were not considered, and these are very
large in number compared to the compact ones, the true Tf
could be significantly smaller than the apparent value,
especially for small B0.)
Research Paper Cell dynamics of folding Cieplak and Banavar
Figure 3
239
Figure 4
30
–1
–2
3
20
T
2
R
tfold/104
Coil
Globule
DSKS
R
DSKS
max. compact
10
DSKS
1
R
0
Frozen
0
0
–6
1
2
3
4
T
–4
–2
B0
0
2
The phase diagram for sequences DSKS (solid lines) and R (broken
lines) in the T–B0 plane. The more vertical lines show the transition to
compactness as measured by the temperature TN. The other lines are
determined by TQ and are measures of the folding temperatures. The
horizontal line is for sequence DSKS and indicates TQ determined only
from the conformations that are maximally compact.
Plots of the median time of folding versus the temperature
of the simulations are shown in Figure 4 and Figure S2 of
the Supplementary material. Figure 4 compares DSKS and
R at two indicated values of B0. It is seen that the rank
ordering decreases the folding times at any temperature.
Increasing the attraction in the contact interactions, by
reducing B0, reduces the folding times at higher Ts but
increases them at lower Ts. A comparison of the median
folding times for R, DSKS, and DSKS′ at B0 = –1 is available in Figure S2 of the Supplementary material. In this
case, the native state of the DSKS′ is not maximally
compact; this does not have any dramatic impact on the
dependence of the median folding times on T but the
times are longer than for DSKS and even longer than for R.
The smoothed out (within two sizes of the data points) plots of the
median folding times for R and DSKS at two values of B0 which are
indicated in the figure (–1 and –2).
smaller for R than for DSKS but the impact on Tf is more
significant than on Tg. The ratio Tf : Tg, plotted as an inset
to Figure 5, is larger for R than for DSKS especially for B0
approaching 0. This proves R to be a champion folder. At
B0 = –1, Tg = 0.49 and Tf = 1.152 for R, defining the good
folding temperature range for this sequence. For DSKS,
the range is [0.55,0.890]. For both sequences, then, T = 0.8
is within the proper folding range and will be used in
further studies of the dynamics reported in this paper.
Figure 6 shows Tf and Tg for sequence DSKS′. Tf has a dip
to zero at the transition between the ground states shown
in Figure 2 and the ratio Tf : Tg is much less than 1 which
shows that DSKS′ is a bad folder.
A plot of the median folding time versus T has a U-shape
(as in [13]) characterized by a very steep increase in
median times at some characteristic lower temperature. Tg
was operationally defined as the value of the temperature
at which the median time is 300 000 steps. Around this
cutoff median time, the curve is almost vertical which
makes the definition of Tg essentially independent of the
value of the cutoff.
We have also considered the geometric characteristics of
the three sequences. The simplest of these is probably the
radius of gyration, Rg. The mean square Rg versus T as
obtained through an exact enumeration of the energy
spectra of the three sequences is shown in Figure S3 of the
Supplementary material. It is seen that, on lowering T, the
radius of gyration enters a plateau region at Tf. Related
information is supplied by the average number of contacts
of Figure S1, but Figure S3 relates more directly to the
average shape of the polymer. Monte Carlo calculations
show a low-T upturn in 〈Rg2〉 which starts at Tg and indicates
a loss of ergodicity within the timescale of the simulation.
Figure 5 shows a summary of our results for Tf and Tg for
DSKS and R. The results indicate that Tf is larger and Tg is
An interesting way to describe the departures of the
sequence geometry from its native conformation is
240
Folding & Design Vol 2 No 4
Figure 5
Figure 6
3
3
Tf : Tg
6
DSKS
Tg
DSKS'
R
4
DSKS
6
Tg
Tf : Tg
4
R
2
2
0
T
2
2
–4
–2
0
Tf
1
0
T
–4
–2
0
1
R
Tf
DSKS
0
–6
–4
–2
0
2
0
–6
–4
The plots of Tf and Tg as a function of B0 corresponding to the
sequences DSKS and R as indicated (the solid lines correspond to
DSKS). The inset shows the ratio Tf : Tg as a function of B0.
∑
where r Nij refers to the coordinates of the native state, d(x)
is the Kronecker delta function, and N is the number of
monomers. A susceptibility-like parameter is defined as:
− χs
0
2
The plots of Tf and Tg as a function of B0 corresponding to the
sequence DSKS′. The inset shows the ratio Tf : Tg as a function of B0.
through the structural overlap function, defined by
Camacho and Thirumalai [10] as:
1
χs = 1 − 2
δ (rij − r ijN)
(8)
N − 3N + 2 i ≠ j , j ±1
χ = χ2s
–2
B0
B0
2
(9)
where the angular brackets indicate a thermodynamic
average, i.e. over the Monte Carlo trajectory.
Figure 7 shows the temperature dependence of x together
with that of the specific heat, c (the fluctuation in energy
divided by T2 and by the number of monomers). Both
quantities are calculated by an exact enumeration. As a
signature of a good folder, Klimov and Thirumalai [11,12]
have proposed small values of a parameter s = (Tc – Tx)/Tc,
where Tc is the temperature at which c reaches its
maximum and Tx is a similar temperature for x. Thus, for
good folders, the peaks in c and x ought to almost coincide. The plots of the two quantities in Figure 7 indeed
have the peaks coinciding in the case of sequences R and
DSKS but not for DSKS′. Thus good folders behave like
ferromagnets, in which peaks in the magnetic susceptibility and specific heat coincide. Bad folders, on the other
hand, behave like spin glasses, in which a cusp in the susceptibility lies below a typically broad maximum in the
specific heat (e.g. see [27]).
The cell dynamics
We now proceed to describe our new way to represent the
dynamics of the model protein in a coarse-grained fashion.
The simplest characterization of the dynamics is given by
Tg. This information, however, is too sparse to discuss
mechanisms of the emerging folding funnels. Information
about all of the states that turn up in a Monte Carlo trajectory, on the other hand, is too bulky and too amorphous to
see patterns. A compromise is offered by a coarse-grained
mapping of the Monte Carlo states to a class of important
states that are much fewer in number [19,20]. Since the
native state in the good folding sequences is maximally
compact, the physically relevant choice of the class is that
consisting of the maximally compact cells. In the case of
the bad folder DSKS′, the native cell need not coincide
with the native conformation (it does for B0 < –2.9111),
but it is defined as the maximally compact state that is
closest, in terms of contacts, to the native state (note that
when the native state is one shown on the right-hand side
of Figure 2, the corresponding native cell is not the conformation shown on the left-hand side of this figure).
Figure 8 shows the two-step scheme that we adopted to map
a given conformation to a cell. The conformation on the left
takes on the form in the middle on using a steepest-descent
Research Paper Cell dynamics of folding Cieplak and Banavar
Figure 7
Figure 8
↓Tg
Tf↓
1.5
R
49
c
1.0
22
χ
0.5
11
17
0.0
↓Tg
1.5
c, 10χ
241
↓Tf
22
5
DSKS
c
1.0
19
30
χ
0.5
5
37
7
17
19
33
18
30
29
12
0.0
↓Tf
1.5
30
↓Tg
DSKS'
E=–2.85
1.0
E=–9.18
E=–12.51
χ
c
0.5
0.0
15
0
1
2
3
The mapping of a conformation to a cell as described in the text. The
numbers indicate the ranking of the interaction energies in the
contacts. The larger numbers correspond to contacts that are common
between the quenched and the cell states. The energies indicated are
for sequence R at B0 = –1.
T
The specific heat and fluctuation in the parameter of structural stability
(multiplied by 10) for the three sequences as indicated. The solid lines
refer to x and the dotted lines to c.
quenching procedure to a local minimum. Of the 69 maximally compact conformations, the conformation on the
right is found to have maximum energy overlap in
common contacts with the middle picture and the
mapping to the cell is complete. It should be noted that
the first step of the transformation necessarily leads to an
intermediate state which is lower in energy than the original state. The second step, however, may map to a maximally compact state of energy higher than that of the
intermediate state. Overall, the transformation maps the
Monte Carlo states primarily into lower-energy segments
of the phase space. All states that map to a given cell can
be considered to be excitations (like phonons in a solid)
belonging to the cell but some of these ‘excitations’ might
be of a negative energy. In all of the cases we tested, the
cells were kinetically accessible from their excitations in a
small number of steps.
A train of energies of the conformations appearing in the
dynamics of sequence R at T = 0.8 during a segment of a
Monte Carlo run of 10 000 steps starting from a random
configuration are shown in Figure 9. These energies
display considerable variation and rapid changes. This
figure also shows the corresponding energies in cell space.
The system is mapped to states in which it ‘stays’ for a
considerable amount of time, especially in the native cell.
Figure S4 of the Supplementary material shows a segment
of 10 000 Monte Carlo steps in cell space at two temperatures (0.5 and 0.8), one at Tg and the other between Tg and
Tf, for sequence R. The native cell has less occupancy at
T = 0.5 with fewer dynamic contacts with the other cells
than at T = 0.8 showing the lack of equilibration, a breakdown of ergodicity, and the difficulty of rapid folding at
the lower temperature.
The most striking characteristic of the folding propensity
is captured by the average occupancy of the native cell, C0.
Figure 10 shows C0, as a function of temperature for the
three sequences R, DSKS, and DSKS′ at B0 = –1. The
dotted and broken lines indicate results obtained at a
fixed number of Monte Carlo steps indicated in the figure.
The solid line, denoted by F, corresponds to the data
points collected from only the portion of the run until the
native state was reached. Below Tg, indicated by the arrow,
only the first 2 million steps are taken into account. For
sequence R, there is a broad and elevated plateau in C0.
For DSKS, the region of significant values of C0 is
reduced, and for the bad folder, it is absent. After folding,
242
Folding & Design Vol 2 No 4
Figure 9
R
0
T=0.8
MC
Motion in the conformation space (the top
panel) and in the coarse-grained cell space
(the bottom panel) of a typical 10 000 step
long Monte Carlo trajectory for sequence R at
T = 0.8. The energy levels of the 69 cells are
shown on the right.
–10
–20
E
0
Cell
–10
–20
0
2000
4000
6000
8000
t
the system stays predominantly in the native cell accounting for the (indefinite) sharpening of the curves as the
number of steps considered increases. What is defined
uniquely then and what is thus physical is C0 determined
only from the folding stage of the evolution — curves F. If
one restricts oneself further to a substantial portion of the
folding stage, C0 is changed very little, which indicates the
robustness of the definition.
The cell-to-cell connections for sequence R are shown in
Figure 11 at temperatures 2.0, 0.8, and 0.3 — i.e. above
Tf, between Tg and Tf, and below Tg, respectively. The
data refer to the folding stage of the evolution — the
runs proceeded until folding — and are based on 200
starting configurations. The columns and rows denote
the cell number arranged according to the energy, with 1
corresponding to the native cell. The diagonal entries
show the average occupancy of the cells during the runs.
These entries, when summed over all 69 cells, are normalized to 1000. For the off-diagonal terms, only successful transitions to a different cell are considered.
They show the interlinking between different cells. The
sum of all off-diagonal entries is normalized to 1000 and
the most significant entries, those bigger than or equal to
10, are displayed. Only those diagonal entries are shown
that are related to substantial off-diagonal matrix elements. The matrix is found to be approximately symmetric (when further digits of the entries, than shown in
Figure 11, are considered).
The situation corresponding to the best folding is illustrated by the matrix calculated for T = 0.8 — the lowenergy cells are directly connected to the native cell and
the corresponding transition rates are high. The pattern of
direct connectedness to the native cell is also visible in a
bigger matrix corresponding to T = 0.8 when the off-diagonal cutoff is reduced to 5, as shown in Figure 12.
At T = 2.0, the most significant transitions are from higher
lying cells directly to the native cell, but most of the other
cells are also present in the dynamics and the native state
has low occupancy. Below Tg, on the other hand, the pathways to folding are primarily through intermediate stages,
starting from cells of high energy; in addition, there is a
significant weight in transitions between cells that do not
have any substantial links to the native cell.
Research Paper Cell dynamics of folding Cieplak and Banavar
Figure 10
Figure 11
R
107
1.0
R
1
7
T=2.0
F
0.5
↑Tg
↑Tf
13 50
14
16
19
21
23
32
30
36 18
0.0
107
1.0
7 14 19 36
1 292 13 15 21 18
105
DSKS
1 2 3 4 6 7 8 17 20
1 678 21 24 20 29 13 60 22 10
105
C0
243
0.5
2 21 57
F
3 24
↑Tg ↑Tf
0.0
T=0.8
↓Tf
0.5
24
4 20 24
DSKS'
1.0
24
27
6 29
↓Tg
7 13
F
8 60
10
13
20
17 22
105
0.0
0
8
20 10
1
2
3
3
T
1 3 7 10 19 20 23 29 43
1 544
34
20
Occupancy of the native cell as a function of T, as explained in the text.
The results for 100 000 steps and for the folding stage (F) are based
on 200 starting conformations. For the case with 10 million steps, we
used one starting conformation. The values of Tg and Tf are indicated
by the arrows.
3
16
7 34
68
54
10
This pattern of behavior is generic — it is also observed
for the DSKS sequence, with less direct connectivity to
the native cell. This is shown in Figure S5 of the Supplementary material. The bad folder of Figure 13, on the
other hand, is characterized by a predominance of the
hierarchical steps and/or transitions between cells of high
energy at any temperature. The cell-to-cell connectivity is
observed to depend primarily on the value of the temperature relative to Tg and Tf. Our description based on 69 cells
suggests that, at least in the two-dimensional systems, the
pathways to folding are through a direct linking of a significant portion of viable cells to the native cell. This cluster
of well-connected cells can be considered to be a funnel
for folding. Similar patterns of behavior are expected to be
found in three-dimensional systems, for which the set of
relevant cell states needs to be determined in an approximate fashion.
A related study to ours on folding pathways in a threedimensional lattice model has been carried out by
T=0.3
57
53
19
2
20 20
23
29
43
11 74
57
68
28
3
11
14 10
74
10 18
28
8
Cell-to-cell connectivity of sequence R at three temperatures as
described in the text. The cutoff value for the off-diagonal entries is 10.
Leopold et al. [4]. They considered two sequences, one of
which was a good folder and the other not, and explored
the mean first passage time for diffusion between maximally compact states. Their principal finding was that the
good folder had a folding funnel through which kinetic
access to the native state was facilitated. In contrast, in
our two-dimensional study, we map each conformation of
244
Folding & Design Vol 2 No 4
Figure 12
R
1
2
3
4
5
6
7
8
9
T=0.8 11
14
16
17
19
20
23
29
35
36
37
1 2 3 4 5 6 7 8 9 11 14 16 17 19 20 23 29 35 36 37
678 21 24 20 6 29 13 60
8 8 8 22 7 10 8
8 6 3
21 57
24
24
24
27
20 24
8
6
29
10
7
13
13
20
60
7
9
3
8
8
5
8
11
8
7
22
6
7
3
10
13 8
8
8 10
5
8
3
6
6
3
7
Figure 13
1
DSKS'
5
1 206 15
T=2.0
5
15 54
8
17
13
24
22
12
23
10
40
10
1
T=0.8
40
16 24
12 10
10
36
37
41
18
4
5
6
7
17
17
9
13
11
16
24
58 44
6
13
43 28
7
the heteropolymer to one of the maximally compact cells
and study the dynamics of evolution in this coarsegrained cell space.
22 23
66
4
5
13
63
1 274
Cell-to-cell connectivity of sequence R at T = 0.8 with the cutoff value
for the off-diagonal entries equal to 5.
8
24
9
11
13
16
22
13
95
14
14 30
Conclusions
Our studies suggest that champion folders have a large
Tf : Tg ratio, the cell motion is qualitatively different at different temperatures and for different sequences, the
folding pathways in cell space involve substantial direct
links to the native cell, and the parameter σ is small for
good folders due to a dominance of the native cell in the
averages of the structural stability. The champion folders
are obtained by rank ordering of the contact energies and
by assigning the most attractive couplings to the contacts
present in a target protein. The extent to which biologically relevant proteins actually follow the rank ordering
prescription remains to be elucidated. Nevertheless, rankordered sequences form end members of the class of
rapidly folding model proteins. The cell mapping strategy
proposed here can be viewed as an extension of the idea
of Leopold et al. [4]. In our approach, a folding funnel is
identified by the existence of significantly connected
paths leading to the native cell.
1
2
4
1 117
T=0.3
34 12
4
12 16
25
6
9
6
25
2
5
5
13
14
85 68
13
21
17
96
12
14
187
40
14
21
11 13 14
86 85
11
13
9
187
17
24
11
Acknowledgements
We are grateful to Saraswathi Vishveshwara for collaboration and discussions. This work was supported by Komitet Badan Naukowych (Poland;
Grant Number 2P302 127), National Aeronautics and Space Administration, National Science Foundation, the Petroleum Research Fund administered by the American Chemical Society, and the Center for Academic
Computing at Penn State.
Cell-to-cell connectivity of sequence DSKS′ at three temperatures
as described in the text. The cutoff value for the off-diagonal entries
is 10.
Research Paper Cell dynamics of folding Cieplak and Banavar
245
References
1. Anfinsen, C.B., Haber, E., Sela, M. & White, F.H. (1961). The kinetics
of formation of native ribonuclease during oxidation of the reduced
polypeptide chain. Proc. Natl Acad. Sci. USA 47, 1309–1314.
2. Bryngelson, J.D., Onuchic, J.N., Socci, N.D. & Wolynes, P.G. (1995).
Funnels, pathways and the energy landscape of protein folding: a synthesis. Proteins 21, 167–195.
3. Levinthal, C. (1968). Are there pathways for protein folding? J. Chem.
Phys. 65, 44–45.
4. Leopold, P.E., Montal, M. & Onuchic, J.N. (1992). Protein folding
funnels: a kinetic approach to the sequence–structure relationship.
Proc. Natl Acad. Sci. USA 89, 8721–8725.
5. Onuchic, J.N., Wolynes, P.G., Luthey-Schulten, Z. & Socci, N.D.
(1995). Toward an outline of the topography of a realistic protein
folding funnel. Proc. Natl Acad. Sci. USA 92, 3626–3630.
6. Bryngelson, J.D. & Wolynes, P.G. (1989). Intermediates and barrier
crossing in a random energy model (with applications to protein
folding). J. Phys. Chem. 93, 6902–6915.
7. Bryngelson, J.D. & Wolynes, P.G. (1987). Spin glasses and the statistical mechanics of protein folding. Proc. Natl Acad. Sci. USA 84,
7524–7528.
8. Šali, A., Shakhnovich, E. & Karplus, M. (1994). How does a protein
fold? Nature 369, 248–251.
9. Karplus, M., Šali, A. & Shakhnovich, E. (1995). Kinetics of proteinfolding — reply. Nature 373, 665–665.
10. Camacho, C.J. & Thirumalai, D. (1993). Kinetics and thermodynamics
of folding in model proteins. Proc. Natl. Acad. Sci. USA 90,
6369–6372.
11. Klimov, D.K. & Thirumalai, D. (1996). Criterion that determines the foldability of proteins. Phys. Rev. Lett. 76, 4070–4073.
12. Veitshans, T., Klimov, D. & Thirumalai D. (1996). Protein folding kinetics: timescales, pathways and energy landscapes in terms of
sequence-dependent properties. Fold. Des. 2, 1–22.
13. Socci, N.D. & Onuchic, J.N. (1994). Folding kinetics of proteinlike heteropolymers. J. Chem. Phys. 101, 1519–1528.
14. Shrivastava, I., Vishveshwara, S., Cieplak, M., Maritan, A. & Banavar, J.
R. (1995). Lattice model for rapidly folding protein-like heteropolymers.
Proc. Natl Acad. Sci. USA 92, 9206–9209.
15 Cieplak, M., Vishveshwara, S. & Banavar, J.R. (1996). Cell dynamics of
model proteins. Phys. Rev. Lett. 77, 3681–3684.
16. Go, N. & Abe, H. (1981). Non-interacting local-structure model of
folding and unfolding transition in globular proteins. II Application to
two-dimensional lattice proteins. Biopolymers 20, 1013–1031.
17. Lau, K.F. & Dill, K.A. (1989). A lattice statistical mechanics model of
the conformational and sequence spaces of proteins. Macromolecules
22, 3986–3997.
18. Dinner, A., Šali, A., Karplus, M. & Shakhnovich, E. (1994). Phase
diagram of a model protein-derived by exhaustive enumeration of the
conformations. J. Chem. Phys. 101, 1444–1451.
19. Stillinger, F.H. & Weber, T.A. (1983). Dynamics of structural transitions
in liquids. Phys. Rev. A 28, 2408–2416.
20. Stillinger, F.H. & Weber, T.A. (1984). Packing structures and transitions in liquids and solids. Science 225, 983–989.
21. Cieplak, M. & Jaeckle, J. (1987). Hidden valley structure of Ising spin
glasses. Z. Phys. 66, 325–332.
22. Dill, K.A., et al., & Chan, H.S. (1995). Principles of protein folding—a
perspective from simple exact models. Protein Sci. 4, 561–602.
23. Chan, H.S. & Dill, K.A. (1993). The protein folding problem. Phys.
Today 46, 24–32.
24. Kauzmann, W. (1959). Some factors in the interpretation of protein
denaturation. Adv. Protein Chem. 14, 1–63.
25. Miyazawa, S. & Jernigan, R.L. (1985). Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation. Macromolecules 18, 534–552.
26. Chan, H.S. & Dill, K.A. (1994). Transition states and folding dynamics
of proteins and heteropolymers. J. Chem. Phys. 100, 9239–9257.
27. Binder, K. & Young, A.P. (1986). Spin-glasses: experimental facts, theoretical concepts, and open questions. Rev. Mod. Phys. 58, 801–976.
Because Folding & Design operates a ‘Continuous Publication
System’ for Research Papers, this paper has been published
via the internet before being printed. The paper can be
accessed from http://biomednet.com/cbiology/fad.htm — for
further information, see the explanation on the contents pages.