Proc. Idea-Finding Symposium, Frankfurt Institute for Advanced Studies (2003) 23–33
Idea-Finding Symposium, Frankfurt Institute for Advanced Studies, Frankfurt, Germany, April 15–17, 2003

Physical Models for Protein Folding and Drug Design
R.A. Broglia (1,2) and G. Tiana (1)
(1) Department of Physics, University of Milano and INFN, Sez. di Milano, Via Celoria 16, 20133 Milano, Italy
(2) The Niels Bohr Institute, Blegdamsvej 16, 2100 Copenhagen, Denmark
Abstract. The problem of protein folding consists in understanding how the
amino acid sequence of a protein (primary structure) determines its unique, biologically
active equilibrium conformation (tertiary structure). By means of simplified models,
we explore the dynamical processes which underlie the folding of model
proteins and find a simple hierarchical mechanism which governs the folding phenomenon.
Exploiting this result, it is possible not only to develop an algorithm to
determine the equilibrium conformation of a protein from its sequence, that is, to solve
the protein folding problem provided one knows the interactions among the amino
acids, but also to design a novel class of drugs which interfere with the folding
mechanism and whose inhibitory effect cannot be neutralized through mutations, as is the
case with standard drugs, which act, as a rule, on the active site of enzymes.
1. Introduction
The problem of protein folding is to understand how a protein molecule of specified amino
acid sequence ends up in a unique configuration which, among other things, determines
its biological function [1]. In physical terms, the problem is how the one-dimensional
information provided by the sequence of twenty types of amino acids encodes for a unique
and stable three-dimensional equilibrium conformation.
This problem has a self-evident biological and medical importance. The sequencing of
the human genome [2, 3], that is, the identification of the order in which the thousands of millions of
bases follow each other in human DNA, provides information on the sequence of amino
acids forming each of the tens of thousands of proteins which build our cells and catalyze
the chemical reactions which make them function. The acquisition of sequence data by
DNA sequencing is relatively quick, and vast quantities of data have become available
through international efforts. But the knowledge of the sequence alone is of little help
in understanding the function of the corresponding protein, in manipulating its function
and in designing drugs to act on it. For that, one needs the three-dimensional equilibrium
conformation. On the other hand, the acquisition of three-dimensional data is still slow and
is limited to proteins that either crystallize in a suitable form or are sufficiently small and
soluble to be solved by NMR in solution [4]. In fact, while at present data banks contain
information concerning the linear sequence of about $10^5$ proteins, atomic coordinates of
only $10^4$ native structures are available [5]. Algorithms are thus required to translate the
linear information into spatial information.
ISBN 963 000 000 0, © 2003 EP Systema, Debrecen
Once the conformation of a protein is known, one can attempt to design drugs that
interact with the protein. Most of the targets of pharmaceutical drugs are enzymes, that is,
proteins whose task is to catalyze some reaction in the human body. Such drugs usually
inhibit the associated enzyme by capping its active site, thus preventing the enzyme from binding
its substrate. For example, matrilysin is an enzyme involved in the degradation of tissues
which takes place as a consequence of arthritis. Some drugs against arthritis are designed
to inhibit matrilysin activity by binding to its enzymatic site [6].
But the protein folding problem is also extremely intriguing from the physical point
of view. A protein is a system which is in a nearly-zero-entropy equilibrium state (usually
referred to as the ‘native’ state) over a wide interval of temperatures (ranging from ∼0 to ∼60 °C).
Such an equilibrium state has essentially no symmetries. The interactions within the
protein are notably complicated and heterogeneous. Nonetheless, the protein displays
neither slow dynamics, nor the large number of competing low-energy states and kinetic
traps associated with metastable states, typical of ‘frustrated’ systems [7]. The only feature
of frustrated systems which survives in the case of proteins is the difficulty of predicting
the ground state conformation of the system. This prediction is the essence of the protein
folding problem. Understanding the process behind the folding of proteins
is thus both interesting as a physical problem per se and instrumental to the prediction
of the native state.
It is important to emphasize that the main goal of the physical approach to the protein
folding problem is not to analyze the behavior of a specific protein, but to understand the
general principles of the folding mechanism of any protein. The first and basic assumption
needed to proceed further is that such a general paradigm does exist. There is indeed some
evidence which supports this view. Although proteins are complicated systems and each of
them can differ from the others in size, shape, and function, all of them display a
number of common features, for example secondary motifs known as β sheets and α helices, hydrophobic cores, etc. The starting point of the physical approach is, consequently,
the search for these common features through the (vast) experimental literature concerning
the folding of proteins.
It is furthermore sensible to assume that the tens of thousands of known proteins have
evolved from a few common ancestors. Hints of this evolution can be found in the conservation patterns of protein sequences displaying similar native structures. These conservation
patterns can be helpful in understanding the folding of related proteins and further testify
to the fact that it is reasonable to assume that there is a single general mechanism controlling folding. If one subscribes to the idea of a single folding pattern for all proteins, or at
least for small monoglobular proteins, then the use of simplified models to describe this
mechanism is not only allowed, but also useful.
During the last twenty years a remarkable development of protein models has taken
place, ranging from simple two-state models of the kind used by chemists to describe chemical reactions, to all-atom models. The latter take advantage of the power of modern
computers, which makes it possible to simulate the folding of proteins over periods of
time which, although a small fraction of the full folding time, are not negligible,
at least in the case of small proteins.
A particularly interesting model, which describes the protein as a chain of beads on a cubic
lattice, seems to represent an appropriate balance between solvability and realism (cf. e.g.
Ref. [8] and references therein). Studying this model in detail, one finds some remarkable simplicities in the folding of protein-like chains. The folding process is controlled,
within the framework of this model, by a small subset of the amino acids of the protein. As
we shall see in more detail in the next sections, these ‘hot’ amino acids [8] build, very early
in the folding process, a few local elementary structures (LES), which diffuse as essentially
rigid entities. When the local elementary structures, which display a high affinity for each
other, find their correct partners, they build the folding nucleus (FN), the minimum set of
native contacts needed to overcome the main free energy barrier associated with the
entire folding process [9, 10].
Viewing folding as the assembly of local elementary structures into
the folding nucleus not only accounts for known experimental facts, but also opens the
way to predictions and manipulations. In fact, the direct prediction of the native
conformation of a protein from its amino acid sequence is difficult, whereas
the localization of the local elementary structures is much easier. Once the elementary structures
are known, it is not impossible to determine the folding nucleus, and from it the native
conformation. Furthermore, the knowledge of local elementary structures can be used to
design drugs able to inhibit the folding, and consequently the biological activity, of selected
proteins [11].
2. The Model
An important ingredient at the basis of the folding of proteins is the heterogeneity
of the interactions arising from the presence of twenty different kinds of amino acids. It is
known that physical systems displaying such heterogeneity have, as a rule, a rough
energy landscape with many competing low-energy states [7]. This picture is incompatible with that of proteins, which must display a unique ground state, well separated from
the others, and as few metastable states as possible. Consequently, the purpose of these
models is to understand what makes a protein, characterized by a well-defined amino acid
sequence, different from a generic heterogeneous system, whose paradigm is found in a
random sequence of amino acids.
The simplest choice for a heterogeneous potential is that of a contact potential, in the
form

$$U(\{r_i\}, \{\sigma(i)\}) = \sum_{ij} B_{\sigma(i)\sigma(j)} \, \Delta(r_i - r_j), \qquad (1)$$

where $r_i$ and $\sigma(i)$ are the position and kind of the $i$th amino acid, $\Delta(r_i - r_j)$ is a contact
function which assumes the value 1 if $|r_i - r_j| \le 1$ and zero otherwise, while $B_{\sigma\tau}$ is the element of the $20 \times 20$ interaction matrix which defines the interaction energy between amino
acids of kind $\sigma$ and $\tau$. A widely used interaction matrix has been calculated by Miyazawa
and Jernigan (MJ) [12] from the statistical analysis of the contacts of a large database of
known proteins, assuming that the more often a given contact appears in the database, the
more attractive it is. This is done by calculating the probability $p_{\sigma\tau}$ of appearance of the
contact between the amino acids of kind $\sigma$ and $\tau$, and assuming a Boltzmann-like relationship of the kind $B_{\sigma\tau} \sim -\log p_{\sigma\tau}$.
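The contact potential of Eq. (1) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the function name `contact_energy`, the two-letter toy alphabet with invented energies (not the MJ matrix), the choice to count each pair once, and the choice to skip chain neighbours, which are always at unit distance.

```python
from itertools import combinations

def contact_energy(positions, kinds, B):
    """Contact energy in the spirit of Eq. (1) on a unit cubic lattice.

    positions: list of (x, y, z) integer tuples, one per monomer
    kinds:     sequence of amino-acid labels, same length
    B:         dict mapping frozenset({kind_a, kind_b}) -> contact energy
    Assumptions: each pair is counted once (i < j), and chain neighbours
    (j == i + 1) are skipped because they are always in contact.
    """
    energy = 0.0
    for i, j in combinations(range(len(positions)), 2):
        if j == i + 1:
            continue  # bonded neighbours carry no contact energy here
        dist2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
        if dist2 == 1:  # nearest neighbours on the lattice
            energy += B[frozenset((kinds[i], kinds[j]))]
    return energy

# Toy two-letter alphabet with hypothetical energies (not the MJ matrix).
B = {frozenset("H"): -1.0, frozenset("HP"): 0.0, frozenset("P"): 0.5}
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(contact_energy(square, "HPPH", B))  # the single H-H contact gives -1.0
```

In this four-monomer square only the chain ends touch without being bonded, so the energy is that of one contact. A realistic calculation would replace `B` by the full $20 \times 20$ matrix.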
The second approximation used, which consists in locating the beads representing the
amino acids on the vertices of a cubic lattice of unit side length, implies that the conformational degrees of freedom are discrete. This is very convenient from a computational
point of view and makes conformational entropy easy to handle. Within this approximation, the small-scale motion of the protein (i.e. the peptide bond vibrations) is
neglected and the chain is constrained to have unrealistic angles between monomers ($\pi/2$,
$\pi$ and $3\pi/2$). A more realistic choice would have been an fcc lattice (the
root-mean-square difference between real proteins and their projection onto an fcc lattice
is ∼1 Å [13]), although calculations are slightly more complicated. Since the choice of the
lattice does not change the underlying physics, in the following we will restrict ourselves to the use
of a cubic lattice.
Our starting point is the inverse folding approach, which turns the folding problem
upside down, asking which sequences fold to a given native conformation. The
answer to this problem is well known, at least within the framework of simple (lattice) protein models. Good folders are obtained by minimizing the energy of the chain in the native
conformation with respect to the amino acid sequence for fixed composition. Starting from a
random sequence, composition-conserving mutations are introduced (swapping of amino
acids). Within the framework of a Monte Carlo treatment, a sequence with sufficiently low
energy is sought.
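The design procedure just described can be sketched as a Metropolis search over residue swaps. This is only an illustration under stated assumptions: the native structure is reduced to a hypothetical list of contacting pairs, the energies are a toy two-letter table rather than the MJ matrix, and the names `native_energy` and `design_sequence` as well as the temperature and step count are invented for the example.

```python
import math
import random

def native_energy(seq, contacts, B):
    """Energy of `seq` when placed on the fixed native contact map."""
    return sum(B[frozenset((seq[i], seq[j]))] for i, j in contacts)

def design_sequence(seq, contacts, B, temperature=0.1, steps=5000, seed=0):
    """Metropolis search over composition-conserving swaps of two residues."""
    rng = random.Random(seed)
    seq = list(seq)
    energy = native_energy(seq, contacts, B)
    best_seq, best_energy = seq[:], energy
    for _ in range(steps):
        i, j = rng.sample(range(len(seq)), 2)
        seq[i], seq[j] = seq[j], seq[i]            # trial swap
        new_energy = native_energy(seq, contacts, B)
        delta = new_energy - energy
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy                     # accept the swap
            if energy < best_energy:
                best_seq, best_energy = seq[:], energy
        else:
            seq[i], seq[j] = seq[j], seq[i]        # reject: undo the swap
    return "".join(best_seq), best_energy

# Hypothetical toy problem: three native contacts, four H and two P residues.
B = {frozenset("H"): -1.0, frozenset("HP"): 0.0, frozenset("P"): 0.5}
contacts = [(0, 3), (1, 4), (2, 5)]
designed, energy = design_sequence("HPHPHH", contacts, B)
print(designed, energy)
```

Because only swaps are attempted, the composition of the starting sequence is conserved by construction, which is the constraint the text imposes on the minimization.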
Fig. 1. The model description of the native structure of a protein. The ‘hot’ and ‘warm’ sites are displayed in dark grey and light grey, respectively. The dashed lines indicate the contacts building the LES.
Good-folder sequences are characterized by a large gap $\delta = E_c - E_n$ (compared to
the standard deviation $\sigma$ of the contact energies) between the energy of the sequence in
the native conformation, $E_n$, and the lowest energy (threshold energy) of the conformations
structurally dissimilar to the native conformation [14, 15]. The quantity $E_c$ is the lowest energy a random sequence can achieve in the process of compacting, and is
solely determined by the composition of the protein. In other words, good folders are
associated with a normalized gap $\xi = \delta/\sigma \gg 1$, a quantity closely related to the z-score
[16]. Furthermore, starting from a designed sequence which displays a large gap, all mutated sequences which preserve (to some extent) the gap fold into the native conformation
[17]. For the sake of definiteness, we will consider in the following a particular sequence
made out of 36 amino acids, called $S_{36}$, which folds to the native structure shown in Fig. 1
and can be seen as a prototype of a folding model sequence [8].
3. Folding of Small Proteins
A striking result which emerges from studying the inverse folding approach is that the
stabilization energy of a protein is not distributed evenly across its amino acids, but is concentrated in a few ‘hot’ residues [8]. Locating ‘hot’ amino acids is quite simple. For
this purpose one introduces point mutations at each site of the native structure, that is, one
replaces each of the amino acids of the designed (low energy) sequence by each of the other 19 amino acids and studies whether the resulting sequence still folds or not. It is found
that mutations at only a few sites denature the protein (i.e. impede its folding) or destabilize it
(strongly reduce the native state occupation probability). To be quantitative,
we find that only 8% ± 2% (Fig. 1) of the amino acids of a designed sequence are highly
conserved, strongly interacting and occupy a hot site in the native conformation, in general well protected inside the protein, as befits a hydrophobic residue. Mutations of
the amino acids occupying the hot sites denature the protein, that is, they block the unfolded
(denatured) → native (D → N) phase transition. Mutations of amino acids occupying
the other sites have little effect on the ability of the resulting sequence to fold onto the
native conformation: they lead to sequences which, in the native conformation, still display
an energy lower than $E_c$, thus qualifying as good folders. The resulting families of (homologous) proteins (folding to the same native structure) have in common essentially only
the few amino acids which occupy the hot sites.
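The mutation scan described above can be sketched as follows. One simplification should be stated loudly: the text judges each mutant by whether it still folds, whereas this sketch uses the average change in native-state energy as a cheap proxy, which is an assumption of the example and not the criterion stated here. The contact list, the toy energy table and the names `mutation_scan` and `hot_sites` are likewise hypothetical.

```python
def native_energy(seq, contacts, B):
    """Energy of `seq` when placed on the fixed native contact map."""
    return sum(B[frozenset((seq[i], seq[j]))] for i, j in contacts)

def mutation_scan(seq, contacts, B, alphabet):
    """Mean native-energy change over all point mutations at each site."""
    reference = native_energy(seq, contacts, B)
    scores = []
    for site, wild_type in enumerate(seq):
        deltas = []
        for kind in alphabet:
            if kind == wild_type:
                continue  # only genuine substitutions are scored
            mutant = seq[:site] + kind + seq[site + 1:]
            deltas.append(native_energy(mutant, contacts, B) - reference)
        scores.append(sum(deltas) / len(deltas))
    return scores

def hot_sites(scores, threshold):
    """Sites whose mutations cost, on average, more than `threshold`."""
    return [site for site, score in enumerate(scores) if score > threshold]

# Hypothetical toy chain: one native contact between the two end residues.
B = {frozenset("H"): -1.0, frozenset("HP"): 0.0, frozenset("P"): 0.5}
scores = mutation_scan("HPPH", [(0, 3)], B, alphabet="HP")
print(scores, hot_sites(scores, threshold=0.5))
```

With a 20-letter alphabet each site would receive 19 substitutions, as in the text. The sites left untouched by the native contact map score zero, while the two residues forming the contact stand out.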
The hot amino acids determine not only the stability of the protein but also the hierarchy of native contact formation through which the protein, starting from an elongated
phase, reaches the native conformation (cf. Fig. 2): a) almost instantaneous formation of a
few local elementary structures (LES, i.e. hidden intermediates corresponding to incipient
α helices and β sheets, the secondary structures of proteins) stabilized by the interaction
between the hot amino acids; b) docking of the LES, which forms the minimum set of native contacts
bringing the system over the major free energy barrier of the whole folding process (i.e. formation of the post-critical folding nucleus (FN));
c) relaxation of the remaining amino acids onto the native structure shortly after the formation of the FN, giving rise to a unique system with an energy below $E_c$ [9, 10].
Summing up, the folding of proteins is controlled by the corresponding hot amino
acids through the LES, the ultimate building blocks of this molecular LEGO [18]. In other
words, the simple, most important feature common to all designed sequences folding to the
same native structure is the presence of a few highly conserved, strongly interacting, hot
amino acids which stabilize the LES and which are buried inside the folding nucleus of the
protein in its native conformation.
Fig. 2. Dynamics of contact formation for a MC simulation of the folding of the model
sequence $S_{36}$. Dashed lines label the contacts 3–6, 27–30 and 11–14 stabilizing
the LES $S_4^1$, $S_4^2$ and $S_4^3$ (cf. Fig. 1). Solid dotted lines along the vertical axis label
(from top to bottom) the contacts 5–28, 3–30, 14–27, 6–11, 13–28, 6–27, 12–5 and 4–29
forming the folding nucleus.
4. Predicting the Native State of a Model Protein
With the help of the results discussed above, we have developed a strategy which makes it possible
to predict the three-dimensional native conformation of a model protein from its amino
acid sequence (three-step strategy (3SS) [19]), that is, to solve the folding problem provided
the contact energies acting among the amino acids are known. The algorithm consists of
three steps, namely 1) finding good candidates for the role of local elementary structures,
2) finding the folding nucleus, and 3) finding the native conformation by relaxing the residues
not participating in the folding nucleus. This algorithm is based on the hierarchical sequence of events that allows the chain to fold fast, and works because at each step only a
limited portion of the configuration space of the protein has to be searched.
In what follows we discuss the 3SS algorithm in detail and apply it to a representative
example of a notional protein.
Step 1: Find the local elementary structures (LES) which lead the process of protein folding. Elementary structures can be closed or open, depending on whether or not they contain
interactions within themselves (apart from the peptide bond). Examples of
closed elementary structures are provided by $S_4^1$, $S_4^2$ and $S_4^3$ (cf. Fig. 2). In keeping
with this classification of LES, the present step is composed of two substeps.
Substep 1a: Find the open elementary structures. For each substring of the sequence,
starting at monomer $i$ and ending at monomer $j$ ($0 < i < j < N$), we define the
density of energy

$$\varepsilon_s = \frac{1}{j-i} \sum_{i \le l \le j} \; \min_{k \notin (i,j)} U_{m(l)\,m(k)}, \qquad (2)$$

where $m(l)$ is the kind of the $l$th monomer and $U$ is the matrix of contact energies used to design the notional protein, e.g.
the MJ matrix $B$ (cf. Eq. (1)). In other words, $\varepsilon_s$ is the average energy with which
each element of the substring $(i,j)$ interacts with the rest of the chain. The substrings
which are good candidates to be open elementary structures in the folding process
have low values of $\varepsilon_s$. Among such substrings we select those with values of $\varepsilon_s$ lower
than a threshold $\varepsilon_s^*$.
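Substep 1a can be sketched in a few lines of Python. The sketch follows Eq. (2) literally, including the normalization by $j - i$, but uses 0-based inclusive indices; the toy energy table, the threshold value and the names `energy_density` and `open_candidates` are assumptions made for the example.

```python
def energy_density(seq, U, i, j):
    """Eq. (2): energy density of the substring seq[i..j] (0-based, inclusive).

    For every monomer l in the substring, take its most favourable contact
    energy with any monomer k outside the substring, then normalize by j - i.
    """
    outside = [k for k in range(len(seq)) if k < i or k > j]
    total = sum(
        min(U[frozenset((seq[l], seq[k]))] for k in outside)
        for l in range(i, j + 1)
    )
    return total / (j - i)

def open_candidates(seq, U, threshold):
    """All substrings (i, j) whose energy density lies below `threshold`."""
    n = len(seq)
    found = []
    for i in range(n):
        for j in range(i + 1, n):
            if j - i + 1 == n:
                continue  # the whole chain has no 'rest of the chain'
            if energy_density(seq, U, i, j) < threshold:
                found.append((i, j))
    return found

# Hypothetical two-letter energy table (not the MJ matrix).
U = {frozenset("H"): -1.0, frozenset("HP"): 0.0, frozenset("P"): 0.5}
print(energy_density("HPPH", U, 0, 1))
print(open_candidates("HPPH", U, threshold=-0.5))
```

In this toy chain the two end segments, each containing one strongly interacting residue, fall below the threshold, while the weakly interacting middle segment does not.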
Substep 1b: Find the closed elementary substructures. For this purpose we evaluate,
for each pair of monomers $i$ and $j$, the function

$$p(i,j) = \frac{\exp\left(-U_{m(i)\,m(j)}/T_{\mathrm{eff}}\right)}{(j-i)^{\rho}}, \qquad (3)$$

where $T_{\mathrm{eff}}$ is an effective temperature which we set equal to the standard deviation of
the interaction matrix $U$ (e.g. $\sigma = 0.3$ for the case of the contact matrix of Ref. [12]).
The exponent $\rho = 1.7$ reflects the ratio between the number of conformations associated with the formation of a contact and the total number of conformations. If a substructure contains more than one interaction, the values of $p$ associated
with the different interactions are to be multiplied together. As possible (closed) local
elementary structures, we select those composed of monomers $i, i+1, \ldots, j-1, j$
and with $p(i,j) > p^*$, where $p^*$ is a threshold value (see below).
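Substep 1b amounts to evaluating Eq. (3) for each pair and keeping the pairs above the threshold. A minimal sketch follows, with the values $T_{\mathrm{eff}} = 0.3$ and $\rho = 1.7$ taken from the text as defaults; the toy energy table and the function names `contact_propensity` and `structure_propensity` are assumptions of the example.

```python
import math

def contact_propensity(seq, U, i, j, t_eff=0.3, rho=1.7):
    """Eq. (3): Boltzmann weight of the contact i-j divided by (j - i)**rho."""
    energy = U[frozenset((seq[i], seq[j]))]
    return math.exp(-energy / t_eff) / (j - i) ** rho

def structure_propensity(seq, U, contacts, t_eff=0.3, rho=1.7):
    """Product of p(i, j) over all contacts of a candidate substructure."""
    return math.prod(
        contact_propensity(seq, U, i, j, t_eff, rho) for i, j in contacts
    )

# Hypothetical two-letter energy table (not the MJ matrix).
U = {frozenset("H"): -1.0, frozenset("HP"): 0.0, frozenset("P"): 0.5}
weak = contact_propensity("HPPP", U, 0, 3)    # H-P contact, zero energy
strong = contact_propensity("HPPH", U, 0, 3)  # attractive H-H contact
print(weak, strong)
```

The power-law denominator penalizes contacts between monomers that are far apart along the chain, so that short, strongly attractive loops receive the largest values of $p$.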
Step 2: Find the folding nucleus. All the elementary structures found in substeps 1a and 1b (let $S$ be the total number of
such structures) are moved in space and the conformational
spectrum is found. This is done by selecting all possible choices of $1, 2, \ldots, S$ local
elementary structures, giving them all possible relative conformations and making
a complete enumeration of their reciprocal positions in space. The conformations
with lowest energy are selected as possible candidates for the (post-critical) folding
nucleus of the protein.
Step 3: Relax the remaining monomers around the folding core. This can be done through
a complete enumeration of all the conformations displaying a given nucleus, since they
are rather few (∼$10^4$ for a 36mer). Another way, which we found computationally
attractive, is to use low-temperature Monte Carlo relaxation simulations, keeping
fixed the monomers belonging to the folding core (see Note a).
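The complete enumeration mentioned in Step 3 can be illustrated on a drastically simplified case. In the sketch below the frozen core is assumed to be a contiguous prefix of the chain and only the remaining tail is grown, whereas an actual folding nucleus involves several segments spread along the sequence; the names `relax_tail` and `contact_energy`, the toy energies, and the exclusion of bonded neighbours from the energy are all assumptions of the example.

```python
from itertools import combinations

STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def contact_energy(positions, seq, B):
    """Sum of B over non-bonded monomer pairs at unit lattice distance."""
    energy = 0.0
    for i, j in combinations(range(len(positions)), 2):
        if j > i + 1:  # skip bonded neighbours
            dist2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
            if dist2 == 1:
                energy += B[frozenset((seq[i], seq[j]))]
    return energy

def relax_tail(core, seq, B):
    """Enumerate every self-avoiding completion of a frozen core.

    core: lattice positions of the first len(core) monomers, kept fixed.
    Returns (lowest energy, corresponding full conformation).
    """
    best = (float("inf"), None)

    def grow(chain, occupied):
        nonlocal best
        if len(chain) == len(seq):
            energy = contact_energy(chain, seq, B)
            if energy < best[0]:
                best = (energy, list(chain))
            return
        x, y, z = chain[-1]
        for dx, dy, dz in STEPS:
            site = (x + dx, y + dy, z + dz)
            if site not in occupied:      # self-avoidance
                chain.append(site)
                occupied.add(site)
                grow(chain, occupied)
                chain.pop()
                occupied.remove(site)

    grow(list(core), set(core))
    return best

B = {frozenset("H"): -1.0, frozenset("HP"): 0.0, frozenset("P"): 0.5}
core = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
energy, conformation = relax_tail(core, "HPPH", B)
print(energy, conformation[-1])
```

The number of completions grows exponentially with the length of the free part of the chain, which is why exhaustive enumeration is only practical when, as stated above, the nucleus already fixes most of the structure.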
Below we discuss the results of the 3SS strategy when applied to the designed sequence
$S_{36}$. In Fig. 3a we display the corresponding distribution of values of $p(i,j)$ for this sequence. Three bonds have a $p$ value which is remarkably larger than that associated with
the rest of the possible bonds of the protein, and consequently are good candidates for
stabilizing closed local elementary structures. The distribution of values of $\varepsilon_s$, displayed
in Fig. 3b, shows a single peak, whose lowest points are associated with the same sites
already involved in the closed elementary structures. It is thus likely that open elementary structures do not play any noticeable role in the folding process of $S_{36}$. We thus
search for a folding nucleus composed of the monomers $S_4^1 \equiv (3, 4, 5, 6)$, $S_4^2 \equiv (27, 28, 29, 30)$
and $S_4^3 \equiv (11, 12, 13, 14)$, and stabilized by the contacts 3–6, 27–30 and 11–14. A complete
enumeration of all the conformations built out of these three elementary substructures gives
the energy distribution displayed in Fig. 3c. The most stable of these conformations has energy −7.81 and is, in fact, the actual folding core. The relaxation of the other amino acids
around it gives the right native conformation, with energy $E_n = -16.50$. The next low-energy conformations built out of the three elementary substructures have energies −7.75,
−7.68 and −7.68. The relaxation of the other residues around these tentative folding nuclei leads to ‘native’ energies −12.40, −12.58 and −14.05, respectively. The first two of
them are higher than $E_c = -14.0$, so they correspond to states which belong to the set of
conformations structurally dissimilar to the native conformation we are searching for. The last
of them has an energy just below $E_c$. Although it can hardly be confused with the native conformation, it corresponds to a metastable state which can slow down the folding
process.
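The bookkeeping of this paragraph can be reproduced with the energies quoted in the text. The sketch below compares each relaxed energy with $E_c$ and evaluates the normalized gap $\xi = (E_c - E_n)/\sigma$; using $\sigma = 0.3$ here assumes that $S_{36}$ was designed with the contact matrix of Ref. [12], and the variable names are invented for the example.

```python
E_C = -14.0    # threshold energy quoted in the text
SIGMA = 0.3    # assumption: standard deviation of the Ref. [12] matrix

# (energy of candidate nucleus, energy after relaxing the other residues)
candidates = [(-7.81, -16.50), (-7.75, -12.40), (-7.68, -12.58), (-7.68, -14.05)]

# Only conformations whose relaxed energy lies below E_c remain candidates.
below_threshold = sorted(relaxed for _, relaxed in candidates if relaxed < E_C)
native = below_threshold[0]          # lowest energy: the native state
metastable = below_threshold[1:]     # other states below E_c

gap = E_C - native                   # delta = E_c - E_n
xi = gap / SIGMA                     # normalized gap

print(native, metastable, round(xi, 2))
```

Under that assumption the normalized gap comes out at roughly 8, consistent with the requirement $\xi \gg 1$ for a good folder, while the state at −14.05 sits only marginally below the threshold.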
[Figure 3: three histogram panels. (a) $n(p(i,j))$ versus $p(i,j)$; (b) $n(\varepsilon_s)$ versus $\varepsilon_s$; (c) $n(E)$ versus $E$.]
Fig. 3. (a) The distribution of the parameter $p(i,j)$, whose maximization allows one
to find the closed elementary structures. (b) The distribution of the energy density $\varepsilon_s$,
employed to find open elementary structures. (c) The distribution of
the energies associated with the possible folding nuclei of sequence $S_{36}$, built from
the elementary structures 3–4–5–6, 11–12–13–14 and 27–28–29–30.
5. Drug Design
Local elementary structures are also at the basis of a protocol for non-conventional drug
design recently proposed by us [11]. Conventional drugs perform their activity either by
activating or by inhibiting some target component of the cell. In particular, many inhibitory
drugs bind to an enzyme and suppress its function by preventing the binding of the substrate.
This is done either by capping the active site of the enzyme (competitive inhibition) or,
binding to some other part of the enzyme, by provoking structural changes which make
the enzyme unfit to bind the substrate (allosteric inhibition). The two main features that
inhibitory drugs must display are efficiency and specificity. In fact, it is not sufficient that
the drug binds to its target and efficiently reduces its activity. It is also important that it
does not interfere with other cellular processes, binding only to the protein it was designed
for. These features are usually achieved by designing drugs which mimic the molecular properties of the natural substrate. In fact, the enzyme/substrate pair has undergone
millions of years of evolution in order to display the required features. Consequently, the
more similar the drug is to the substrate, the lower is the probability that it interferes with
other cellular processes. What this kind of inhibitory drug is not able to do is
to avoid the development of resistance, a phenomenon which is typically related to viral
protein targets. Under the selective pressure of the drug, the target is often able to
mutate the amino acids either at the active site or at sites controlling its conformation in such a
way that the activity of the enzyme is essentially retained, while the drug is no longer able
to bind to it. An important example of drug resistance is connected with AIDS. In this case,
one of the main target proteins, HIV-protease, a dimer formed out of two identical chains
each containing 99 residues and folding according to the LES paradigm discussed above
(cf. e.g. [20]), is able to mutate its active site so as to avoid the effects of drug action within
a period of 6–8 months. In keeping with this result and with the central role played
by LES in the folding process of proteins, we suggest the use of short peptides with the
same sequence as the LES (p-LES) as non-conventional drugs which interfere with the folding
mechanism of the target protein, destabilizing it and making it prone to proteolysis. These
drugs are efficient, specific and do not suffer from the emergence of resistance.
In fact, the very reason why LES make single-domain proteins fold fast confers on p-LES
the features required to act as effective drugs, that is, efficiency and specificity. They are
efficient because they bind as strongly as LES do. Since LES are responsible for the stability of the protein, their stabilization energy must be of the order of several times $kT$. These
peptides are also as specific as LES are. In fact, LES have evolved over millions of years so
as to prevent the emergence of metastable states and to avoid aggregation, aside from ensuring
that the protein folds fast. The possibility of developing non-conventional drugs for actual situations is tantamount to being able to determine the LES for a given protein. This
can be done either experimentally (e.g. through molecular engineering [21]), or by extending
the algorithm discussed in Ref. [19], making use of a realistic force field. The resulting
peptides can be used either directly as drugs, or as templates to build mimetic molecules,
which possibly do not display side effects connected with digestion or allergies. A feature which makes, in principle, these drugs quite promising as compared to conventional
ones is that the target protein cannot evolve through mutations to
escape the drug, as happens in particular in the case of viral proteins in response to conventional drugs, because the mutation of residues in the LES would, in any case, lead to protein
denaturation.
Note
a. In some cases the system is non-ergodic, in the sense that from a given starting configuration it is not possible to reach all other configurations (with the folding core formed
and fixed). In such cases several relaxation simulations are performed starting from
different conformations (with the folding core formed and fixed). In keeping with this
fact, the folding nucleus of a notional protein could be required not to be exceedingly
stable, so as to avoid long-lived metastable states en route to folding. The (single)
totally relaxed conformation with energy lower than $E_c$ is the native conformation of
the protein.
References
1. J. Maddox, Does folding determine protein configuration? Nature 370 (1994) 13.
2. D.D. Shoemaker et al., Experimental annotation of the human genome using
microarray technology, Nature 409 (2001) 922.
3. J.C. Venter et al., The sequence of the human genome, Science 291 (2001) 1304.
4. R.F. Service, Tapping DNA structures produces a trickle, News Focus, Science 298
(2002) 948.
5. Protein Data Bank, http://www.rcsb.org.
6. M.F. Browner, W.W. Smith and A.L. Castelhano, Matrilysin-inhibitor complexes:
common themes among metalloproteases, Biochemistry 34 (1995) 6602.
7. M. Mezard, G. Parisi and M. Virasoro, Spin Glasses and Beyond, World Scientific,
New York, 1988.
8. G. Tiana, R.A. Broglia, H.E. Roman, E. Vigezzi and E.I. Shakhnovich, Folding and
misfolding of designed protein-like chains with mutations, J. Chem. Phys. 108
(1998) 757.
9. R.A. Broglia and G. Tiana, Hierarchy of Events in the folding of model proteins,
J. Chem. Phys. 114 (2001) 7267.
10. G. Tiana and R.A. Broglia, Statistical Analysis of Native Contact Formation in the
Folding of Designed Model Proteins, J. Chem. Phys. 114 (2001) 2503.
11. R.A. Broglia, G. Tiana and R. Berera, Resistance proof, folding-inhibitor drugs,
J. Chem. Phys. 118 (2003) 4754.
12. S. Miyazawa and R. Jernigan, Estimation of effective interresidue contact energies
from protein crystal structures, Macromolecules 18 (1985) 534.
13. R.H. Park and M. Levitt, The complexity and accuracy of discrete state models of
protein structure, J. Mol. Biol. 249 (1995) 493.
14. E.I. Shakhnovich, Proteins with selected sequences fold into unique native
conformation, Phys. Rev. Lett. 72 (1994) 3907.
15. E.I. Shakhnovich and A. Gutin, Enumeration of all compact conformations of
copolymers with random sequence of links, J. Chem. Phys. 93 (1989) 5967.
16. V.I. Abkevich, A.M. Gutin and E.I. Shakhnovich, Specific nucleus as the transition
state for protein folding, Biochemistry 33 (1994) 10026.
17. R.A. Broglia, G. Tiana, H.E. Roman, E. Vigezzi and E. Shakhnovich, Stability of
Designed Proteins against Mutations, Phys. Rev. Lett. 82 (1999) 4727.
18. R.A. Broglia, G. Tiana, S. Pasquali, H.E. Roman, E. Vigezzi, Folding and
Aggregation of Designed Protein Chains, Proc. Natl. Acad. Sci. USA 95 (1998)
12930.
19. R.A. Broglia and G. Tiana, Reading the three-dimensional structure of a protein
from its amino acid sequence, Proteins 45 (2001) 421.
20. G. Tiana and R.A. Broglia, Folding and design of dimeric proteins, Proteins 49
(2002) 82.
21. A. Fersht, Structure and Mechanism in Protein Science, W.H. Freeman and Co., New
York, 1999.