Download A Survey of Recent Work on Evolutionary Approaches to the Protein

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Clinical neurochemistry wikipedia , lookup

Peptide synthesis wikipedia , lookup

Metabolism wikipedia , lookup

Signal transduction wikipedia , lookup

Paracrine signalling wikipedia , lookup

Amino acid synthesis wikipedia , lookup

Multi-state modeling of biomolecules wikipedia , lookup

Gene expression wikipedia , lookup

Expression vector wikipedia , lookup

Biosynthesis wikipedia , lookup

G protein–coupled receptor wikipedia , lookup

Magnesium transporter wikipedia , lookup

Ribosomally synthesized and post-translationally modified peptides wikipedia , lookup

Ancestral sequence reconstruction wikipedia , lookup

Genetic code wikipedia , lookup

Point mutation wikipedia , lookup

Structural alignment wikipedia , lookup

Interactome wikipedia , lookup

Protein wikipedia , lookup

Western blot wikipedia , lookup

Protein purification wikipedia , lookup

Biochemistry wikipedia , lookup

Metalloprotein wikipedia , lookup

Protein–protein interaction wikipedia , lookup

Two-hybrid screening wikipedia , lookup

Proteolysis wikipedia , lookup

Transcript
A Survey of Recent Work on Evolutionary Approaches to the Protein Folding
Problem.
Garrison W. Greenwood
Dept. of Computer Science
Western Michigan University
Kalamazoo, MI 49008
Jae-Min Shin
Laboratory of Molecular Biology
National Cancer Institute
Bethesda, MD 20892
Byungkook Lee
Laboratory of Molecular Biology
National Cancer Institute
Bethesda, MD 20892
Gary B. Fogel
Natural Selection, Inc.
3333 North Torrey Pines Court
La Jolla, CA 92037
Abstract- A problem of immense importance in computational biology is the determination of the functional
conformations of protein molecules. With the advent of
faster computers, it is now possible to use rules to search
conformation space for protein structures that have minimal free energy. This paper surveys work done in the
last ve years using evolutionary search algorithms to
nd low energy protein conformations. In particular,
a detailed description is included of some work recently
started at the National Cancer Institute, which uses evolution strategies.
a background many in the evolutionary computation
(EC) community do not share. While this paper is not
intended to be a tutorial, it hopefully presents sucient
information so that those in the EC community can understand the problem, appreciate its diculty, and understand the details of how EAs have been employed.
The survey is, for the most part, restricted to work performed in the last 2-3 years (see [1] for surveys on earlier
work). Readers totally unfamiliar with this topic area
may rst want to review some excellent tutorials available in the general science literature [2, 3]. The second
purpose of this paper is to present preliminary results of
some work being conducted at the Laboratory of Molecular Biology at the National Cancer Institute, Bethesda,
MD.
1 Introduction
Biological organisms contain thousands of dierent types
of proteins. Proteins are responsible for transporting
small molecules (e.g., hemoglobin transports O2 in the
bloodstream), catalyzing biological functions, providing
structure to collagen and skin, regulating hormones, and
many other functions. Each protein is a sequence of
amino acids, bound into linear chains, that adopts a specic folded three-dimensional shape. Each shape provides valuable clues to the protein's function. Indeed,
this information is vital to the design of new drugs capable of combating disease.
Regrettably, ascertaining the shape of a protein is a
dicult, expensive task which explains why few proteins
have been categorized in this regard. Virtual protein
models, created on computers, may provide a cost effective solution to accurate prediction of protein shapes.
Unfortunately, the protein folding problem|trying to
predict the structure of a protein given only the protein's
sequence of amino acids|is a combinatorial optimization problem, which so far has eluded solution because
of the exponential number of potential solutions.
The purpose of this paper is two-fold. Previous papers have surveyed the use of evolutionary algorithms
(EAs) in protein folding problems, but they have been
written primarily from the perspective of biophysicists|
2 Folding a Protein
This section reviews the basic structure of proteins followed by a discussion of how EAs have been used to solve
three protein folding subproblems: minimalist models,
side-chain packing, and docking. A discussion of the efcacy of these approaches will be deferred until Section
3.
2.1 Protein Composition
An amino acid consists of a central carbon atom (denoted by C) which is bonded to an amino group (NH2 )
and a carboxyl group (COOH) and a side-chain (denoted
by R). The sequence of amino groups, C carbons and
carboxyl groups is called the protein backbone. The difference between two amino acids is the structure and
composition of the side-chains. Twenty amino acids are
found in biological systems and 19 of them have the
structure shown in Figure 1.
Proteins dier only in the number of amino acids
linked together and the sequential order in which these
amino acids occur. Amino acids link together when the
amino group
carboxyl group
H
O
H
N
Cα
H
C
O
H
R
Figure 1: General structure of 19 of the 20 isolated amino
acids. (The remaining amino acid proline has bonding
between the side-chain and the amino group.) R denotes
the location of the side-chain, which bonds to a central
carbon atom denoted by C.
carboxyl carbon of one amino acid binds with the amino
nitrogen of the next amino acid. This binding releases a
water molecule and the resulting bond is called a peptide
bond. The joined amino acids are referred to as residues
or peptides. The CO{NH group is planar and, when
combined with the bordering C atoms, forms a peptide
group. The condensation of two amino acids is called a
dipeptide. A third amino acid would condense to form a
tripeptide, and so on. Longer chains are called polypeptide chains. Each polypeptide can fold into a specic
shape. Protein molecules can contain one or more of
these chains.
The primary structure of a protein's polypeptide
chain is the sequence of amino acids. Dierent regions of
this sequence tend to form regular, characteristic shapes
called secondary structures. Three main categories of
secondary structures are the -helix, the -strand or sheet, and the turns or loops which connect the helices
and strands. Some studies suggest that certain residues
will appear more often in helices than in strands, which
implies a correlation between amino acid sequence and
shape. Nevertheless, variations do exist so this correlation is not completely understood. An aggregate of all
of these localized secondary structures forms the tertiary
structure which is of prime interest|a protein's function
is heavily inuenced by its tertiary structure. Tertiary
structures can also be combined as sub-units to form a
larger quaternary structure.
Stability of a protein conformation is inuenced by
a number of factors which can include Van der Waals
interactions, hydrogen bonding, and hydrophobic eects
[4]. The polypeptide chain transforms from a disordered,
nonnative state to an ordered, native state. This process
is referred to as protein folding. The function of a protein
is tied to its structure so being able to quickly specify
a structure from its amino acid sequence is of immense
interest. Unfortunately, nding the nal structure remains ellusive, in part due to the astronomical number
of possible conformations. X-ray crystallography (XC)
and nuclear magnetic resonance (NMR) are invaluable
tools but only a relatively small number of proteins have
been studied in this manner1.
2.2 Minimalist Models
Ab initio methods try to predict the fold without using
structure information from any other protein for comparison. These methods explore an energy hypersurface
(tness landscape) for a minimal energy conformation,
which is believed to correspond to the native state. Unfortunately, the enormous size of the energy hypersurface complicates the search process, which has led some
researchers to use minimalist protein models.
The most primitive models do not consider the primary structure of the protein; residues are merely classied as hydrophobic, which mix poorly with water, or hydrophilic, which attract water molecules. Additionally,
all residues are forced to occupy sites on a 2D square lattice with no more than one residue at any given site. This
restriction means the polypeptide chain must form a selfavoiding walk on the lattice. The protein is folded if, at
each point in the polypeptide chain, the next point may
turn 0 or 90 . Many of the conformations created in
this manner are discarded because they produce undesirable steric clashes. Greenwood [7] has shown that a
far larger number of self-avoiding walks can be rendered
by using a 2D torus rather than a 2D lattice.
The 2D lattice or torus is adequate to investigate the
propensity to form a hydrophobic core, but a 3D cubic
lattice is required to investigate secondary structures.
EAs are normally used with o-lattice models where
residues are not required to occupy xed, equally spaced
sites and the side-chain is modeled only as a single, virtual atom. Bond angles and bond lengths are typically
xed, the and dihedral angles are adjustable, and
normally only the trans (! = 180 ) conformation for
the peptide bond is considered (see Figure 2 for the definition of these angles). A common practice used with
genetic algorithms (GAs) is to encode , pairs as bit
strings, with each pair restricted to values taken from a
small dihedral library [8, 9]. This was the approach taken
by Pedersen and Moult [10], although they also used library data for the side-chain dihedral angles (). Other
approaches include allowing random changes in the dihedral angles of up to 5 [11]. Fitness is often measured
in terms of the r.m.s. deviation from a structure determined by XC or NMR methods. For moderate length
polypeptide chains, a r.m.s. deviation of 2.0 A or less is
generally considered excellent.
1 The Protein Data Bank [5] contains a little over 7,000 proteins
and peptides determined by XC and NMR. In contrast, it is estimated that there are some 80,000 genes in a human being, most of
which code for a distinct protein. In addition, there are a similarly
large number of proteins from other organisms, such as mouse and
bacteria.
2.3 Side-chain Packing
The greatest amount of success in protein folding studies uses the known structure of a homologous protein
(i.e., similar in structure). Indeed, as the number of
known structures increases, the probability of resolving
the conformation of other unsolved proteins will likewise
increase. Wilson et al. [12] lists the three major aspects
to homology-based modeling: (1) amino acid sequence
alignment; (2) generating loop conformations as needed;
and (3) predicting the conformations of the side-chains.
The side-chain packing problem (SCPP) deals only with
the last part of this modeling procedure. Because of the
high density of side-chains in the protein interior, sidechain conformations are determined by packing considerations, which involve interactions of many side-chains
at a time. But homology modeling is not the only reason
this problem is of interest. An ecient solution to this
problem is essential for most ab initio protein structure
prediction procedures. It will also aid in determining
permissible mutation sites for increased protein stability
and for a large scale protein re-design.
Figure 2 shows the torsion angles found in the backbone. The , torsion angles determine the polypeptide chain fold, while the torsion angles determine the
packing of the side-chains. The principle diculty in
solving SCPP is the requirement to search over an extremely large solution space. The combinatorial nature
of this problem makes complete enumeration infeasible.
Usually some simplifying assumptions are made to help
reduce the computational complexity. For example, typically all bond lengths and bond angles are xed so that
the only degrees of freedom are in choosing the torsion
angles. The complexity is further reduced by xing the
, torsion angles|which produces a rigid backbone
conformation|and then concentrating just on predicting the side-chain conformation by choosing the torsion angles. Unfortunately, this subproblem is not necessarily easy to solve. If on average there are r rotamer
states per residue, an n-residue peptide chain has rn
possible conformations. More specically, consider the
side-chain Tyr in Figure 2(b). If each side-chain torsion angle were treated as a rigid rotation divided into
10o steps (so has 360o/10o=36 states), then Tyr has
363=46,656 possible structures! And, this number does
not even consider any other residues in the peptide chain.
Fortunately, things are not quite as bad as they may appear. Studies do indicate that side-chain dihedral angles
tend to cluster around particular values (e.g., see [13]).
Moreover, these studies show a relationship also exists
with the backbone conformation. This has led to the
development of several rotamer libraries.
Despite the extreme importance of SCPP, there has
been only limited interest in this problem from the EC
community|most homology work uses GAs to perform
sequence alignments (e.g., see [14]). This is not to say no
H
H
R
Cα
φ
N
ψ
C
Cα
ω
H
R
O
(a)
H
H
O
χ3
H
C
C
H
C
C
C
C
H
H
χ
2
C
H
χ1
(b)
Cα
Figure 2: Torsion angles for (a) a residue and (b) Tyrosine (Tyr) side-chain. Residues usually adopt the trans
conguration (! = 180 ). The torsion angle 1 indicates rotation about the bond connecting the side-chain
to the C carbon; it is present in all residues except Pro
and Gly. The exact number of torsion angles depends
on the chemical makeup of the side-chain.
work has been done. Desjarlais and Handel [15] used a
GA to search for low energy hydrophobic core sequences
and structures, using a custom rotamer library as input.
Each core position was allocated a set of bits within a binary string and the bit values encoded a specic residue
type and set of torsion angles as specied in the rotamer
library. The input rotamer library for each core position
is thus a list of residue/torsion possibilities for the string
location corresponding to the core position. Reproduction was performed with standard recombination and
mutation operators, but an inversion operator was added
to establish genetic \linkage" between pairs of bits. This
GA was recently used in the de novo design of a small
peptide chain [16].
2.4 Docking
Docking problem solutions predict how two organic
molecules will energetically and physically bind together.
One molecule, called the receptor, contains \pockets"
that form binding sites for the second molecule, which is
called the ligand. Any solution must therefore describe
both the shape of the receptor and the ligand as well as
their anity. An ecient means of solving the docking
problem would be of enormous benet. For example,
replication of the AIDS virus depends on the HIV protease enzyme. If one can nd a small molecule that will
permanently bind to the active site in the HIV protease,
the normal function of that enzyme can be prevented.
Autodock [17] is an existing docking software package, which is designed to investigate dockings between
a macromolecule and a small substrate molecule. Rosin
et al. [18] used a GA in conjunction with the Autodock
software package. Two interesting aspects of their GA
implementation are the use of Cauchy deviates for mutation and the incorporation of Lamarckian learning (local search with replacement) on a small fraction of the
population each generation. Mutation was not used to
conduct local searches, but was reserved for the more
conventional role of replacing lost alleles. Instead, local searchers were conducted using a randomized hillclimber with an adaptive step size [19]. Individuals in
the search space encoded Cartesian coordinates of the
small molecule, a 4-dimensional quaternion2 indicating
global orientation, and a set of torsion angles reecting
exibility in the small molecule. In a direct competition,
the authors were able to show that the GA, augmented
with local search, signicantly outperformed simulated
annealing in all test cases.
Jones et al. [20] describe a GA-based ligand docking
program, which permits a full range of ligand conformations exibility with partial exibility of the protein.
Each chromosome encodes the internal coordinates of
2 This is a vector giving an axis of rotation and the rotation
angle.
both the ligand and the protein active site, and a mapping between hydrogen-bonding sites. Internal coordinates were encoded as bit strings where each byte represented a torsion between 180 in step-sizes of 1.4 .
Integer strings were used to identify possible hydrogen
bonding sites. Fitness was determined by summing (1)
the hydrogen-bonding energy between ligand and protein; (2) the pairwise interaction energy between ligand
and protein atoms; and (3) the ligand steric and torsional
energies.
The GA used an island model where several small
distinct populations were evolved, instead of evolving
one large population. Reproduction operators included
crossover, mutation, and a migration operator to share
genetic material between populations. The authors
claim this island model increases eectiveness|but not
eciency|of the GA. One hundred complexes from the
Brookhaven Protein DataBank [5] were used as test cases
with a 71% success rate in identifying the experimental
binding mode.
Raymer et al. [21] used a GA in an entirely dierent
manner. Water molecules within the binding site will
either be displaced by the ligand or remain bound. In
either case, the energetics of the interaction are aected.
The goal of Raymer et al. was to predict the conserved
or displaced status of water molecules upon ligand binding. A k-nearest-neighbor (knn) classier predicted the
displaced status, and the GA determined optimal feature weight values for the classier. Population members encoded feature weights as binary strings and twopoint crossover was the primary reproduction operator.
Fitness was based on the percent of correct predictions
by the knn classier. It is important to note that the
GA was not involved in the prediction of water molecule
status|its sole responsibility was to train the classier
that was responsible for the prediction.
Docking problems are one area where other types of
evolutionary algorithms have been tried. Gehlhaar et
al. [22] used evolutionary programming (EP) to attempt
docking the inhibitor AG-1343 into HIV-1 protease. No
assumptions were made regarding likely ligand conformations, nor ligand-protein interactions. The ligand was
required to remain within a parallelepiped that included
the active site plus a 2.0 A cushion; an energy penalty
was assessed to each ligand atom outside of this box.
Each member of the population of candidate ligand conformations encoded six rigid body coordinates and the
dihedral angles. AG-1343 has a large number of rotatable bonds, so a population size of 2000 was used.
New members were created by additive Gaussian noise
with a lognormal self-adaptation of the strategy parameters. The tness function was the pairwise potential
summed over all pairs of ligand-protein heavy atoms.
These atoms interact through steric (van der Waals) and
hydrogen-bonding potentials. The same functional form
was used for both potentials, but dierent coecients
were used because a single hydrogen bond should have a
larger weight than a single steric interaction. The authors were able to reproduce the crystal structure to
within 1.5 A in 34 out of 100 runs.
3 Discussion
There have some notable successes in homology modeling in recent years|r.m.s. errors of under 1.0 A can
be achieved over large protein sets. This success has
led some to believe that the combinatorial problem in
SCPP is only minor [23]. Although there may be some
justication to these claims, studies indicate that neglecting combinatorial packing leads to higher r.m.s.
deviations|and higher energy values|that in a small
number of cases produce completely unsatisfactory solutions [24]. It is probably therefore prudent to continue
incorporating packing optimization.
It cannot be emphasized enough how important the
energy (tness) function is to the ecacy of the nal
results. A poorly dened energy hypersurface may be
eciently explored by an evolutionary algorithm, but if
the conformations with the lowest energy are nothing like
the native structure, then it is not clear what one should
infer from the results. For example, one study [25] concluded that a GA search was successful|but none of the
conformations generated were similar to the native structure! Ab initio predictions of this sort only demonstrate
the ecacy of EAs. Nevertheless, one should not conclude that only elaborate tness functions are capable of
leading to accurate predictions [11].
A couple of examples will illustrate the importance of
a good tness function. Rosin et al. [18] point out that
some local minima on their energy hypersurface had better r.m.s. deviations from the crystal structure than did
conformations with lower energy values. Hence, further
optimization would not have produced more accurate
structures. Jones et al. [20] are to be commended on
their thorough analysis of the cases in which their docking program failed. One of the reasons for failure was an
enforced requirement that the ligand must be hydrogen
bonded to the binding site, which was a case not always
observed. But the second reason cited was the tness
function underestimated the hydrophobic contribution
to binding.
It is apparent that GAs are still the predominant
EA used in protein folding studies. This fact seems
somewhat surprising since a number of researchers have
pointed out that crossover|the primary reproduction
mechanism used in GAs|is largely ineective for protein folding studies. Crossover is particularly ineective as the protein structure becomes more compact [10].
Moreover, several researchers comment on the inability
of crossover to recombine the so-called building blocks
that should produce better solutions [26, 27]. Evolution
strategies (ES) and EP place emphasis on mutation as a
reproduction mechanism. Perhaps a detailed look in the
use of these paradigms is long overdue.
4 Recent Work at NIH
The Laboratory of Molecular Biology at the National
Cancer Institute has recently begun to explore the ability of an ES to solve instances of SCPP. Our approach is
to rst develop an ES predictor that can nd a reasonable backbone conformation and then attack SCPP. The
rst version of our ES predictor is running and the results
are quite encouraging. This predictor uses a polypeptide
model that has all backbone atoms represented plus the
C atom from the side-chain3. Energy is measured by
r.m.s. deviation from a known crystal structure. This
tness function has an advantage in testing search algorithms because the function is exact in the sense that
the global energy minimum exactly corresponds to the
native structure.
The genome for our ES predictor is an integer array,
which describes the , torsion angles of each residue.
Search operations are conducted with a ( + t )-ES,
where, at generation t, parents produce t osprings
and parents compete equally with osprings for survival.
However, reproduction is done in a dierent manner from
the conventional ( + )-ES. Each parent may generate
up to 200 osprings, but culling of low t progeny is
done immediately after their genesis. This means the
total number of osprings that can be candidates for
survival varies from parent to parent. Furthermore, k
may not necessarily equal m for m 6= k. It is for this
reason we adopt the notation ( + t ). Conventional
truncation selection chooses the parents for the next
generation.
(; )
Probability
(-65, -42)
0.370
(-87, -3)
0.147
(-123, 139)
0.274
(-70, 138)
0.139
(77, 22)
0.045
(107, -174)
0.025
Table 1: The discrete set of , angles and their probability of occurrence. Only the trans (! = 180 ) conguration is supported.
New conformations are generated by stochastic modication of selected residue torsion angles (see below).
Mutation is now the only reproduction operator because
in earlier tests we found that generally recombination
3 The C atom in the side-chain is directly bonded to the C
atom.
was of little value, particularly as the structures become
more compact. Mutation is performed over a randomly
positioned set of k consecutive residues, typically set to
three. The r.m.s. deviation from the crystal structure|
which is proportional to tness|is computed only over
a window of consecutive residues, which includes the k
residues that are mutated. The window size w is an
adapted strategy parameter that will grow until it equals
the full length of the polypeptide chain.
The rst change is to model just a virtual backbone|
composed only of C atoms|to properly orient the
backbone. The bond angles and torsion angles are virtual in the sense they do not directly correspond to the
, angles in the current model (see Figure 2(a)). A
C chain model is shown in Figure 4.
θi
Cα i-1
γ
Cα
Cα i
i+1
γ
i+1
i
Figure 4: A virtual backbone model.
Figure 3: A ribbon diagram of Streptococcal Protein
G (left) and the ES predicted structure (right). The
single -helix is clearly visible. Arrowheads on the sheets indicate orientation from the N -terminal to the
C -terminal. Figure was generated by MolScript v2.1.1,
Copyright (C) 1997-1998 [29].
In the more frequently encountered trans-peptide conformation, the torsion angles tend to cluster into six distinct regions in the Ramachandran ( , , !) map [28].
Hence, the the ES predictor mutates a torsion angle by
choosing a dierent angle from a discrete set. However,
these angles were not chosen with equal probability but
with a bias derived from a protein library [8]. The discrete set of angles and the probability of selection are
shown in Table 1. Fitness is measured by the negative
of the r.m.s. deviation from the crystal structure with a
penalty term for steric clashes.
The test case for our ( + t ) ES predictor was a
Streptococcal Protein G, which has 61 residues. This
protein contains one -helix and four -strands. Each
generation manipulated a population of size = 200
with t = 200 (maximum). The initial window size was
w = 5. The predictor consistently founds conformations
with an r.m.s deviation around 1.8 A. Figure 3 depicts
the results.
5 Future Work
The current version of our ES predictor, although rudimentary, is capable of producing good results. However,
its current form is not adequate to solve instances of
SCPP and so a number of changes have been planned.
Proper placement of the backbone is crucial to the
accurate prediction of side-chain conformations. It may
appear a C chain model is a step backwards since a
C chain has less detail than the backbone model. Actually, the C chain model is the appropriate level of
detail needed to search for accurate backbone conformations. As the protein becomes more compact, the
likelihood of steric clashes increases|making many conformations produced by a stochastic search invalid. Elofsson et al. [30] showed that local moves in torsion space
improves the eciency of search algorithms. Essentially,
a local move changes torsion angles within a sequential
window while leaving the positions of all C atoms outside of the window unchanged. We intend to analyze and
then incorporate local moves into our ES predictor.
Once the C chain is placed properly, the chain can be
decorated with the peptides. This permits introducing
some exibility into the backbone. A simple method of
introducing exibility only requires changing the peptide
orientation angle (Figure 5), which seems to provide a
good compromise between computational speed and the
production of realistic folded structures [31].
P
Cα i
O
α2
C
Cα i+1
Cα i+3
Cα i+2
N
Q
H
Figure 5: The bonds forming the peptide group are
shown by thick lines. These atoms lie in one plane (P )
while the Ci+1 , Ci+2 and Ci+3 atoms connected by
the heavy dotted line lie in the other plane (Q). The
peptide orientation angle 2 is the angle between the
two planes. The bond lengths are not to scale.
The next anticipated change is full modeling of the
side-chains. The bond angles and bond lengths will remain xed so the only degree of freedom will be the torsion angles. Initially a discrete set of torsion angles
(10 increments) will be used for the , in the virtual
backbone, the peptide orientation angle, and the torsion angles. Eventually this discrete set will be replaced
by continuous values. During each generation the ES
predictor will change the , angles, but prior to computing the energy, the peptide orientation angles and torsion angles will be optimized. Hence the ES will become hierarchical.
Finally, the tness function used in our original ES
predictor|r.m.s. deviation from a crystal structure|
must be replaced before ab initio prediction can be performed. A number of widely available energy functions
do exist and we will investigate which one is most appropriate. A reasonable rst choice would be a statistical
potential [32, 33]. These changes should produce a predictor that is both eective and ecient.
Acknowledgements
The research of GWG was sponsored in part by NSF
Grant ECS-1913449.
Bibliography
[1] D. E. Clark and D. R. Westhead. Evolutionary algorithms in computer-aided molecular design. J.
Comput.-Aided Mol. Des., 10:337{358, 1996.
[2] F. M. Richards. The protein folding problem. Sci.
Amer., 264:54{63, 1991.
[3] H. Chan and K. Dill. The protein folding problem.
Physics Today, 46:24{32, 1993.
[4] N. D. Socci, W. Bialek, and J. N. Onuchic. Properties and origins of protein secondary structure.
Phys. Rev. E, 49:3440{3443, 1994.
[5] E. E. Abola, J. L. Sussman, J. Prilusky, and N. O.
Manning. Protein Data Bank Archives of ThreeDimensional Macromolecular Structures. In: Methods in Enzymology (C. W. Carter Jr. and R. M.
Sweet, eds.), Vol. 277, Academic Press, San Diego,
1997.
[6] A. Robertson and K. Murphy. Protein structure
and the energetics of protein stability. Chem. Rev.,
97:1251{1267, 1997.
[7] G. W. Greenwood. Ecient construction of selfavoiding walks for protein folding simulations on a
torus. J. Chem. Phys., 108:7534{7537, 1998.
[8] M. J. Rooman, J-P. Kocher, and S. J. Wodak. Extracting information on folding from the amino acid
sequence: accurate predictions for protein regions
with preferred conformation in the absence of tertiary interactions. Biochem., 31:10226{10238, 1992.
[9] J. R. Gunn. Sampling protein conformations using
segment libraries and a genetic algorithm. J. Chem.
Phys., 106:4270{4281, 1997.
[10] J. Pedersen and J. Moult. Protein folding simulations with genetic algorithms and a detailed molecular description. J. Mol. Biol., 269:240{259, 1997.
[11] S. Sun, P. D. Thomas, and K. A. Dill. A simple protein folding algorithm using a binary code
and secondary structure constraints. Protein Engr.,
8:769{778, 1995.
[12] C. Wilson, L. Gregoret, and D. Agard. Modeling
side-chain conformation for homologous proteins using an energy-based rotamer search. J. Mol. Biol.,
229:996{1006, 1993.
[13] N. Summers, W. Carlson, and M. Karplus. Analysis
of side-chain orientations in homologous proteins. J.
Mol. Biol., 196:175{198, 1987.
[14] C. Notredame, L. Holm, and D Higgins. COFFEE:
an objective function for multiple sequence alignments. Bioinfo., 14:407{422, 1998.
[15] J. Desjarlais and T. Handel. De novo design of the
hydrophobic cores of proteins. Protein Sci., 4:2006{
2018, 1995.
[16] G. Ghirlanda, J. Lear, A. Lombardi, and W. DeGrado. From synthetic coiled coils to functional
proteins: automated design of a receptor for the
calmodulin-binding domain of calcineurin. J. Mol.
Biol., 281:379{391, 1998.
[17] G. Morris, D. Goodsell, R. Huey, and A. Olson.
Distributed automated docking of exible ligands
to proteins: parallel applications of Autodock 2.4.
J. Comp.-Aid. Mol. Des., 10:293{304, 1996.
[18] C. Rosin, R. Halliday, W. Hart, and R. Belew. A
comparison of global and local search methods in
drug docking. Proc. 7th ICGA, pages 221{228,
1997.
[19] F. Solis and R. Wets. Minimization by random
search techniques. Math. Ops. Res., 6:19{30, 1981.
[20] G. Jones, P. Willett, R. Glen, A. Leach, and R. Taylor. Development and validation of a genetic algorithm for exible docking. J. Mol. Biol., 267:727{
748, 1997.
[21] M. Raymer, P. Sanschagrin, W. Punch,
S. Venkataraman, E. Goodman, and L. Kuhn.
Predicting conserved water-mediated and polar
ligand interactions in proteins using a k-nearestneighbors genetic algorithm. J. Mol. Biol.,
265:445{464, 1997.
[22] D. Gehlhaar, G. Verkhivker, P. Rejto, C. Sherman,
D. B. Fogel, L. J. Fogel, and S. Freer. Molecular recognition of the inhibitor AG-1343 by HIV-1
protease: conformationally exible docking by evolutionary programming. Chem. Biol., 2:317{324,
1995.
[23] F. Eisenmenger, P. Argos, and R. Abagyan. A
method to congure protein side-chains from the
main-chain trace in homology modeling. J. Mol.
Biol., 231:849{860, 1993.
[24] M. Vasquez. An evaluation of discrete and continuum search techniques for conformational analysis
of side chains in proteins. Biopolymers, 36:53{70,
1995.
[25] S. Schulze-Kremer and U. Tiedemann. Parameterizing genetic algorithms for protein folding simulation. Proc. 27th Ann. Hawaii Int'l Conf. on Sys.
Sci., pages 345{354, 1994.
[26] A. H. Kampen and L. M. Buydens. The ineectiveness of recombination in a genetic algorithm for the
structure elucidation of a heptapeptide in torsion
angle space: a comparison to simulated annealing.
Chem. & Intell. Lab. Sys., 36:141{152, 1997.
[27] N. Krasnogor, D. Pelta, P. E. M. Lopez, and E. de la
Canal. Genetic algorithm for the protein folding
problem, a critical view. Proc. Engr. of Intell. Sys.
98, pages 345{352, 1998.
[28] G. Ramachandran and V. Sasisekharan. Conformation of polypeptides and proteins. Adv. Protein
Chem., 23:283{437, 1968.
[29] P. J. Kraulis. Molscript|a program to produce
both detailed and schematic plots of protein structures. J. Appl. Cryst., 24:946{950, 1991.
[30] A. Elofsson, S. M. Le Grand, and D. Eisenberg.
Local moves: an ecient algorithm for simulation
of protein folding. Proteins, 23:73{82, 1995.
[31] Y. Wang, H. Huq, X. de la Cruz, and B. Lee. A new
procedure for constructing peptides into a given C
chain. Folding & Design, 3:1{10, 1997.
[32] S. Miyazawa and R. Jernigan. Estimation of effective interresidue contact energies from protein
crystal structures: quasi-chemical approximation.
Macromolecules, 18:534{552, 1985.
[33] M. Sippl. Calculation of conformational ensembles
from potentials of mean force. an approach to the
knowledge-based prediction of local structures in
globular proteins. J. Mol. Biol., 213:859{883, 1990.