Download From Structure to Function: Methods and Applications

yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Current Protein and Peptide Science, 2005, 6, 171-183
From Structure to Function: Methods and Applications
Haim J. Wolfson1, Maxim Shatsky1, Dina Schneidman-Duhovny1, Oranit Dror1,
Alexandra Shulman-Peleg1, Buyong Ma2 and Ruth Nussinov2,3,*
School of Computer Science, The Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv
69978, Israel, 2Basic Research Program, SAIC - Frederick, Inc., Laboratory of Experimental and Computational Biology, NCI-Frederick, Bldg 469, Rm 151, Frederick, MD 21702, USA, 3Sackler Inst. of Molecular Medicine, Department
of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
Abstract: The rapid increase in experimental data along with recent progress in computational methods has brought modern biology a step closer toward solving one of the most challenging problems: prediction of protein function. Comprehension of protein function at its most basic level requires understanding of molecular interactions. Currently, it is becoming
universally accepted that the scale of the accumulated data for analysis and for prediction necessitate highly efficient computational tools with appropriate application capabilities.
The review presents the up-to-date advances in computational methods for structural pattern discovery and for prediction
of molecular associations. We focus on their applications toward a range of biological problems and highlight the advantages of the combination of these methods and their integration with biological experiments. We provide examples, synergistically merging structural modeling, rigid and flexible structural alignment and detection of conserved structural patterns and docking (rigid and flexible with hinge-bending movements). We hope the review will lead to a broader utilization of computational methods, and their cross-fertilization with experiment.
Keywords: Structural pattern discovery, structural genomics, structure comparison, structural alignment, docking, drug design,
conservation, flexible structural alignment, flexible docking, multiple structural alignment, folding, hinge-bending.
The goal of Computational Biology is to understand living systems through calculations at the molecular level. In
particular, the aim is to predict how molecules interact, to
form the myriad of ways that they are connected and regulated in the living cell. Apart from data management for the
huge amount of biological information, computational biology provides an “experimental tool" for obtaining information from nature. The interplay between “computational experiments” and conventional biological experiments is an
important characteristic of modern biological research. However, a “computational experiment” is an in silico prediction,
which must be followed by experimental validation. Conversely, starting a project with a broad range of biological
experiments may require too many resources. These may be
significantly reduced with the help of computational tools.
Consequently, computational tools are increasingly becoming vital instruments in biological research.
Experimentally determined structures are still unavailable
for most proteins. Yet, due to the Structural Genomics initiative, this situation is changing rapidly. The rapid increase in
the available three-dimensional structural data presents a
major challenge: How to best exploit, extract and in parti*Address correspondence to this author at the NCI-Frederick Bldg 469, rm
151, Frederick, MD 21702, USA; Tel: 301-846-5579; Fax: 301-846-5598;
The publisher or recipient acknowledges right of the U.S. Government to
retain a nonexclusive, royalty-free license in and to any copyright covering
the article.
1389-2037/05 $50.00+.00
cular predict biologically relevant features toward the ultimate goal of predicting protein function, and of putting these
together to produce the biological system.
Obtaining, digesting and applying protein structural information has contributed immensely to our comprehension
of protein structure-function relationship. According to the
Gene Ontology [1] classification protein function is represented by a three level hierarchy: (i) the Molecular Function
(MF) of a protein, (ii) the Biological Process (BP) in which it
participates, and (iii) the Cellular Component (CC) in which
it functions. This review focuses on the first level of this hierarchy, which describes biological activities, such as catalytic and binding activities, at the molecular level.
Protein structure classification assists in detection of
functionally similar proteins that share limited sequence
similarity. Structural alignments are essential for detecting
conserved protein cores, similarities in functional binding
sites, similarities in enzyme mechanisms and evolutionary
conservation. Homology modeling is used to predict an unknown protein structure based on its sequence similarity to
known structures. Protein surface/interface analysis is used to
reveal the structure-property relationship of interacting molecules. Structure-based drug design is applied to protein targets with known active/binding sites. Molecular interactions
are predicted by docking of small ligands, proteins or nucleic
Most applications involve a series of modules, similar to
’unit operations’ in chemical engineering. Sequence and
structural alignments are probably the most widely used ’unit
© 2005 Bentham Science Publishers Ltd.
Current Protein and Peptide Science, 2005, Vol. 6, No. 2
operations’. Some ’unit operation’ modules may rely on others, for example, comparative modeling based on multiple
sequence/structure alignment, or modeling a folded protein
via docking of small protein fragments. Numerous algorithms are available for each ’unit operation’. Yet, no single
program can handle all applications. Fig. 1 provides an overview of typical routine applications that utilize the sequence
and structure analyses. Here, we review state-of-the-art
structural methodologies. Our focus is not on the algorithmic
detail. Rather, our aim is to highlight the strengths and weaknesses of these methodologies, the challenges and the potential solutions for the different types of applications. In the
case studies presented below, we demonstrate how a computational strategy, which synergistically combines the various tools, enhances the biological applications. In addition,
numerous examples from the literature are described in the
last section.
Protein structure alignment has become a standard structural analysis tool [2, 3]. Protein structure similarity may
indicate a functional/evolutionary relationship. Recognition
Wolfson et al.
of a structural core common to a set of protein structures can
serve as a basis for homology modeling [4] and for structural
classification [5]. There are three potential ways to predict
the function of a new protein. The first is to perform sequence comparison with a protein whose function is known.
However, several studies have shown that sequence analysis
alone may lead to unsatisfactory results [6]. Even at 75%
protein sequence identity, structural alignment is more reliable [7]. In addition, Russel et al. [8] presented examples of
similar binding sites found in analogous proteins (proteins
that converge to the same folding motif without having any
common ancestor). The second way to infer the function of a
new protein is by seeking proteins with a similar fold and
with a known function. This can be achieved by performing
structural alignment. Taking into account protein flexibility,
such as domain movements, makes the structural alignment
even more accurate. Recognition of conformational changes
is crucial for understanding biological processes. Yet, in
some cases similarity in function does not necessarily requires similarity of folds. Therefore, the third option to infer
protein function is to detect proteins that have similar binding patterns and thus may perform similar function. The most
Fig. (1). Modern biochemical research combines in vitro experiments, computational tools and large data resources. The diagram illustrates
some possible application flows of various ’unit operations’. For example, given a protein target we may be interested in compounds that can
inhibit it. If the structure of the protein target is unknown, structure modeling is applied first. Next, multiple sequence/structural alignment
can aid in recognizing potential binding sites that should be verified, for example, by mutagenesis experiments. Docking procedures can then
search large compound databases for potential leads. High scoring solutions will be further verified by in vitro experiments. The diagram
presents the following ’unit operations’ (gray boxes in the figure): structure modeling [115], sequence alignment [30], structure alignment,
docking and protein design [116].
From Structure to Function
common example of such patterns is the catalytic triads,
common to proteins with different trypsin and subtilisin
folds. Another example is hundreds of structures with tens of
different folds that can bind ATP. What do these proteins
have in common that make them bind the same ligand?
Structural alignment between the functional sites of these
proteins may assist in answering this question.
2.1 Protein Structural Similarity
There are several ways to represent a protein structure in
3D space. The selected representation determines the level of
resolution at which the structures are compared. Probably the
coarsest representation is by secondary structure elements
(SSE’s). This representation is highly efficient for a fast
comparison of the overall protein folds. The most common
representation is by the backbone atoms (e.g. Cα). This allows the recognition of common structural motifs in proteins
that do not necessarily have the same overall fold or SSEs. A
higher level of resolution is a representation by all atoms or
by functional groups that combine atoms according to the
interaction in which they may participate (for instance, the
aromatic property of a phenyl ring can be represented by its
centroid). Such a representation is advantageous in the comparison of protein functional sites that do not share any sequence or overall fold similarity.
A scoring function that measures structural similarity
should quantitatively discriminate between functionally/evolutionarily related and unrelated proteins. In most
cases, scoring functions measure local or global geometrical
differences between aligned structures, for example the Root
Mean Squared Deviation, (RMSD) of the distances between
aligned Cα atoms. Specific features can also be taken into
account, such as similarity between the aligned amino acids
as well as the quality of the SSE alignment. There are almost
as many scoring functions as the number of alignment methods [2, 3]. However, it is hard to determine which function is
the best, since there is no universally agreed theoretical support to justify any of these. Therefore, a preferable scoring
scheme is likely to be a simple one, for instance, one that
maximizes the alignment size while keeping the distance
between matched atoms less than a specified threshold.
Sequence order dependence/independence of the alignments has biological and computational consequences. For
closely related proteins, a sequential alignment usually leads
to a good structural alignment. Utilizing sequential information in a structural alignment can significantly reduce the
computational complexity. In particular, the dynamic programming algorithm, which is so popular in sequence alignment [9], can be exploited. Preserving the sequence order can
assist in generating structure-based sequence profiles for
homology modeling. However, strictly following the sequence order in structural alignment may be misleading.
There are numerous examples of proteins sharing similar 3D
structures and functions that have different sequences and
even different 3D topological order (e.g. C2 domain-like
[10], four-helix bundle [11]). Therefore, in these cases, sequence order independent methods should be used. These
can be applied to detect conserved 3D motifs in the protein
interior, on protein surfaces or in protein-protein interfaces.
Current Protein and Peptide Science, 2005, Vol. 6, No. 2
The different measures of protein structural similarity can
be used to classify known protein structures. Different protein structure databases have been constructed based on the
master database of the Protein Structure Databank (PDB).
The most popular databases are SCOP, CATH and FSSP.
SCOP [12] defines ten protein structural classes: all-α, all-β,
α+β, α/β, multidomain, membrane and cell-surface proteins
and peptides, small proteins, peptides, designed proteins and
non-protein structures. From the protein classes the SCOP
hierarchy descends to folds, superfamilies, families, domains
and finally to species. The classification is based on visual
inspection/comparison and limited use of sequence similarity, where no methodologically defined structural similarity
score is employed. The CATH [13] database, similar to
SCOP, also contains a hierarchical classification of protein
domains. However, in contrast with SCOP, mostly automated
methods are used to assess a structural and sequence classification of a new protein structure. First, the new protein is
subdivided into well defined domains. Then, each structural
domain is assigned to a topological family and a homologous
superfamily. In total, there are five levels of hierarchical
classification. These range from the coarse secondary structure composition (mainly alpha, mainly beta, alpha-beta) to
sequence families of proteins having very similar 3D structures. FSSP [14] is a fully automatic classification that is
based on exhaustive pairwise structural alignments performed by DALI [5]. The output forms structurally representative fold sets at varying uniqueness levels. Interestingly,
even though the databases use different methods and similarity measures, the classifications in SCOP, CATH and FSSP
largely agree [15]. FSSP scores can even be used to automatically assign the SCOP and CATH classifications to fold
level and topology level, respectively [16].
There are different ways to classify the algorithms for
protein structural alignment, such as pairwise versus multiple
or rigid versus flexible. The problem of rigid pairwise alignment was extensively studied. Many on-line resources are
available for high quality alignments, for example: CE [17],
DALI [5], Geometric Hashing (GH) [18] and VAST [19].
Here, we discuss the more difficult tasks: flexible alignment
and multiple alignment.
2.2 Pair-wise Flexible Alignment
While loops are often variable, the fold is usually conserved in protein families, leading to overall structure and
binding site preservation. Thus, most structural alignment
methods treat protein structures as rigid bodies. However,
protein flexibility cannot be underestimated. Around their
native state, there is an ensemble of conformational isomers
separated by low energy barriers. The movements reflect fast
side-chain movements as well as slow large scale hingebending movements [20, 21]. In hinge-bending movements
molecular parts rotate as relatively rigid bodies. Consider a
pair of proteins with a structurally similar motif. A hingebending movement within the motif will significantly reduce
the rigid structural alignment score between both proteins. A
flexible structural alignment algorithm should nevertheless
detect such a structurally similar motif. Such a situation may
happen due to the conformational changes in protein associations, mutations in the sequence or change in physical
conditions. Hinge-bending movements involve large scale
Current Protein and Peptide Science, 2005, Vol. 6, No. 2
movements critical for function. Efficient detection of hinges
is important for binding site identification and inhibitor/drug
design. If a movement is consistent among a protein family,
most likely it is conserved for function. Assessing the
movements is advantageous in the design of larger drugs to
fill the larger volumes formed during domain opening. Understanding possible conformational changes is useful in
deciphering the activation mechanism and can assist in the
computational modeling of the biological system. Case study
1 presents the modeling of the c-Src and proline-rich peptide
complex, which can only be possible through the recognition
of the conformational changes in the c-Src kinase.
Analogous to rigid structural alignment, flexible structural alignment algorithms have to solve three major tasks.
First, the usually difficult residue correspondence (alignment) task; second, the corresponding region superimposition and, third, hinge detection. Of course, the solutions may
be interrelated. Some methods [22, 23, 24] assume that the
residue alignment is already solved by another technique.
This may happen, for example, when we are given two
structures of the same protein (for example in the complexed
and free states) or when there is a significant sequence similarity between the two proteins, thus enabling application of
a sequence alignment technique to detect residue correspondence.
Yet, the correspondence between the amino acids is frequently unknown. Several algorithms have been developed to
tackle this challenging goal. The methods of Verbitsky et al.
[25] and FlexMol [26] perform flexible residue orderindependent protein alignment with the requirement of prespecification of potential hinges.
To improve the identification of potential hinge positions,
one can apply methods for domain detection and interdomain linkage [27, 28]. In particular, the recently developed
FlexProt algorithm does not require an a priori knowledge of
the flexible hinge regions [29]. It simultaneously detects
hinge regions and aligns the rigid subparts of the molecules.
An advance partitioning of the flexible molecule into rigid
parts is not required. However, since FlexProt exploits the
amino acid sequence order, it cannot detect non-sequential
alignments. There still remains the challenging task of developing a method that automatically detects flexible regions
and at the same time optimizes the sequence orderindependent alignment of the rigid parts.
2.3 Multiple Structure Alignment
Reliable multiple structure alignment is one of the major
challenges in structure analysis. Multiple structural alignment is infinitely superior to pairwise alignment, essentially
for similar reasons that multiple sequence alignment is significantly more informative than pairwise. Conservation of a
region in many molecules bears higher significance as compared to its recurrence in only two. Multiple sequence alignment programs [30], like Psi-BLAST or ClustalW, have become almost everyday tools for computational and even noncomputational biologists. Yet, efficient multiple structure
analysis algorithms have only recently become available.
Multiple structure alignment is a powerful tool for detecting structural motifs common to proteins that share a
similar function. This allows derivation of residue conserva-
Wolfson et al.
tion in analogous positions in 3D space, such as in protein
cores or in protein binding sites. Another application of multiple structural alignment is for inferring evolutionary relations between proteins, like identification of a consensus
(sub)structure that can serve as a model of a potential ancestor [31]. Additionally, methods for multiple structure alignment can aid in structure prediction [32]. For example, homology modeling methods utilize multiple structure alignment to increase the accuracy of the structural model [4].
Most methods obtain the multiple alignment through
global pair-wise comparisons of the input structures [31, 32].
Examples for such methods include STAMP [33], PrISM
[34] and SSAPm [35]. Such methods are potentially time
efficient. However, these methods have an inherent limitation, since in each pairwise comparison the only available
information is with regard to the two molecules which are
involved. Hence, alignments optimal for the whole set may
be overlooked, unless they are optimal for pairs [3]. The general strategy of pairwise-based methods is first to align the
more similar structures and then to proceed to the less similar. Therefore, the main advantage of these methods is their
ability to classify the input proteins into structurally related
sub-families. Furthermore, most methods use dynamic programming and consequently are sequence order-dependent
and cannot detect non-sequential motifs. However, order
dependence can be assumed for applications such as homology modeling.
A more meaningful approach is to consider all given
molecules simultaneously, rather than initiate from pair-wise
comparisons. This approach was adopted by Escalier et al.
[36], MUSTA [37], MultiProt [38] and MASS [39]. Since the
general problem is computationally NP-hard [40], these
methods assume some geometrical or protein structure properties to obtain reasonable running times (MultiProt and
MASS run between several seconds for a few structures to
several minutes for tens of structures). These methods are
non-sequential and thus can detect motifs that are structurally, but not sequentially, conserved. A recent application of
multiple structural alignment [37] to surface analysis has
shown that the conservation of Trp, Phe and Met on the protein surface supplies strong evidence for a binding site [41].
For all three residues, there is a significant conservation in
binding sites while there is no similar conservation on the
rest of the surface. In this article we provide two additional
examples of multiple structure alignment applications. In
case study 3 multiple structural alignment reveals the functionally conserved active site residue of the PLP-dependent
transferase and in case study 2 multiple alignment leads to
the recognition of conserved loops of serine proteases that
include the Ser and His residues of the catalytic triad.
Partial solutions of multiple alignment may have an important biological meaning. Consider for example a set of
100 input molecules, where 40 molecules are structurally
similar taken from family “A”, 50 structurally similar molecules from family “B” and 10 additional molecules that are
structurally dissimilar to any other input molecule. A multiple alignment of all 100 molecules would probably detect a
common structural motif consisting of at most one secondary
structure element. Clearly, it is vital for a multiple alignment
method to be able to recognize two sets, “A” and “B”, from
the 100 structures. Methods that adopt the pair-wise strategy
From Structure to Function
will be the most suitable for such a classification. Furthermore, in some cases there might be only a sub-structure (e.g.
motif or domain) that is similar between some of the input
molecules. For example, a number of proteins from both
families “A” and “B” may share a small common motif.
Therefore, to detect such partial similarities the methods that
consider all molecules simultaneously are more appropriate.
The method of Escalier et al. [36], MultiProt [38] and MASS
[39] allow detection of a number of (different) common
structural motifs and are capable of distinguishing between
similar and dissimilar molecules.
Multiple structure alignment can also be performed in a
randomized manner. Randomized techniques are used extensively for solving complex problems where no optimal methods with feasible running times exist. For example, Monte
Carlo optimization was implemented to solve the multiple
structure alignment task [42].
Most methods analyze structural alignment only at one level:
at the level of single atoms, secondary structure elements, or
larger structural fragments. MolCom [43] provides a multilevel analysis of multiply aligned protein structures. A 3D
map adopts a hierarchical representation of biochemical and
structural properties. Each level has an appropriate scoring
function that allows to quantitatively estimate the protein
alignment at different structural levels (e.g. secondary structures, amino acids and single atoms). Such an approach appears to be suitable for structural analysis due to the hierarchical nature of proteins.
2.4 Recognition of Functional Sites
Recognition of functionally similar proteins with no sequence or fold similarity provides an enormous challenge,
for which most of the previously described methods are not
applicable. Frequently, similar folds imply similar function,
however, proteins with the same fold, like TIM barrels [44],
can have different functions. In contrast, proteins with different folds, like zinc-binding proteins, can share the same
function. Therefore, a more valid assumption is that similar
binding patterns imply similar functions. Identification of
similar functional sites has several applications in drug design: (i) It can provide hints to ligands and ligand fragments
that may be useful in designing new drugs; and (ii) it may
point to proteins that can potentially cause side-effects.
The simplest way to address this problem is by analysis
of structurally diverse proteins that were crystallized in the
presence of the same ligand. The superimposition of these
complexes in a way that aligns their common ligand suggests
the alignment of their binding sites [45, 46]. This allows
comparison and analysis of the spatial arrangements of the
functional groups that are essential for binding. However, the
main drawback of such an approach is the fact that the same
ligand can bind in alternative modes even to the same protein
binding site [47, 46]. Therefore, a more general way to tackle
this problem is by investigating the spatial physico-chemical
patterns exhibited by the amino acids. This approach is potentially more powerful, since it can analyze unbound structures and no ligand information is required. However, it requires solving the more difficult problem of sequence order
independent alignment of certain features in 3D space.
Moreover, the problem is further complicated by the fact that
Current Protein and Peptide Science, 2005, Vol. 6, No. 2
the matched features must also create surface regions with
similar shapes that can accommodate similar ligands.
There are two main approaches to compare protein binding sites. The first type recognizes specific three-dimensional
patterns of amino acids. The input to algorithms of this type
is a description of the specific pattern exhibited by a certain
set of residues, which is searched in a database of protein
structures. Most of these methods were specifically developed for recognition of catalytic residues that form such patterns as ’catalytic triads’. These patterns are typical for some
of the protein families, like serine proteases, triacylglycerol
lipases, ribonucleases and lysozymes. Some of these patterns,
like the ’catalytic triad’ of serine proteases can be present in
proteins with different overall folds (e.g. subtilisin and trypsin folds) indicating the functional similarity of these protein
families. Examples of such methods are TESS [48, 49, 50],
ASSAM [51, 52] and the method presented by Binkowski et
al. [53]. On the other hand, there are many biological examples of totally different proteins that can share the same
binding partners without having any spatial pattern of residues with the same identity [54]. To address this problem,
the second type of methods compares the surface regions that
comprise the binding sites. Methods for recognition of similar binding sites without any assumption regarding the fold
or the identity of the amino acids have been presented by
Rosen et al. [55], Schmitt et al. [56], Kinoshita et al. [57],
Brakoulias et al. [58] and Shulman-Peleg et al. [59]. To
show the usefulness of this approach, the databases of binding sites, CaveBase [56] and eF-site [57], were searched to
recognize similar binding sites shared by totally unrelated
proteins. Additional methods for recognition of functional
sites in protein structures can be found in a recent review by
Jones et al. [60]. Recognition of functional sites by these
methods can be complementary to techniques such as docking and may require the presence of specific (functionally
important) interactions.
As the availability of protein and nucleic-acid structures
grows, algorithms for docking of drugs to receptor molecules
are becoming integral tools for rational drug design. Docking
can be applied in various stages of the drug design process.
The most common application is virtual screening of a compound library for discovery of new leads. This approach has
been applied successfully, leading to a number of novel
ligands [61]. In addition, docking is used in de novo design
methods, where the receptor is screened against a fragment
library [61]. Docking is also applied for a partial prediction
of ADMET (Absorption, Distribution, Metabolism, Excretion and Toxicity) properties. Potential drug candidates can
be screened in silico against proteins active in cellular metabolism, such as Cytochrome P450 enzymes [62]. Screening
against different proteins can aid in predicting potential drug
toxicity. Such computational filtering can save not only
many in vitro tests, but also some of the expensive in vivo
Docking methods are also used for predicting protein–protein interactions. Despite the growth of the PDB, the
number of available protein–protein complexes is relatively
small. Structural modeling of disease related protein–protein
associations may lead to the development of inhibitors to
Current Protein and Peptide Science, 2005, Vol. 6, No. 2
disrupt these associations. Such inhibitor might again be discovered by docking small molecule libraries to each of the
proteins involved in the complex. In addition, docking algorithms are helpful in predicting the 3D structures of large
macromolecular complexes. Current experimental high
resolution structure determination methods, such as X-ray
crystallography and NMR, have difficulties in solving the
structures of large macromolecular associations.
Expectations: Given the structures of two molecules, we
expect from a docking algorithm to predict whether they interact, and if so, what is the 3D structure of their complex
and the affinity of the interaction.
3.1 Methods
As the demand for efficient docking algorithms grows,
many are being developed and improved [63, 64, 65, 66]. All
algorithms include two major stages: 3D search and scoring.
The 3D search stage seeks possible receptor-ligand configurations, while the scoring stage ranks them. Several issues
should be considered in selecting a proper method:
Molecules and Binding Sites
Whereas most docking methods are designed to handle
only one type, either protein-protein docking or protein-drug,
others are applicable to all molecule types. Moreover, some
methods require binding site knowledge, while others can
use it as optional information. Binding site knowledge can
significantly improve the quality of the docking results. Case
studies 2 and 3 demonstrate automatic detection of binding
sites by multiple structural alignment and its employment in
the search stage of docking.
3D Search
The algorithms can be divided into local shape feature
matching, randomized, and brute force enumeration by their
search technique. Dock [67], FlexX [68], PPD [69], QSDock
[70], BUDDA [71], PatchDock [72] and other algorithms
align local shape features, while in ICM [73], AutoDock
[74], Gold [75] and GAPDOCK [76] the search is performed
using randomized and/or evolutionary algorithms. MolFit
[77], ZDOCK [78], 3D-Dock [79], DOT [80], GRAMM
[81], Hex [82] and other protein–protein docking algorithms
use a brute force search for the three rotational parameters
and the FFT (Fast Fourier Transform [83]) for fast enumeration of the translations. The selected search method should be
fast (to allow many docking experiments) and reliable (to
retain the correct result among suggested candidate complexes).
Treating molecular flexibility is one of the major challenges of the docking field. Enumerating over all theoretically possible configurations may result in impractical running times. Several points should be considered in flexible
docking. First, which type of flexibility is included: small
molecule torsional movements, fast side-chain movements,
slow hinge-bending domain movements or loop movements.
Second, in which molecule is the flexibility allowed: in the
receptor, in the ligand, or in both. Third, how is the flexibility treated: is it treated explicitly by exploring the space of all
possible conformations or just implicitly by allowing some
Wolfson et al.
inter-penetrations between molecules (soft docking). Finally,
what is the algorithmic design: some algorithms incorporate
flexibility directly in the 3D search stage whereas others in
the final improvement procedure.
Many algorithms treat the problem of ligand torsional
flexibility. However, only few methods treat the proteinreceptor backbone or side-chain flexibility due to the problem complexity. There are several ways to handle side-chain
flexibility. Randomized and evolutionary algorithms add
torsion angles of side chains as parameters for optimization
[84, 85, 86]. The FlexE algorithm [87] employs MPS (Multiple Protein Structures). The PDB contains numerous redundant structures of the same proteins or multiple conformations from NMR. Thus, rather than using one structure for
docking, an ensemble of aligned molecules can be used.
Molecular dynamics is another source of multiple conformations. These methods sample side-chain instances from different input protein structures and generate new protein conformations. Side-chain conformations are further obtained
from rotamer libraries. In addition, the docking algorithm
should account for the intra-molecular interactions of each
protein conformation in the combinatorial rotamer shuffling.
Hinge bending movements are observed in many proteins
[20]. This type of large scale flexibility is especially difficult
for modeling by docking algorithms, since the movements
include domains, secondary structure elements and loops.
The only available method based on an Articulated Object Recognition algorithm from Computer Vision [88] can
handle hinge-bent flexibility in one of the docked molecules.
Movements are allowed either in the ligand or in the larger
receptor. The method efficiently exploits hinge information
in the search stage, exploring the space of all possible conformations while avoiding brute force search. The flexible
molecule is divided into rigid parts at hinge points and all
parts are docked simultaneously. Hence, the algorithm mimics the so-called “induced” molecular fit. Modeling of the
complex of c-Src kinase with proline rich peptide by hingebased docking is demonstrated in case study 1.
Scoring Function
The output of the conformational search stage is a large
number of potential docked complexes. The scoring stage is
necessary in order to filter out solutions with unlikely interactions and to rank the remaining ones according to some
favorable interaction criteria. Development of a reliable
scoring function is one of the most challenging problems in
the field [65]. Many scoring schemes have been suggested.
These are based on optimization of van der Waals interactions, hydrogen bonding, electrostatic potential, hydrophobicity, atom-atom and residue-residue knowledge-based potentials. However, native-like complexes do not necessarily
have the best shape complementarity, the largest number of
hydrogen bonds, or the most favorable electrostatics. A combination of scoring functions or consensus scoring is often
useful since it can reduce the number of false positive results
3.2 Reality
Despite the recent progress in the field, docking algorithms still require improvements in both scoring function
From Structure to Function
and flexibility treatment. In the leading algorithms the “correct” near native result is found among few hundreds of top
scoring results in most cases, unless large conformational
changes are involved. The source of the problem is in the
scoring stage, due to many false positive results. Nevertheless, docking is widely used in the search for novel drug
leads. Shoichet et al. [90] have compared virtual ligand
screening (VLS) using docking with high-throughput
screening (HTS). The hit rate of docking (34.8%) was 1700fold higher than that of a random screening (0.021%). There
are several advantages to docking over high throughput
screening: (i) the synthesis of the compounds is not required
and the size of the compound library is not limited; (ii)
docking can easily identify large groups of false negative
compounds: those that are too large for the receptor binding
site and have totally unsuitable biochemical properties; (iii)
docking can be further improved by pharmacophoric information [91, 92].
Protein-protein docking is mostly used for predicting the
structure of a complex, when we already know that the molecules of interest interact. Significant progress was made in
protein docking methods in the past few years. CAPRI
(Critical Assessment of Predicted Interactions, http://capri.
ebi. is a recent effort of the docking community to
evaluate the performance of different protein docking methods by blind prediction. Launched in 2001, CAPRI already
had 5 rounds with the total number of 16 targets and 29
groups participating in the last round. This type of assessment is especially important for evaluating state of-the-art
docking algorithms. However, such an assessment does not
only test the algorithms, but also the ability of the predictor
to use correctly any available biological information about
the targets in the docking process. Binding site knowledge
proved to be very helpful in the selection of candidate results. One of the lessons from the CAPRI experiment is that
the success in the prediction depends on the type of the complex [93]. Complexes with large interface area and compact
interface packing are easier to predict than with complexes
with small interface area and relatively flat interface shape.
Additionally, the prediction task is very difficult if there are
conformational changes upon complex formation. Nevertheless, the CAPRI results are encouraging: there were correct
predictions from several groups for most of the targets. The
lessons learned from such experiments show that docking
algorithms still need improvements in terms of flexibility,
scoring functions and automated binding site prediction.
Applications of computational tools to biological problems face a challenge: how to guide the experimental research and how to reduce the number of ’blind’ trials. Integration of individual ’unit operations’ is essential. Sequence
and structural alignments, database applications, homology
modeling, docking of a protein to small ligands or to proteins
can be synergistically combined. The case studies in this article demonstrate examples of such combinations. Other published results also indicate this trend. For example, a combination of different approaches was employed to model a suppressor of cytokine signalling 1 (SOCS-1) [94]. Specifically,
multiple sequence alignment was performed with ClustalW,
Current Protein and Peptide Science, 2005, Vol. 6, No. 2
fold recognition was obtained with GenThreader and
Threader, TopLign, 3D-PSSM and UCLA-DOE and structural alignment was performed with DALI, VAST and CE.
The structure of the SOCS-1 was generated using
MODELLER and docking was run with 3D_DOCK. Another
example of structure-based combinatorial protein engineering, is a design of a subdomain from two distantly related
proteins [95].
4.1 Alignment and Docking
Below we present a set of case studies that illustrate the
biological significance of the combination between different
unit operations.
Case Study 1: Simulating Intra-Domain Movements to
Predict a Complex of c-Src Protein Kinase with a ProlineRich Peptide
Deregulation of the activity of c-Src protein kinase may
cause various human diseases, including breast and colon
cancer. Therefore, inhibition of its activity is a promising
strategy in anti-cancer treatment. c-Src protein kinase is
composed of three domains, the catalytic domain, SH2 and
SH3. Fig. 2a presents a model for SH3-mediated signalling
suggested by Pisabarro et al. [96]. According to the authors,
a high affinity peptide (colored green) competes for the region of SH3 domain, which is otherwise occupied by the
linker (colored blue) between the SH2 and the catalytic domains. Today there is no crystal structure of all three domains of c-Src in the presence of a peptide bound to the SH3
domain. The goal of this study is to model this complex and
to computationally verify the model suggested by Pisabarro
et al. (Fig. 2b) presents the differences in the spatial orientations of these domains as obtained by a superimposition of
the inactive form of human c-Src (PDB code: 1fmk, blue) on
the auto-inhibited form of haematopoetic kinase (Hck, PDB
code: 1qcf, red). Fig. 2c presents the superimposition between the currently available unbound structure of the three
kinase domains with the peptide complexed in the presence
of the SH3 domain alone. Assuming that the auto-inhibited
form of human c-Src (PDB code: 1fmk) may be similar to
that of Hck (PDB code: 1qcf), the first step was to use the
FlexProt [29] algorithm to perform a flexible alignment between them. Fig. 2d presents the two hinges that were detected at residues 138 and 252 of c-Src (prediction accuracy:
1.34Å RMSD, running time: 4 seconds, Pentium IV PC). The
detected hinges were used as input to perform flexible docking (FlexDock [71, 88]) between the inactive structure of the
three domains of c-Src and the proline-rich peptide (RLP2)
(running time: 2.5 minutes, Pentium IV PC). Fig. 2e presents
the predicted complex (c-Src in blue, RLP2 in green). In order to evaluate its quality, the SH3 domain from the known
complex (PDB code: 1rlq, colored red) is superimposed on
the SH3 domain of the predicted complex. The RMSD between the peptide from the complex and the docked peptide
was 1.0Å RMSD. The SH2 and the catalytic domains are
rotated by the algorithm backward from the viewing angle of
(Fig. 2c), to allow the peptide binding without any steric
clash with the linker. This movement can be described by a
hinge-bending angle (measured from the domains mass centers to the hinge) of 26 degrees.
Current Protein and Peptide Science, 2005, Vol. 6, No. 2
Wolfson et al.
Fig. (2). Case Study 1. (a) A model for SH3-mediated signalling suggested by Pisabarro et al.[96]. The peptide (colored green) competes
with the linker (colored blue) for the same region of the SH3 domain. (b) Rigid superimposition between two c-Src tyrosine kinase proteins:
human c-Src (PDB code: 1fmk), depicted in blue and haematopoetic cell kinase (Hck, PDB code: 1qcf), represented in red. (c) A superimposition between the c-Src protein (PDB code: 1fmk), colored blue, and a proline-rich peptide RLP2, colored green, as it appears in a complex
with the SH3 domain (PDB code: 1rlq). (d) The interdomain movement between three domains, SH3-SH2-Kinase, is detected by a flexible
alignment algorithm [29]. 1fmk is depicted in blue, 1qcf: domain SH3 in green, SH2 in orange and the kinase in red. (e) The result of flexible
hinge-bending docking [71, 88] viewed from the back side of Figure (c). c-Src protein is blue, RLP2 surface is depicted in green. The SH3
domain from the complex (PDB code: 1rlq), superimposed on the docking result, is colored in red. (f) A scheme summarizing the combinations of the algorithms used in the example.
Case Study 2: Recognition of Conserved Loops of Serine
Proteases for Binding Site Focused Docking
There are numerous evidences for disorders in function of
serine proteases and their inhibitors in several human diseases. In this case study the information obtained by a multiple secondary structure alignment is exploited to improve the
accuracy of unbound docking between a serine protease and
its bovine pancreatic inhibitor. Fig. 3a shows the structural
alignment between ten serine proteases (PDB codes: 2ptn,
1sgt, 1ton, 2alp, 2pkaAB, 2sga, 3est, 3rp2A, 3sgbE and
4chaA) as obtained by MASS [39]. It is difficult to deduce
this alignment based on sequence information alone [34, 97,
98]. The size of the conserved core (depicted in yellow) is
123 residues (RMSD 1.48Å, running time: 39 seconds,
Pentium IV PC). As expected, this region contains the two βbarrels (six strands each) that form the fold. The catalytic site
of enzymes is frequently formed by loops connecting secon-
From Structure to Function
dary structure elements [99]. The recognized core contained
three structurally conserved loops: residues 55-59, 128-130
and 189-197 (depicted in purple). Two of these included
residues that belong to the catalytic triad (His-57, Asp-102,
Ser-195). In the next step, the docking algorithm PatchDock
[72] was applied to predict a complex of a serine protease
kallikrein A with a bovine pancreatic trypsin inhibitor (PDB
code: 2kai). Both the receptor and the inhibitor were taken in
their unbound conformations as they appear in two separate
crystal complexes (PDB codes: 2pka and 6pti respectively).
The conserved loops were used by the docking algorithm as a
binding region to focus the conformational search. As a result, a near native solution was ranked at the top (see Fig.
3b), prediction accuracy 4.0Å RMSD, running time: 38 seconds, Pentium IV PC. The docked inhibitor, in green, is superimposed on the inhibitor of the complex (PDB code: 2kai,
chain I), depicted in red. The conserved loops are in purple.
Case Study 3: Recognition of Conserved Core of PLPDependent Transferase for Focused Docking
The goal of this study is the reconstruction of the complex of PLP-dependent transferase (PDB code: 1qj5) with a
PLP (pyridoxal-5-phosphate) molecule. PLP is the biologically active form of vitamin B6. Low levels are correlated
Current Protein and Peptide Science, 2005, Vol. 6, No. 2
with cardiovascular and other diseases. The first step in this
study was to apply a multiple structural alignment algorithm
(MultiProt [38]) to 17 structures from the PLP-dependent
superfamily (PDB codes: 1ams, 1art, 1asf, 1ax4, 1ay4, 1bt4,
1cl1, 1gbn, 1oat, 1ord, 1qis, 1qj5, 2ay1, 2can, 2dkb, 2tpl and
4gsa). The identified common structural core (accuracy
3.0Å RMSD, running time: 2 minutes, Pentium IV PC) is
depicted in (Fig. 4a) on a representative structure of PDB:
1qj5 (dark blue). Only one residue (PDB: 1qj5; Asp-245)
from the computed structurally conserved core had the same
amino acid identity in all the 17 aligned proteins. The recognized residue is located in the PLP binding site, which suggests its importance for binding and for function. This residue, colored yellow, and the PLP molecule, colored orange,
are depicted in balls & sticks. The next step was to apply the
docking algorithm (PatchDock [72]) that will use the location
of this residue to improve its ranking by selecting only solutions in which this residue interacts with the ligand. The results of docking the PLP molecule (taken from PDB: 4gsa) to
diaminopelargonic acid (PDB code: 1qj5) are depicted in
(Fig. 4b). The highest ranking solution, depicted in red, is
superimposed on the complex from PDB: 1qj5, in orange
(prediction accuracy: 1.16Å RMSD, running time 16 seconds, Pentium IV PC).
Fig. (3). Case Study 2. (a) The structural alignment between ten serine proteases (PDB codes: 2ptn, 1sgt, 1ton, 2alp, 2pkaAB, 2sga, 3est,
3rp2A, 3sgbE and 4chaA). PDB:2pka is shown completely in blue. The core of the alignment is colored in yellow and the three conserved
loops are colored in purple. As one can see, two of the conserved loops are located in the active site. (b) The unbound docking [72] between
kallikrein A (PDB code: 2pka) and its inhibitor (PDB code: 6pti). The docked inhibitor, colored in green, is superimposed on the inhibitor of
the complex (PDB code: 2kai, chain I), depicted in red. The conserved loops, as obtained by a multiple structural alignment algorithm [39],
are in purple. (c) A scheme summarizing the combination of the algorithms used in the example.
Current Protein and Peptide Science, 2005, Vol. 6, No. 2
Wolfson et al.
Fig. (4). Case Study 3. (a) The structural core identified by multiple structural alignment [38] among 17 structures from the PLP-dependent
superfamily (PDB codes: 1ams, 1art, 1asf, 1ax4, 1ay4, 1bt4, 1cl1, 1gbn, 1oat, 1ord, 1qis, 1qj5, 2ay1, 2can, 2dkb, 2tpl, 4gsa). In dark blue is a
representative structure of 1qj5. The conserved active site residue (Asp-245, 1qj5:245), colored yellow and the PLP molecule, colored orange, are depicted as ball & stick. (b) The docking [72] of the PLP (taken from 4gsa) to the diaminopelargonic acid (PDB code: 1qj5). The
predicted solution, depicted in red, is superimposed on the original, which is depicted in orange. (c) A scheme summarizing the combinations
of the algorithms used in the example.
4.2 Docking and Homology Modeling
One of the promises of structural genomics is to design
drugs directly from protein sequences. If accurate, a homology modeled protein structure may be used in docking studies. Rong et al. [100] homology modeled the RasGRP C1
domain and docked it to ligands. Combined with experimental binding affinity measurement, their study provided
detailed atomic level interactions of PKC ligands with the
RasGRP receptor. The PGHS-1 and PGHS-2 structures were
combined to model the human PGHS-2 and to dock ligands.
The docked conformation superimposed very well on the
available complexed structure [101].
Docking can be used synergistically with modeling for
structure refinement or for engineering side chains. The reliability of homology models of G-protein coupled receptors in
virtual screening of chemical databases was examined by
three docking programs (Dock [67], FlexX [68], Gold [75])
and seven scoring functions [102]. The modeled structure
was found suitable for screening antagonist compounds.
However, it was not accurate enough to retrieve known agonists, due to conformational changes during agonist activation. Using a pharmacophore has led to a model refinement
and successful agonist retrieval, illustrating the importance of
correct modeling. While often only a low-resolution structure
can be predicted, to compensate for inaccuracies, protein and
ligand models may be averaged by optimizing steric and
quasi-chemical complementarity between the interacting
partners [103]. When sequence identity was low, averaging
out uncertainties through multiple docking has led to an improvement [104]. Schafferhans and Klebe explored a method
to dock ligands onto binding site representations derived
from homology modeled proteins [105]. The ligands were
docked to a preliminary model and QSAR analysis from
ligand orientations was used to refine the structure. Alignment, ligand data analysis and protein structure modeling
were iterated until self-consistency was achieved. The approach has been validated using cases with available crystal
structures. In an effort to engineer substrate specificity of Damino-acid oxidases, Sacchi et al. [106] combined homology
modeling, docking and active site mutagenesis. The rational
design approach successfully produced enzymatic activity
with new, broader substrate specificity.
Another interesting example to combination of homology
modeling, docking and wet-lab experiments is the discovery
of the binding site of the stress protein GRP94 to the VSV8
peptide [107]. The complex of GRP94 with a specific peptide
can dramatically stimulate T cell response. Therefore, tumorderived GRP94 is possibly a powerful immunotherapeutic
tool, since it can be used to evoke immune response against
the tumor. The N-terminal domain of GRP94 was homology
modeled using the structure of the N-terminal domain of
yeast HSP90. The modeling of the complex with the VSV8
peptide was achieved by applying PatchDock [72] to detect
few potential binding sites. The validity of these computationally derived sites was next verified by site directed
mutagenesis. Out of the two highest scoring binding sites
From Structure to Function
detected by PatchDock, one was proved to be correct in the
wet-lab experiments.
One of the most successful stories comes from the redesign of ligand-binding site specificity in proteins [108]. The
procedure combines target-ligand docking with computational mutations of receptor residues in direct contact with a
wild-type ligand (typically 12-18 residues, corresponding to
1045 to 1068 mutant structures representing 1015 to 1023 sequences). Seventeen designs predicted by the automated procedure were experimentally tested for specific ligand binding. Every designed receptor exhibited a detectable affinity
for its target ligand.
4.3 Folding via Docking
Docking and protein folding are all too often considered
to be distinct fields. Yet, protein folding can be modeled as a
hierarchical process [109], where folding is done by selfdocking of previously assembled smaller units. Hierarchy
implies a process of assembly of smaller folding entities,
such as domains, hydrophobic folding units or building
blocks. In protein folding, one can first attempt to model the
structures of smaller subunits. The structure can then be
solved by an assembly of these subunits that satisfies the
backbone connectivity constraints. The CombDock method
[110] employs geometric docking techniques to implement
this assembly approach. The method is combined with the
building block identification scheme of Haspel et al. [111] to
predict protein tertiary structure.
To date, predictive protein folding focuses on single domain proteins. In an early attempt, Xu et al. [112] used
threading to model domain structure and proceeded to dock
two domains to obtain the complete modeled structure. Their
predictions were consistent with the experimental data. Recently, the CombDock method was applied to reconstruct
multi-domain protein structures by docking of its domains.
The majority of proteins function when associated in
multi-molecular assemblies. Yet, prediction of the structures
of multi-molecular complexes has largely not been addressed, probably due to the magnitude of the combinatorial
complexity of the problem. Docking schemes can be applied
to recreate multi-molecular assemblies. Eisenstein et al.
[113] used docking to construct multi-molecular complexes
that satisfy restrictive symmetries, such as in viral coat proteins. Another example is the novel application of CombDock [114] for predicting the structures of large macromolecular assemblies. The algorithm yielded good predictions
of quaternary structures of oligomers and multi-protein complexes, even for a structurally distorted input.
A combination of homology modeling, database analysis,
sequence/structure alignment, molecular docking, and experimental verification has shown its powerful potential in
biochemical research. Structural analysis has an advantage
over sequence analysis alone, since structures contain additional information. On the other hand, it presents difficulties,
as we need to consider protein flexibility, surface variability,
chemical interactions and resolution.
Current Protein and Peptide Science, 2005, Vol. 6, No. 2
This review summarizes the main algorithmic approaches
of most of the routinely used structural tools. Different methods address different biological problems. Nevertheless, they
do share many methodological similarities. 3D pattern detection algorithms like the Geometric Hashing, clique detection,
subgraph-isomorphism, genetic algorithms, Monte-Carlo
search are commonly used in structural alignment as well as
in docking and in other applications. Some algorithms are
specific to proteins. Others can address various molecular
types, including RNA and drugs. Scoring functions are the
current bottleneck. This might possibly be explained by the
insufficient understanding of molecular interactions. Significant advances in modeling interactions will lead to progress
in folding, docking and alignment of protein binding sites.
Modeling of the flexibility is yet another major challenge
both in docking and in structural comparison.
Computational tools for sequence analysis have become a
common practice in biological experiments. Due to the extremely fast progress in the Structural Genomics projects,
structural analysis tools are increasingly and steadily turning
into an integral part of practically every aspect of biological
What is the future outlook in this field? The next major
problem we are facing is combining all methods, and in particular devising new ones to construct the map of molecular
interactions. Molecular interactions hold the key. However,
our goal is not only to put these together to obtain the cellular
pathways. We need to figure out how these are regulated.
The data are very noisy; this is inevitable when dealing with
a problem of such huge dimensions. The difficulty is how to
fish out the essential information toward this goal. To handle
the vast amount of data, extremely efficient computational
strategies are needed. Again, experimental verification will
be crucial to test the computational predictions. This is the
future mission of Computational Biology, leading toward the
goal of personalized medicine.
We thank our Structural Bioinformatics Group in Tel
Aviv and in Frederick. The research of R. Nussinov and H. J.
Wolfson in Israel has been supported in part by the “Center
of Excellence in Geometric Computing and its Applications”
funded by the Israel Science Foundation (administered by the
Israel Academy of Sciences) and by the Tel Aviv University
Adams Brain Center. The research of H.J.W. is partially supported by the Hermann Minkowski-Minerva Center for Geometry at Tel Aviv University. This project has been funded
in whole or in part with Federal funds from the National
Cancer Institute, National Institutes of Health, under contract
number NO1-CO-12400. The content of this publication does
not necessarily reflect the view or policies of the Department
of Health and Human Services, nor does mention of trade
names, commercial products, or organization imply endorsement by the U.S. Government.
Gene Ontology Consortium, Nucl. Acids. Res. 2004, 32, 258-261.
Koehl, P. Curr. Opin. Struct. Biol. 2001, 11, 348-353.
Eidhammer, I.; Jonassen, I.; Taylor, W. R. J. Comp. Biol. 2001, 7,
Current Protein and Peptide Science, 2005, Vol. 6, No. 2
Goldsmith-Fischman, S.; Honig, B. Protein Sci. 2003, 12(9), 18131821.
Dietmann, S.; Park, J.; Notredame, C.; Heger, A.; Lappe, M.;
Holm, L. Nucleic Acids Res. 2001, 29, 55-57. http://www2.ebi.
Sauder, J.; Arthur, J.; Dunbrack Jr, R. Proteins 2000, 40, 6-22.
D’Alfonso, G.; Tramontano, A.; Lahm, A. J. Struct. Biol. 2001,
134, 246-256.
Russell, R. B.; Sasieni, P. D.; Sternberg, M. J. J. Mol. Biol. 1998,
282, 903-18.
Needleman, S.; Wunsch, C. J. Mol. Biol. 1970, 48, 443-453.
Nalefski, E.; Falke, J. Protein Sci. 1996, 5, 2375-2390.
Holm, L.; Sander, C. J. Mol. Biol. 1993, 233(1), 123-138.
Murzin, A.; Brenner, S.; Hubbard, T.; Chothia, C. J. Mol. Biol.
1995, 247, 536-540.
Pearl, F.; Bennett, C.; Bray, J.; Harrison, A.; Martin, N.; Shepherd, A.; Sillitoe, I.; Thornton, J.; Orengo, C. Nucl. Acids Res.
2003, 31, 452-455.
Holm, L.; Sander, C. Nucl. Acids Res. 1996, 24, 206-209.
Hadley, C.; Jones, D. Structure 1999, 7, 1099-1112.
Getz, G.; Vendruscolo, M.; Sachs, D.; Domany, E. Proteins 2002,
46, 405-415.
Shindyalov, I.; Bourne, P. Protein Eng. 1998, 11(9), 739-747.
Bachar, O.; Fischer, D.; Nussinov, R.; Wolfson, H. J. Protein Eng.
1993, 6(3), 279-288
Madej, T.; Gibrat, J.; Bryant, S. Proteins 1995, 23, 356-369.
Gerstein, M.; Krebs, W. Nucl. Acids Res. 1998, 26, http://bioinfo.
Lee, R.; Razaz, M.; Hayward, S. Bioinformatics 2003, 19, 12901291.
Wriggers, W.; Schulten, K. Proteins 1997, 29(1), 1-14.
Boutonnet, N.; Rooman, M.; Wodak, S. J. Mol. Biol. 1995, 253,
Ochagavia, M. E.; Richelle, J.; Wodak, S. J. Bioinformatics 2002,
18, 637-640.
Verbitsky, G.; Nussinov, R.; Wolfson, H. J. Proteins 1999, 34,
Shatsky, M.; Fligelman, Z.; Nussinov, R.; Wolfson, H. J. Alignment
of flexible protein structures. In 8th International Conference on
Intelligent Systems for Molecular Biology; The AAAI press: 2000.
Carugo, O. Proteins 2001, 42, 390-398.
Jacobs, D.; Rader, A.; Kuhn, L.; Thorpe, M. Proteins 2001, 44,
Shatsky, M.; Nussinov, R.; Wolfson, H. J. Proteins 2002, 48, 242256.
Nicholas, H. B. J.; Ropelewski, A. J.; Deerfield, D. W. N. Biotechniques 2002, 32(3), 572-591.
Gerstein, M.; Levitt, M. Using iterative dynamic programming to
obtain accurate pairwise and multiple alignments of protein structures. In Proceedings of the Fourth International Conference on
Intelligent Systems for Molecular Biology; The AAAI press: 1996.
Akutsu, T.; Sim, K. Genome Inform. 1999, 10, 23-29.
Russell, R.; Barton, G. Proteins 1992, 14, 309-323.
Yang, A.; Honig, B. J. Mol. Biol. 2000, 301, 691-711.
Taylor, W.; Flores, T.; Orengo, C. Protein Sci. 1994, 3, 1858-1870.
Escalier, V.; Pothier, J.; Soldano, H.; Viari, A. J. Comp. Biol. 1988,
5, 41-56.
Leibowitz, N.; Nussinov, R.; Wolfson, H. J. J. Comp. Biol. 2001, 8,
Shatsky, M.; Nussinov, R.; Wolfson, H. J. MultiProt - a multiple
protein structural alignment algorithm. In Workshop on Algorithms
in Bioinformatics, Vol. 2452; Guigo, R.; Gusfield, D., Eds.; LNCS,
Springer Verlag: 2002.
Dror, O.; Benyamini, H.; Nussinov, R.; Wolfson, H. Protein Sci.
2003, 12, 2492-2507.
Akutsu, T.; Halldorson, M. M. Theoretical Computer Science 2000,
233, 33-50.
Ma, B.; Elkayam, T.; Wolfson, H. J.; Nussinov, R. Proc. Natl.
Acad. Sci. USA 2003, 100, 5772-5777.
Guda, C.; Scheeff, E.; Bourne, P.; Shindyalov, I. A new algorithm
for the alignment of multiple protein structures using Monte Carlo
optimization. In Proceedings of the Pacific Symposium on Biocomputing; 2001.
O’Hearn, S.; Kusalik, A.; Angel, J. Protein Eng. 2003, 16, 169-178.
Wolfson et al.
Nagano, N.; Orengo, C. A.; Thornton, J. M. J. Mol. Biol. 2002, 321,
Kuttner, Y. Y.; Sobolev, V.; Raskind, A.; Edelman, M. Proteins
2003, 52, 400-411.
Denessiouk, K. A.; Rantanen, V.; Johnson, M. Proteins 2001, 44,
Phillips, C. L.; Ullman, B.; Brennan, R. G.; Hill, C. P. EMBO J.
1999, 18, 3533-3545.
Wallace, A. C.; Laskowski, R. A.; Thornton, J. M. Protein Sci.
1996, 5, 1001-1013.
Wallace, A. C.; Borkakoti, N.; Thornton, J. M. Protein Sci. 1997, 6,
Barker, J. A.; Thornton, J. M. Bioinformatics 2003, 19, 1644-1649.
Artymiuk, P. J.; Poirrette, A. R.; Grindley, H. M.; Rice, D. W.;
Willett, P. J. Mol. Biol. 1994, 243, 327-344.
Spriggs, R. V.; Artymiuk, P. J.; Willett, P. J. Chem. Inf. Comput.
Sci. 2003, 43, 412-421.
Binkowski, T.; Adamian, L.; Liang, J. J. Mol. Biol. 2003, 232, 505526.
Moodie, S. L.; Mitchell, J. B. O.; Thornton, J. M. J. Mol. Biol.
1996, 263, 486-500.
Rosen, M.; Lin, S.; Wolfson, H. J.; Nussinov, R. Protein Eng. 1998,
11, 263-277.
Schmitt, S.; Kuhn, D.; Klebe, G. J. Mol. Biol. 2002, 323, 387-406.
Kinoshita, K.; Nakamura, H. Protein Sci. 2003, 12, 1589-1595.
Brakoulias, A.; Jackson, R. M. Proteins 2004, 56, 250-60.
Shulman-Peleg, A.; Nussinov, R.; Wolfson, H. J. J. Mol. Biol.
2004, 339(3), 607-633
Jones, S.; Thornton, J. M. Curr. Opin. Chem. Biol. 2004, 8, 3-7.
Schneider, G.; Böhm, H. Drug Discov. Today 2002, 7, 64-70.
Keseru, G. J. Comput. Aided Mol. Des. 2001, 15, 649-657.
Schneidman-Duhovny, D.; Nussinov, R.; Wolfson, H.J. Curr. Med.
Chem. 2004, 11, 91-107.
Sternberg, M.; Gabb, H.; Jackson, R. Curr. Opin. Struct. Biol.
2002, 12, 28-35.
Halperin, I.; Ma, B.; Wolfson, H. J.; Nussinov, R. Proteins 2002,
47, 409–443.
Abagyan, R.; Totrov, M. Curr. Opin. Chem. Biol. 2001, 5, 375-382.
Ewing, T.; Makino, S.; Skillman, A.; Kuntz, I. J. Computer-Aided
Mol. Des. 2001, 15, 411-428.
Rarey, M.; Kramer, B.; T., L. Time-Efficient Docking of Flexible
Ligands into Active Sites of Proteins. In 3’rd Int. Conf. on Intelligent Systems for Molecular Biology (ISMB’95); AAAI Press: Cambridge, UK, 1995.
Norel, R.; Petrey, D.; Wolfson, H. J.; Nussinov, R. Proteins 1999,
35, 403–419.
Goldman, B.; Wipke, W. Proteins 2000, 38, 79–94.
Schneidman-Duhovny, D.; Inbar, Y.; Polak, V.; Shatsky, M.;
Halperin, I.; Benyamini, H.; Barzilai, A.; Dror, O.; Haspel, N.;
Nussinov, R.; Wolfson, H. J. Proteins 2003, 52, 107-112.
Duhovny, D.; Nussinov, R.; Wolfson, H. J. Efficient unbound
docking of rigid molecules. In Workshop on Algorithms in Bioinformatics, Vol. 2452; Guigo, R.; Gusfield, D., Eds.; LNCS,
Springer Verlag: 2002
Abagyan, R.; Totrov, M.; Kuznetsov, D. J. Comput. Chem. 1994,
15(5), 488-506.
Österberg, F.; Morris, G.; Sanner, M.; Olson, A.; Goodsell, D.
Proteins 2002, 46, 34–40.
Jones, G.; Willet, P.; Glen, R.; Leach, A. J. Mol. Biol. 1997, 267,
Gardiner, E.; Willett, P.; Artymiuk, P. Proteins 2001, 44, 44–56.
Heifetz, A.; Katchalski-Katzir, E.; Eisenstein, M. Protein Sci. 2002,
11, 571-587.
Chen, R.; Weng, Z. Proteins 2002, 47, 281–294.
Gabb, H.; Jackson, R.; Sternberg, M. J. Mol. Biol. 1997, 272, 106120.
Mandell, J.; Roberts, V.; Pique, M.; Kotlovyi, V.; Mitchell, J.;
Nelson, E.; Tsigelny, I.; Ten Eyck, L. Protein Eng. 2001, 14,
Vakser, I. Protein Eng. 1995, 8, 371–377.
Ritchie, D.; Kemp, G. Proteins 2000, 39, 178–194.
Katchalski-Katzir, E.; Shariv, I.; Eisenstein, M.; Friesem, A.;
Aflalo, C.; Vakser, I. Proc. Natl. Acad. Sci. USA 1992, 89, 21952199.
Jones, G.; Willet, P.; Glen, R. J. Mol. Biol. 1995, 245, 43-53.
From Structure to Function
Fernandez-Recio, J.; Totrov, M.; Abagyan, R. Protein Sci. 2002,
11, 280-291.
Schnecke, V.; Swanson, C.; Getzoff, E.; Tainer, J.; Kuhn, L. Proteins 1998, 33, 74-87.
Clauβen, H.; Buning, C.; Rarey, M.; Lengauer, T. J. Mol. Biol.
2001, 308, 377-395.
Sandak, B.; Nussinov, R.; Wolfson, H. J. J. Comp. Biol. 1998, 5(4),
Charifson, P.; Corkery, J.; Murcko, M.; Walters, W. J. Med. Chem.
1999, 42, 5100–5109.
Doman, T.; McGovern, S.; Witherbee, B.; Kasten, T.; Kurumbail, R.; Stallings, W.; Connolly, D.; Shoichet, B. J. Med. Chem.
2002, 45, 2213–21.
Hindle, S.; Rarey, M.; Buning, C.; Lengauer, T. J. Computer-Aided
Mol. Des. 2002, 16, 129-149.
Dror, O.; Shulman-Peleg, A.; Nussinov, R.; Wolfson, H. J. Curr.
Med. Chem. 2004, 11, 71-90.
Vajda, S.; Camacho, C. Trends Biotechnol. 2004, 22, 110-6.
Giordanetto, F.; Kroemer, R. Protein Eng. 2003, 16, 115-124.
O’Maille, P.; Bakhtina, M.; Tsai, M. J. Mol. Biol. 2002, 321, 677691.
Pisabarro, M. T.; Serrano, L.; Wilmanns, M. J. Mol. Biol. 1998,
281, 513-521.
Russell, R.; Barton, G. Proteins 1992, 14, 309-323.
Sali, A.; Blundell, T. J. Mol. Biol. 1990, 212, 403-428.
Branden, C.; Tooze, J. Introduction to Protein Structure, Second
Edition; Garland Publishing, Inc.: New York, 1999.
Rong, S.; Enyedy, I.; Qiao, L.; Zhao, L.; Ma, D.; Pearce, L.;
Lorenzo, P.; Stone, J.; Blumberg, P.; Wang, S.; Kozikowski, A. J.
Med. Chem. 2002, 45, 853-860.
Current Protein and Peptide Science, 2005, Vol. 6, No. 2
Pouplana, R.; Lozano, J.; Ruiz, J. J.Mol. Graph. Model 2002, 20,
Bissantz, C.; Bernard, P.; Hibert, M.; Rognan, D. Proteins 2003,
50, 5-25.
Wojciechowski, M.; Skolnick, J. J. Comput. Chem. 2003, 23, 189197.
Mazitsos, C.; Rigden, D.; Tsoungas, P.; Clonis, Y. Eur. J. Biochem.
2002, 269, 5391-5405.
Schafferhans, A.; Klebe, G. J. Mol. Biol. 2001, 307, 407-427.
Sacchi, S.; Lorenzi, S.; Molla, G.; Pilone, M.; Rossetti, C.;
Pollegioni, L. J. Biol. Chem. 2002, 277, 27510-27516.
Gidalevitz, T.; Biswas, C.; Ding, H.; Schneidman-Duhovny, D.;
Wolfson, H.; Stevens, F.; Radford, S.; Argon, Y. J. Biol. Chem.
2004, 279(16), 16543-52.
Looger, L.; Dwyer, M.; Smith, J.; Hellinga, H. Nature 2003, 423,
Baldwin, R.; Rose, G. Trends Biochem. Sci. 1999, 24, 26-33.
Inbar, Y.; Benyamini, H.; Nussinov, R.; Wolfson, H.J. Bioinformatics 2003, 19 Suppl. 1, i158-i168.
Haspel, N.; Tsai, C.; Wolfson, H.; Nussinov, R. Protein Sci. 2003,
12, 1177–1187.
Xu, D.; Baburaj, K.; Peterson, C.; Xu, Y. Proteins 2001, 44, 312320.
Berchanski, A.; Eisenstein, M. Proteins 2003, 53, 817-829.
Inbar, Y.; Benyamini, H.; Nussinov, R.; Wolfson, H. 2004, submitted.
Baker, D.; Sali, A. Science 2001, 294, 93-96.
Jaramillo, A.; Wernisch, L.; Hery, S.; Wodak, S. J. Comb Chem
High Throughput Screen 2001, 4, 643-659.