Current Protein and Peptide Science, 2005, 6, 171-183 171 From Structure to Function: Methods and Applications Haim J. Wolfson1, Maxim Shatsky1, Dina Schneidman-Duhovny1, Oranit Dror1, Alexandra Shulman-Peleg1, Buyong Ma2 and Ruth Nussinov2,3,* 1 School of Computer Science, The Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel, 2Basic Research Program, SAIC - Frederick, Inc., Laboratory of Experimental and Computational Biology, NCI-Frederick, Bldg 469, Rm 151, Frederick, MD 21702, USA, 3Sackler Inst. of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel Abstract: The rapid increase in experimental data along with recent progress in computational methods has brought modern biology a step closer toward solving one of the most challenging problems: prediction of protein function. Comprehension of protein function at its most basic level requires understanding of molecular interactions. Currently, it is becoming universally accepted that the scale of the accumulated data for analysis and for prediction necessitate highly efficient computational tools with appropriate application capabilities. The review presents the up-to-date advances in computational methods for structural pattern discovery and for prediction of molecular associations. We focus on their applications toward a range of biological problems and highlight the advantages of the combination of these methods and their integration with biological experiments. We provide examples, synergistically merging structural modeling, rigid and flexible structural alignment and detection of conserved structural patterns and docking (rigid and flexible with hinge-bending movements). We hope the review will lead to a broader utilization of computational methods, and their cross-fertilization with experiment. Keywords: Structural pattern discovery, structural genomics, structure comparison, structural alignment, docking, drug design, conservation, flexible structural alignment, flexible docking, multiple structural alignment, folding, hinge-bending. 1 INTRODUCTION The goal of Computational Biology is to understand living systems through calculations at the molecular level. In particular, the aim is to predict how molecules interact, to form the myriad of ways that they are connected and regulated in the living cell. Apart from data management for the huge amount of biological information, computational biology provides an “experimental tool" for obtaining information from nature. The interplay between “computational experiments” and conventional biological experiments is an important characteristic of modern biological research. However, a “computational experiment” is an in silico prediction, which must be followed by experimental validation. Conversely, starting a project with a broad range of biological experiments may require too many resources. These may be significantly reduced with the help of computational tools. Consequently, computational tools are increasingly becoming vital instruments in biological research. Experimentally determined structures are still unavailable for most proteins. Yet, due to the Structural Genomics initiative, this situation is changing rapidly. The rapid increase in the available three-dimensional structural data presents a major challenge: How to best exploit, extract and in parti*Address correspondence to this author at the NCI-Frederick Bldg 469, rm 151, Frederick, MD 21702, USA; Tel: 301-846-5579; Fax: 301-846-5598; E-mail: email@example.com The publisher or recipient acknowledges right of the U.S. Government to retain a nonexclusive, royalty-free license in and to any copyright covering the article. 1389-2037/05 $50.00+.00 cular predict biologically relevant features toward the ultimate goal of predicting protein function, and of putting these together to produce the biological system. Obtaining, digesting and applying protein structural information has contributed immensely to our comprehension of protein structure-function relationship. According to the Gene Ontology  classification protein function is represented by a three level hierarchy: (i) the Molecular Function (MF) of a protein, (ii) the Biological Process (BP) in which it participates, and (iii) the Cellular Component (CC) in which it functions. This review focuses on the first level of this hierarchy, which describes biological activities, such as catalytic and binding activities, at the molecular level. Protein structure classification assists in detection of functionally similar proteins that share limited sequence similarity. Structural alignments are essential for detecting conserved protein cores, similarities in functional binding sites, similarities in enzyme mechanisms and evolutionary conservation. Homology modeling is used to predict an unknown protein structure based on its sequence similarity to known structures. Protein surface/interface analysis is used to reveal the structure-property relationship of interacting molecules. Structure-based drug design is applied to protein targets with known active/binding sites. Molecular interactions are predicted by docking of small ligands, proteins or nucleic acids. Most applications involve a series of modules, similar to ’unit operations’ in chemical engineering. Sequence and structural alignments are probably the most widely used ’unit © 2005 Bentham Science Publishers Ltd. 172 Current Protein and Peptide Science, 2005, Vol. 6, No. 2 operations’. Some ’unit operation’ modules may rely on others, for example, comparative modeling based on multiple sequence/structure alignment, or modeling a folded protein via docking of small protein fragments. Numerous algorithms are available for each ’unit operation’. Yet, no single program can handle all applications. Fig. 1 provides an overview of typical routine applications that utilize the sequence and structure analyses. Here, we review state-of-the-art structural methodologies. Our focus is not on the algorithmic detail. Rather, our aim is to highlight the strengths and weaknesses of these methodologies, the challenges and the potential solutions for the different types of applications. In the case studies presented below, we demonstrate how a computational strategy, which synergistically combines the various tools, enhances the biological applications. In addition, numerous examples from the literature are described in the last section. 2 PROTEIN STRUCTURE ALIGNMENT Protein structure alignment has become a standard structural analysis tool [2, 3]. Protein structure similarity may indicate a functional/evolutionary relationship. Recognition Wolfson et al. of a structural core common to a set of protein structures can serve as a basis for homology modeling  and for structural classification . There are three potential ways to predict the function of a new protein. The first is to perform sequence comparison with a protein whose function is known. However, several studies have shown that sequence analysis alone may lead to unsatisfactory results . Even at 75% protein sequence identity, structural alignment is more reliable . In addition, Russel et al.  presented examples of similar binding sites found in analogous proteins (proteins that converge to the same folding motif without having any common ancestor). The second way to infer the function of a new protein is by seeking proteins with a similar fold and with a known function. This can be achieved by performing structural alignment. Taking into account protein flexibility, such as domain movements, makes the structural alignment even more accurate. Recognition of conformational changes is crucial for understanding biological processes. Yet, in some cases similarity in function does not necessarily requires similarity of folds. Therefore, the third option to infer protein function is to detect proteins that have similar binding patterns and thus may perform similar function. The most Fig. (1). Modern biochemical research combines in vitro experiments, computational tools and large data resources. The diagram illustrates some possible application flows of various ’unit operations’. For example, given a protein target we may be interested in compounds that can inhibit it. If the structure of the protein target is unknown, structure modeling is applied first. Next, multiple sequence/structural alignment can aid in recognizing potential binding sites that should be verified, for example, by mutagenesis experiments. Docking procedures can then search large compound databases for potential leads. High scoring solutions will be further verified by in vitro experiments. The diagram presents the following ’unit operations’ (gray boxes in the figure): structure modeling , sequence alignment , structure alignment, docking and protein design . From Structure to Function common example of such patterns is the catalytic triads, common to proteins with different trypsin and subtilisin folds. Another example is hundreds of structures with tens of different folds that can bind ATP. What do these proteins have in common that make them bind the same ligand? Structural alignment between the functional sites of these proteins may assist in answering this question. 2.1 Protein Structural Similarity There are several ways to represent a protein structure in 3D space. The selected representation determines the level of resolution at which the structures are compared. Probably the coarsest representation is by secondary structure elements (SSE’s). This representation is highly efficient for a fast comparison of the overall protein folds. The most common representation is by the backbone atoms (e.g. Cα). This allows the recognition of common structural motifs in proteins that do not necessarily have the same overall fold or SSEs. A higher level of resolution is a representation by all atoms or by functional groups that combine atoms according to the interaction in which they may participate (for instance, the aromatic property of a phenyl ring can be represented by its centroid). Such a representation is advantageous in the comparison of protein functional sites that do not share any sequence or overall fold similarity. A scoring function that measures structural similarity should quantitatively discriminate between functionally/evolutionarily related and unrelated proteins. In most cases, scoring functions measure local or global geometrical differences between aligned structures, for example the Root Mean Squared Deviation, (RMSD) of the distances between aligned Cα atoms. Specific features can also be taken into account, such as similarity between the aligned amino acids as well as the quality of the SSE alignment. There are almost as many scoring functions as the number of alignment methods [2, 3]. However, it is hard to determine which function is the best, since there is no universally agreed theoretical support to justify any of these. Therefore, a preferable scoring scheme is likely to be a simple one, for instance, one that maximizes the alignment size while keeping the distance between matched atoms less than a specified threshold. Sequence order dependence/independence of the alignments has biological and computational consequences. For closely related proteins, a sequential alignment usually leads to a good structural alignment. Utilizing sequential information in a structural alignment can significantly reduce the computational complexity. In particular, the dynamic programming algorithm, which is so popular in sequence alignment , can be exploited. Preserving the sequence order can assist in generating structure-based sequence profiles for homology modeling. However, strictly following the sequence order in structural alignment may be misleading. There are numerous examples of proteins sharing similar 3D structures and functions that have different sequences and even different 3D topological order (e.g. C2 domain-like , four-helix bundle ). Therefore, in these cases, sequence order independent methods should be used. These can be applied to detect conserved 3D motifs in the protein interior, on protein surfaces or in protein-protein interfaces. Current Protein and Peptide Science, 2005, Vol. 6, No. 2 173 The different measures of protein structural similarity can be used to classify known protein structures. Different protein structure databases have been constructed based on the master database of the Protein Structure Databank (PDB). The most popular databases are SCOP, CATH and FSSP. SCOP  defines ten protein structural classes: all-α, all-β, α+β, α/β, multidomain, membrane and cell-surface proteins and peptides, small proteins, peptides, designed proteins and non-protein structures. From the protein classes the SCOP hierarchy descends to folds, superfamilies, families, domains and finally to species. The classification is based on visual inspection/comparison and limited use of sequence similarity, where no methodologically defined structural similarity score is employed. The CATH  database, similar to SCOP, also contains a hierarchical classification of protein domains. However, in contrast with SCOP, mostly automated methods are used to assess a structural and sequence classification of a new protein structure. First, the new protein is subdivided into well defined domains. Then, each structural domain is assigned to a topological family and a homologous superfamily. In total, there are five levels of hierarchical classification. These range from the coarse secondary structure composition (mainly alpha, mainly beta, alpha-beta) to sequence families of proteins having very similar 3D structures. FSSP  is a fully automatic classification that is based on exhaustive pairwise structural alignments performed by DALI . The output forms structurally representative fold sets at varying uniqueness levels. Interestingly, even though the databases use different methods and similarity measures, the classifications in SCOP, CATH and FSSP largely agree . FSSP scores can even be used to automatically assign the SCOP and CATH classifications to fold level and topology level, respectively . There are different ways to classify the algorithms for protein structural alignment, such as pairwise versus multiple or rigid versus flexible. The problem of rigid pairwise alignment was extensively studied. Many on-line resources are available for high quality alignments, for example: CE , DALI , Geometric Hashing (GH)  and VAST . Here, we discuss the more difficult tasks: flexible alignment and multiple alignment. 2.2 Pair-wise Flexible Alignment While loops are often variable, the fold is usually conserved in protein families, leading to overall structure and binding site preservation. Thus, most structural alignment methods treat protein structures as rigid bodies. However, protein flexibility cannot be underestimated. Around their native state, there is an ensemble of conformational isomers separated by low energy barriers. The movements reflect fast side-chain movements as well as slow large scale hingebending movements [20, 21]. In hinge-bending movements molecular parts rotate as relatively rigid bodies. Consider a pair of proteins with a structurally similar motif. A hingebending movement within the motif will significantly reduce the rigid structural alignment score between both proteins. A flexible structural alignment algorithm should nevertheless detect such a structurally similar motif. Such a situation may happen due to the conformational changes in protein associations, mutations in the sequence or change in physical conditions. Hinge-bending movements involve large scale 174 Current Protein and Peptide Science, 2005, Vol. 6, No. 2 movements critical for function. Efficient detection of hinges is important for binding site identification and inhibitor/drug design. If a movement is consistent among a protein family, most likely it is conserved for function. Assessing the movements is advantageous in the design of larger drugs to fill the larger volumes formed during domain opening. Understanding possible conformational changes is useful in deciphering the activation mechanism and can assist in the computational modeling of the biological system. Case study 1 presents the modeling of the c-Src and proline-rich peptide complex, which can only be possible through the recognition of the conformational changes in the c-Src kinase. Analogous to rigid structural alignment, flexible structural alignment algorithms have to solve three major tasks. First, the usually difficult residue correspondence (alignment) task; second, the corresponding region superimposition and, third, hinge detection. Of course, the solutions may be interrelated. Some methods [22, 23, 24] assume that the residue alignment is already solved by another technique. This may happen, for example, when we are given two structures of the same protein (for example in the complexed and free states) or when there is a significant sequence similarity between the two proteins, thus enabling application of a sequence alignment technique to detect residue correspondence. Yet, the correspondence between the amino acids is frequently unknown. Several algorithms have been developed to tackle this challenging goal. The methods of Verbitsky et al.  and FlexMol  perform flexible residue orderindependent protein alignment with the requirement of prespecification of potential hinges. To improve the identification of potential hinge positions, one can apply methods for domain detection and interdomain linkage [27, 28]. In particular, the recently developed FlexProt algorithm does not require an a priori knowledge of the flexible hinge regions . It simultaneously detects hinge regions and aligns the rigid subparts of the molecules. An advance partitioning of the flexible molecule into rigid parts is not required. However, since FlexProt exploits the amino acid sequence order, it cannot detect non-sequential alignments. There still remains the challenging task of developing a method that automatically detects flexible regions and at the same time optimizes the sequence orderindependent alignment of the rigid parts. 2.3 Multiple Structure Alignment Reliable multiple structure alignment is one of the major challenges in structure analysis. Multiple structural alignment is infinitely superior to pairwise alignment, essentially for similar reasons that multiple sequence alignment is significantly more informative than pairwise. Conservation of a region in many molecules bears higher significance as compared to its recurrence in only two. Multiple sequence alignment programs , like Psi-BLAST or ClustalW, have become almost everyday tools for computational and even noncomputational biologists. Yet, efficient multiple structure analysis algorithms have only recently become available. Multiple structure alignment is a powerful tool for detecting structural motifs common to proteins that share a similar function. This allows derivation of residue conserva- Wolfson et al. tion in analogous positions in 3D space, such as in protein cores or in protein binding sites. Another application of multiple structural alignment is for inferring evolutionary relations between proteins, like identification of a consensus (sub)structure that can serve as a model of a potential ancestor . Additionally, methods for multiple structure alignment can aid in structure prediction . For example, homology modeling methods utilize multiple structure alignment to increase the accuracy of the structural model . Most methods obtain the multiple alignment through global pair-wise comparisons of the input structures [31, 32]. Examples for such methods include STAMP , PrISM  and SSAPm . Such methods are potentially time efficient. However, these methods have an inherent limitation, since in each pairwise comparison the only available information is with regard to the two molecules which are involved. Hence, alignments optimal for the whole set may be overlooked, unless they are optimal for pairs . The general strategy of pairwise-based methods is first to align the more similar structures and then to proceed to the less similar. Therefore, the main advantage of these methods is their ability to classify the input proteins into structurally related sub-families. Furthermore, most methods use dynamic programming and consequently are sequence order-dependent and cannot detect non-sequential motifs. However, order dependence can be assumed for applications such as homology modeling. A more meaningful approach is to consider all given molecules simultaneously, rather than initiate from pair-wise comparisons. This approach was adopted by Escalier et al. , MUSTA , MultiProt  and MASS . Since the general problem is computationally NP-hard , these methods assume some geometrical or protein structure properties to obtain reasonable running times (MultiProt and MASS run between several seconds for a few structures to several minutes for tens of structures). These methods are non-sequential and thus can detect motifs that are structurally, but not sequentially, conserved. A recent application of multiple structural alignment  to surface analysis has shown that the conservation of Trp, Phe and Met on the protein surface supplies strong evidence for a binding site . For all three residues, there is a significant conservation in binding sites while there is no similar conservation on the rest of the surface. In this article we provide two additional examples of multiple structure alignment applications. In case study 3 multiple structural alignment reveals the functionally conserved active site residue of the PLP-dependent transferase and in case study 2 multiple alignment leads to the recognition of conserved loops of serine proteases that include the Ser and His residues of the catalytic triad. Partial solutions of multiple alignment may have an important biological meaning. Consider for example a set of 100 input molecules, where 40 molecules are structurally similar taken from family “A”, 50 structurally similar molecules from family “B” and 10 additional molecules that are structurally dissimilar to any other input molecule. A multiple alignment of all 100 molecules would probably detect a common structural motif consisting of at most one secondary structure element. Clearly, it is vital for a multiple alignment method to be able to recognize two sets, “A” and “B”, from the 100 structures. Methods that adopt the pair-wise strategy From Structure to Function will be the most suitable for such a classification. Furthermore, in some cases there might be only a sub-structure (e.g. motif or domain) that is similar between some of the input molecules. For example, a number of proteins from both families “A” and “B” may share a small common motif. Therefore, to detect such partial similarities the methods that consider all molecules simultaneously are more appropriate. The method of Escalier et al. , MultiProt  and MASS  allow detection of a number of (different) common structural motifs and are capable of distinguishing between similar and dissimilar molecules. Multiple structure alignment can also be performed in a randomized manner. Randomized techniques are used extensively for solving complex problems where no optimal methods with feasible running times exist. For example, Monte Carlo optimization was implemented to solve the multiple structure alignment task . Most methods analyze structural alignment only at one level: at the level of single atoms, secondary structure elements, or larger structural fragments. MolCom  provides a multilevel analysis of multiply aligned protein structures. A 3D map adopts a hierarchical representation of biochemical and structural properties. Each level has an appropriate scoring function that allows to quantitatively estimate the protein alignment at different structural levels (e.g. secondary structures, amino acids and single atoms). Such an approach appears to be suitable for structural analysis due to the hierarchical nature of proteins. 2.4 Recognition of Functional Sites Recognition of functionally similar proteins with no sequence or fold similarity provides an enormous challenge, for which most of the previously described methods are not applicable. Frequently, similar folds imply similar function, however, proteins with the same fold, like TIM barrels , can have different functions. In contrast, proteins with different folds, like zinc-binding proteins, can share the same function. Therefore, a more valid assumption is that similar binding patterns imply similar functions. Identification of similar functional sites has several applications in drug design: (i) It can provide hints to ligands and ligand fragments that may be useful in designing new drugs; and (ii) it may point to proteins that can potentially cause side-effects. The simplest way to address this problem is by analysis of structurally diverse proteins that were crystallized in the presence of the same ligand. The superimposition of these complexes in a way that aligns their common ligand suggests the alignment of their binding sites [45, 46]. This allows comparison and analysis of the spatial arrangements of the functional groups that are essential for binding. However, the main drawback of such an approach is the fact that the same ligand can bind in alternative modes even to the same protein binding site [47, 46]. Therefore, a more general way to tackle this problem is by investigating the spatial physico-chemical patterns exhibited by the amino acids. This approach is potentially more powerful, since it can analyze unbound structures and no ligand information is required. However, it requires solving the more difficult problem of sequence order independent alignment of certain features in 3D space. Moreover, the problem is further complicated by the fact that Current Protein and Peptide Science, 2005, Vol. 6, No. 2 175 the matched features must also create surface regions with similar shapes that can accommodate similar ligands. There are two main approaches to compare protein binding sites. The first type recognizes specific three-dimensional patterns of amino acids. The input to algorithms of this type is a description of the specific pattern exhibited by a certain set of residues, which is searched in a database of protein structures. Most of these methods were specifically developed for recognition of catalytic residues that form such patterns as ’catalytic triads’. These patterns are typical for some of the protein families, like serine proteases, triacylglycerol lipases, ribonucleases and lysozymes. Some of these patterns, like the ’catalytic triad’ of serine proteases can be present in proteins with different overall folds (e.g. subtilisin and trypsin folds) indicating the functional similarity of these protein families. Examples of such methods are TESS [48, 49, 50], ASSAM [51, 52] and the method presented by Binkowski et al. . On the other hand, there are many biological examples of totally different proteins that can share the same binding partners without having any spatial pattern of residues with the same identity . To address this problem, the second type of methods compares the surface regions that comprise the binding sites. Methods for recognition of similar binding sites without any assumption regarding the fold or the identity of the amino acids have been presented by Rosen et al. , Schmitt et al. , Kinoshita et al. , Brakoulias et al.  and Shulman-Peleg et al. . To show the usefulness of this approach, the databases of binding sites, CaveBase  and eF-site , were searched to recognize similar binding sites shared by totally unrelated proteins. Additional methods for recognition of functional sites in protein structures can be found in a recent review by Jones et al. . Recognition of functional sites by these methods can be complementary to techniques such as docking and may require the presence of specific (functionally important) interactions. 3 DOCKING: EXPECTATIONS AND REALITY As the availability of protein and nucleic-acid structures grows, algorithms for docking of drugs to receptor molecules are becoming integral tools for rational drug design. Docking can be applied in various stages of the drug design process. The most common application is virtual screening of a compound library for discovery of new leads. This approach has been applied successfully, leading to a number of novel ligands . In addition, docking is used in de novo design methods, where the receptor is screened against a fragment library . Docking is also applied for a partial prediction of ADMET (Absorption, Distribution, Metabolism, Excretion and Toxicity) properties. Potential drug candidates can be screened in silico against proteins active in cellular metabolism, such as Cytochrome P450 enzymes . Screening against different proteins can aid in predicting potential drug toxicity. Such computational filtering can save not only many in vitro tests, but also some of the expensive in vivo experiments. Docking methods are also used for predicting protein–protein interactions. Despite the growth of the PDB, the number of available protein–protein complexes is relatively small. Structural modeling of disease related protein–protein associations may lead to the development of inhibitors to 176 Current Protein and Peptide Science, 2005, Vol. 6, No. 2 disrupt these associations. Such inhibitor might again be discovered by docking small molecule libraries to each of the proteins involved in the complex. In addition, docking algorithms are helpful in predicting the 3D structures of large macromolecular complexes. Current experimental high resolution structure determination methods, such as X-ray crystallography and NMR, have difficulties in solving the structures of large macromolecular associations. Expectations: Given the structures of two molecules, we expect from a docking algorithm to predict whether they interact, and if so, what is the 3D structure of their complex and the affinity of the interaction. 3.1 Methods As the demand for efficient docking algorithms grows, many are being developed and improved [63, 64, 65, 66]. All algorithms include two major stages: 3D search and scoring. The 3D search stage seeks possible receptor-ligand configurations, while the scoring stage ranks them. Several issues should be considered in selecting a proper method: Molecules and Binding Sites Whereas most docking methods are designed to handle only one type, either protein-protein docking or protein-drug, others are applicable to all molecule types. Moreover, some methods require binding site knowledge, while others can use it as optional information. Binding site knowledge can significantly improve the quality of the docking results. Case studies 2 and 3 demonstrate automatic detection of binding sites by multiple structural alignment and its employment in the search stage of docking. 3D Search The algorithms can be divided into local shape feature matching, randomized, and brute force enumeration by their search technique. Dock , FlexX , PPD , QSDock , BUDDA , PatchDock  and other algorithms align local shape features, while in ICM , AutoDock , Gold  and GAPDOCK  the search is performed using randomized and/or evolutionary algorithms. MolFit , ZDOCK , 3D-Dock , DOT , GRAMM , Hex  and other protein–protein docking algorithms use a brute force search for the three rotational parameters and the FFT (Fast Fourier Transform ) for fast enumeration of the translations. The selected search method should be fast (to allow many docking experiments) and reliable (to retain the correct result among suggested candidate complexes). Flexibility Treating molecular flexibility is one of the major challenges of the docking field. Enumerating over all theoretically possible configurations may result in impractical running times. Several points should be considered in flexible docking. First, which type of flexibility is included: small molecule torsional movements, fast side-chain movements, slow hinge-bending domain movements or loop movements. Second, in which molecule is the flexibility allowed: in the receptor, in the ligand, or in both. Third, how is the flexibility treated: is it treated explicitly by exploring the space of all possible conformations or just implicitly by allowing some Wolfson et al. inter-penetrations between molecules (soft docking). Finally, what is the algorithmic design: some algorithms incorporate flexibility directly in the 3D search stage whereas others in the final improvement procedure. Many algorithms treat the problem of ligand torsional flexibility. However, only few methods treat the proteinreceptor backbone or side-chain flexibility due to the problem complexity. There are several ways to handle side-chain flexibility. Randomized and evolutionary algorithms add torsion angles of side chains as parameters for optimization [84, 85, 86]. The FlexE algorithm  employs MPS (Multiple Protein Structures). The PDB contains numerous redundant structures of the same proteins or multiple conformations from NMR. Thus, rather than using one structure for docking, an ensemble of aligned molecules can be used. Molecular dynamics is another source of multiple conformations. These methods sample side-chain instances from different input protein structures and generate new protein conformations. Side-chain conformations are further obtained from rotamer libraries. In addition, the docking algorithm should account for the intra-molecular interactions of each protein conformation in the combinatorial rotamer shuffling. Hinge bending movements are observed in many proteins . This type of large scale flexibility is especially difficult for modeling by docking algorithms, since the movements include domains, secondary structure elements and loops. The only available method based on an Articulated Object Recognition algorithm from Computer Vision  can handle hinge-bent flexibility in one of the docked molecules. Movements are allowed either in the ligand or in the larger receptor. The method efficiently exploits hinge information in the search stage, exploring the space of all possible conformations while avoiding brute force search. The flexible molecule is divided into rigid parts at hinge points and all parts are docked simultaneously. Hence, the algorithm mimics the so-called “induced” molecular fit. Modeling of the complex of c-Src kinase with proline rich peptide by hingebased docking is demonstrated in case study 1. Scoring Function The output of the conformational search stage is a large number of potential docked complexes. The scoring stage is necessary in order to filter out solutions with unlikely interactions and to rank the remaining ones according to some favorable interaction criteria. Development of a reliable scoring function is one of the most challenging problems in the field . Many scoring schemes have been suggested. These are based on optimization of van der Waals interactions, hydrogen bonding, electrostatic potential, hydrophobicity, atom-atom and residue-residue knowledge-based potentials. However, native-like complexes do not necessarily have the best shape complementarity, the largest number of hydrogen bonds, or the most favorable electrostatics. A combination of scoring functions or consensus scoring is often useful since it can reduce the number of false positive results . 3.2 Reality Despite the recent progress in the field, docking algorithms still require improvements in both scoring function From Structure to Function and flexibility treatment. In the leading algorithms the “correct” near native result is found among few hundreds of top scoring results in most cases, unless large conformational changes are involved. The source of the problem is in the scoring stage, due to many false positive results. Nevertheless, docking is widely used in the search for novel drug leads. Shoichet et al.  have compared virtual ligand screening (VLS) using docking with high-throughput screening (HTS). The hit rate of docking (34.8%) was 1700fold higher than that of a random screening (0.021%). There are several advantages to docking over high throughput screening: (i) the synthesis of the compounds is not required and the size of the compound library is not limited; (ii) docking can easily identify large groups of false negative compounds: those that are too large for the receptor binding site and have totally unsuitable biochemical properties; (iii) docking can be further improved by pharmacophoric information [91, 92]. Protein-protein docking is mostly used for predicting the structure of a complex, when we already know that the molecules of interest interact. Significant progress was made in protein docking methods in the past few years. CAPRI (Critical Assessment of Predicted Interactions, http://capri. ebi. ac.uk) is a recent effort of the docking community to evaluate the performance of different protein docking methods by blind prediction. Launched in 2001, CAPRI already had 5 rounds with the total number of 16 targets and 29 groups participating in the last round. This type of assessment is especially important for evaluating state of-the-art docking algorithms. However, such an assessment does not only test the algorithms, but also the ability of the predictor to use correctly any available biological information about the targets in the docking process. Binding site knowledge proved to be very helpful in the selection of candidate results. One of the lessons from the CAPRI experiment is that the success in the prediction depends on the type of the complex . Complexes with large interface area and compact interface packing are easier to predict than with complexes with small interface area and relatively flat interface shape. Additionally, the prediction task is very difficult if there are conformational changes upon complex formation. Nevertheless, the CAPRI results are encouraging: there were correct predictions from several groups for most of the targets. The lessons learned from such experiments show that docking algorithms still need improvements in terms of flexibility, scoring functions and automated binding site prediction. 4 SYNERGISTIC APPLICATIONS OF ’UNIT OPERATIONS’ Applications of computational tools to biological problems face a challenge: how to guide the experimental research and how to reduce the number of ’blind’ trials. Integration of individual ’unit operations’ is essential. Sequence and structural alignments, database applications, homology modeling, docking of a protein to small ligands or to proteins can be synergistically combined. The case studies in this article demonstrate examples of such combinations. Other published results also indicate this trend. For example, a combination of different approaches was employed to model a suppressor of cytokine signalling 1 (SOCS-1) . Specifically, multiple sequence alignment was performed with ClustalW, Current Protein and Peptide Science, 2005, Vol. 6, No. 2 177 fold recognition was obtained with GenThreader and Threader, TopLign, 3D-PSSM and UCLA-DOE and structural alignment was performed with DALI, VAST and CE. The structure of the SOCS-1 was generated using MODELLER and docking was run with 3D_DOCK. Another example of structure-based combinatorial protein engineering, is a design of a subdomain from two distantly related proteins . 4.1 Alignment and Docking Below we present a set of case studies that illustrate the biological significance of the combination between different unit operations. Case Study 1: Simulating Intra-Domain Movements to Predict a Complex of c-Src Protein Kinase with a ProlineRich Peptide Deregulation of the activity of c-Src protein kinase may cause various human diseases, including breast and colon cancer. Therefore, inhibition of its activity is a promising strategy in anti-cancer treatment. c-Src protein kinase is composed of three domains, the catalytic domain, SH2 and SH3. Fig. 2a presents a model for SH3-mediated signalling suggested by Pisabarro et al. . According to the authors, a high affinity peptide (colored green) competes for the region of SH3 domain, which is otherwise occupied by the linker (colored blue) between the SH2 and the catalytic domains. Today there is no crystal structure of all three domains of c-Src in the presence of a peptide bound to the SH3 domain. The goal of this study is to model this complex and to computationally verify the model suggested by Pisabarro et al. (Fig. 2b) presents the differences in the spatial orientations of these domains as obtained by a superimposition of the inactive form of human c-Src (PDB code: 1fmk, blue) on the auto-inhibited form of haematopoetic kinase (Hck, PDB code: 1qcf, red). Fig. 2c presents the superimposition between the currently available unbound structure of the three kinase domains with the peptide complexed in the presence of the SH3 domain alone. Assuming that the auto-inhibited form of human c-Src (PDB code: 1fmk) may be similar to that of Hck (PDB code: 1qcf), the first step was to use the FlexProt  algorithm to perform a flexible alignment between them. Fig. 2d presents the two hinges that were detected at residues 138 and 252 of c-Src (prediction accuracy: 1.34Å RMSD, running time: 4 seconds, Pentium IV PC). The detected hinges were used as input to perform flexible docking (FlexDock [71, 88]) between the inactive structure of the three domains of c-Src and the proline-rich peptide (RLP2) (running time: 2.5 minutes, Pentium IV PC). Fig. 2e presents the predicted complex (c-Src in blue, RLP2 in green). In order to evaluate its quality, the SH3 domain from the known complex (PDB code: 1rlq, colored red) is superimposed on the SH3 domain of the predicted complex. The RMSD between the peptide from the complex and the docked peptide was 1.0Å RMSD. The SH2 and the catalytic domains are rotated by the algorithm backward from the viewing angle of (Fig. 2c), to allow the peptide binding without any steric clash with the linker. This movement can be described by a hinge-bending angle (measured from the domains mass centers to the hinge) of 26 degrees. 178 Current Protein and Peptide Science, 2005, Vol. 6, No. 2 Wolfson et al. Fig. (2). Case Study 1. (a) A model for SH3-mediated signalling suggested by Pisabarro et al.. The peptide (colored green) competes with the linker (colored blue) for the same region of the SH3 domain. (b) Rigid superimposition between two c-Src tyrosine kinase proteins: human c-Src (PDB code: 1fmk), depicted in blue and haematopoetic cell kinase (Hck, PDB code: 1qcf), represented in red. (c) A superimposition between the c-Src protein (PDB code: 1fmk), colored blue, and a proline-rich peptide RLP2, colored green, as it appears in a complex with the SH3 domain (PDB code: 1rlq). (d) The interdomain movement between three domains, SH3-SH2-Kinase, is detected by a flexible alignment algorithm . 1fmk is depicted in blue, 1qcf: domain SH3 in green, SH2 in orange and the kinase in red. (e) The result of flexible hinge-bending docking [71, 88] viewed from the back side of Figure (c). c-Src protein is blue, RLP2 surface is depicted in green. The SH3 domain from the complex (PDB code: 1rlq), superimposed on the docking result, is colored in red. (f) A scheme summarizing the combinations of the algorithms used in the example. Case Study 2: Recognition of Conserved Loops of Serine Proteases for Binding Site Focused Docking There are numerous evidences for disorders in function of serine proteases and their inhibitors in several human diseases. In this case study the information obtained by a multiple secondary structure alignment is exploited to improve the accuracy of unbound docking between a serine protease and its bovine pancreatic inhibitor. Fig. 3a shows the structural alignment between ten serine proteases (PDB codes: 2ptn, 1sgt, 1ton, 2alp, 2pkaAB, 2sga, 3est, 3rp2A, 3sgbE and 4chaA) as obtained by MASS . It is difficult to deduce this alignment based on sequence information alone [34, 97, 98]. The size of the conserved core (depicted in yellow) is 123 residues (RMSD 1.48Å, running time: 39 seconds, Pentium IV PC). As expected, this region contains the two βbarrels (six strands each) that form the fold. The catalytic site of enzymes is frequently formed by loops connecting secon- From Structure to Function dary structure elements . The recognized core contained three structurally conserved loops: residues 55-59, 128-130 and 189-197 (depicted in purple). Two of these included residues that belong to the catalytic triad (His-57, Asp-102, Ser-195). In the next step, the docking algorithm PatchDock  was applied to predict a complex of a serine protease kallikrein A with a bovine pancreatic trypsin inhibitor (PDB code: 2kai). Both the receptor and the inhibitor were taken in their unbound conformations as they appear in two separate crystal complexes (PDB codes: 2pka and 6pti respectively). The conserved loops were used by the docking algorithm as a binding region to focus the conformational search. As a result, a near native solution was ranked at the top (see Fig. 3b), prediction accuracy 4.0Å RMSD, running time: 38 seconds, Pentium IV PC. The docked inhibitor, in green, is superimposed on the inhibitor of the complex (PDB code: 2kai, chain I), depicted in red. The conserved loops are in purple. Case Study 3: Recognition of Conserved Core of PLPDependent Transferase for Focused Docking The goal of this study is the reconstruction of the complex of PLP-dependent transferase (PDB code: 1qj5) with a PLP (pyridoxal-5-phosphate) molecule. PLP is the biologically active form of vitamin B6. Low levels are correlated Current Protein and Peptide Science, 2005, Vol. 6, No. 2 179 with cardiovascular and other diseases. The first step in this study was to apply a multiple structural alignment algorithm (MultiProt ) to 17 structures from the PLP-dependent superfamily (PDB codes: 1ams, 1art, 1asf, 1ax4, 1ay4, 1bt4, 1cl1, 1gbn, 1oat, 1ord, 1qis, 1qj5, 2ay1, 2can, 2dkb, 2tpl and 4gsa). The identified common structural core (accuracy 3.0Å RMSD, running time: 2 minutes, Pentium IV PC) is depicted in (Fig. 4a) on a representative structure of PDB: 1qj5 (dark blue). Only one residue (PDB: 1qj5; Asp-245) from the computed structurally conserved core had the same amino acid identity in all the 17 aligned proteins. The recognized residue is located in the PLP binding site, which suggests its importance for binding and for function. This residue, colored yellow, and the PLP molecule, colored orange, are depicted in balls & sticks. The next step was to apply the docking algorithm (PatchDock ) that will use the location of this residue to improve its ranking by selecting only solutions in which this residue interacts with the ligand. The results of docking the PLP molecule (taken from PDB: 4gsa) to diaminopelargonic acid (PDB code: 1qj5) are depicted in (Fig. 4b). The highest ranking solution, depicted in red, is superimposed on the complex from PDB: 1qj5, in orange (prediction accuracy: 1.16Å RMSD, running time 16 seconds, Pentium IV PC). Fig. (3). Case Study 2. (a) The structural alignment between ten serine proteases (PDB codes: 2ptn, 1sgt, 1ton, 2alp, 2pkaAB, 2sga, 3est, 3rp2A, 3sgbE and 4chaA). PDB:2pka is shown completely in blue. The core of the alignment is colored in yellow and the three conserved loops are colored in purple. As one can see, two of the conserved loops are located in the active site. (b) The unbound docking  between kallikrein A (PDB code: 2pka) and its inhibitor (PDB code: 6pti). The docked inhibitor, colored in green, is superimposed on the inhibitor of the complex (PDB code: 2kai, chain I), depicted in red. The conserved loops, as obtained by a multiple structural alignment algorithm , are in purple. (c) A scheme summarizing the combination of the algorithms used in the example. 180 Current Protein and Peptide Science, 2005, Vol. 6, No. 2 Wolfson et al. Fig. (4). Case Study 3. (a) The structural core identified by multiple structural alignment  among 17 structures from the PLP-dependent superfamily (PDB codes: 1ams, 1art, 1asf, 1ax4, 1ay4, 1bt4, 1cl1, 1gbn, 1oat, 1ord, 1qis, 1qj5, 2ay1, 2can, 2dkb, 2tpl, 4gsa). In dark blue is a representative structure of 1qj5. The conserved active site residue (Asp-245, 1qj5:245), colored yellow and the PLP molecule, colored orange, are depicted as ball & stick. (b) The docking  of the PLP (taken from 4gsa) to the diaminopelargonic acid (PDB code: 1qj5). The predicted solution, depicted in red, is superimposed on the original, which is depicted in orange. (c) A scheme summarizing the combinations of the algorithms used in the example. 4.2 Docking and Homology Modeling One of the promises of structural genomics is to design drugs directly from protein sequences. If accurate, a homology modeled protein structure may be used in docking studies. Rong et al.  homology modeled the RasGRP C1 domain and docked it to ligands. Combined with experimental binding affinity measurement, their study provided detailed atomic level interactions of PKC ligands with the RasGRP receptor. The PGHS-1 and PGHS-2 structures were combined to model the human PGHS-2 and to dock ligands. The docked conformation superimposed very well on the available complexed structure . Docking can be used synergistically with modeling for structure refinement or for engineering side chains. The reliability of homology models of G-protein coupled receptors in virtual screening of chemical databases was examined by three docking programs (Dock , FlexX , Gold ) and seven scoring functions . The modeled structure was found suitable for screening antagonist compounds. However, it was not accurate enough to retrieve known agonists, due to conformational changes during agonist activation. Using a pharmacophore has led to a model refinement and successful agonist retrieval, illustrating the importance of correct modeling. While often only a low-resolution structure can be predicted, to compensate for inaccuracies, protein and ligand models may be averaged by optimizing steric and quasi-chemical complementarity between the interacting partners . When sequence identity was low, averaging out uncertainties through multiple docking has led to an improvement . Schafferhans and Klebe explored a method to dock ligands onto binding site representations derived from homology modeled proteins . The ligands were docked to a preliminary model and QSAR analysis from ligand orientations was used to refine the structure. Alignment, ligand data analysis and protein structure modeling were iterated until self-consistency was achieved. The approach has been validated using cases with available crystal structures. In an effort to engineer substrate specificity of Damino-acid oxidases, Sacchi et al.  combined homology modeling, docking and active site mutagenesis. The rational design approach successfully produced enzymatic activity with new, broader substrate specificity. Another interesting example to combination of homology modeling, docking and wet-lab experiments is the discovery of the binding site of the stress protein GRP94 to the VSV8 peptide . The complex of GRP94 with a specific peptide can dramatically stimulate T cell response. Therefore, tumorderived GRP94 is possibly a powerful immunotherapeutic tool, since it can be used to evoke immune response against the tumor. The N-terminal domain of GRP94 was homology modeled using the structure of the N-terminal domain of yeast HSP90. The modeling of the complex with the VSV8 peptide was achieved by applying PatchDock  to detect few potential binding sites. The validity of these computationally derived sites was next verified by site directed mutagenesis. Out of the two highest scoring binding sites From Structure to Function detected by PatchDock, one was proved to be correct in the wet-lab experiments. One of the most successful stories comes from the redesign of ligand-binding site specificity in proteins . The procedure combines target-ligand docking with computational mutations of receptor residues in direct contact with a wild-type ligand (typically 12-18 residues, corresponding to 1045 to 1068 mutant structures representing 1015 to 1023 sequences). Seventeen designs predicted by the automated procedure were experimentally tested for specific ligand binding. Every designed receptor exhibited a detectable affinity for its target ligand. 4.3 Folding via Docking Docking and protein folding are all too often considered to be distinct fields. Yet, protein folding can be modeled as a hierarchical process , where folding is done by selfdocking of previously assembled smaller units. Hierarchy implies a process of assembly of smaller folding entities, such as domains, hydrophobic folding units or building blocks. In protein folding, one can first attempt to model the structures of smaller subunits. The structure can then be solved by an assembly of these subunits that satisfies the backbone connectivity constraints. The CombDock method  employs geometric docking techniques to implement this assembly approach. The method is combined with the building block identification scheme of Haspel et al.  to predict protein tertiary structure. To date, predictive protein folding focuses on single domain proteins. In an early attempt, Xu et al.  used threading to model domain structure and proceeded to dock two domains to obtain the complete modeled structure. Their predictions were consistent with the experimental data. Recently, the CombDock method was applied to reconstruct multi-domain protein structures by docking of its domains. The majority of proteins function when associated in multi-molecular assemblies. Yet, prediction of the structures of multi-molecular complexes has largely not been addressed, probably due to the magnitude of the combinatorial complexity of the problem. Docking schemes can be applied to recreate multi-molecular assemblies. Eisenstein et al.  used docking to construct multi-molecular complexes that satisfy restrictive symmetries, such as in viral coat proteins. Another example is the novel application of CombDock  for predicting the structures of large macromolecular assemblies. The algorithm yielded good predictions of quaternary structures of oligomers and multi-protein complexes, even for a structurally distorted input. 5 CONCLUSIONS AND FURTHER APPLICATIONS A combination of homology modeling, database analysis, sequence/structure alignment, molecular docking, and experimental verification has shown its powerful potential in biochemical research. Structural analysis has an advantage over sequence analysis alone, since structures contain additional information. On the other hand, it presents difficulties, as we need to consider protein flexibility, surface variability, chemical interactions and resolution. Current Protein and Peptide Science, 2005, Vol. 6, No. 2 181 This review summarizes the main algorithmic approaches of most of the routinely used structural tools. Different methods address different biological problems. Nevertheless, they do share many methodological similarities. 3D pattern detection algorithms like the Geometric Hashing, clique detection, subgraph-isomorphism, genetic algorithms, Monte-Carlo search are commonly used in structural alignment as well as in docking and in other applications. Some algorithms are specific to proteins. Others can address various molecular types, including RNA and drugs. Scoring functions are the current bottleneck. This might possibly be explained by the insufficient understanding of molecular interactions. Significant advances in modeling interactions will lead to progress in folding, docking and alignment of protein binding sites. Modeling of the flexibility is yet another major challenge both in docking and in structural comparison. Computational tools for sequence analysis have become a common practice in biological experiments. Due to the extremely fast progress in the Structural Genomics projects, structural analysis tools are increasingly and steadily turning into an integral part of practically every aspect of biological research. What is the future outlook in this field? The next major problem we are facing is combining all methods, and in particular devising new ones to construct the map of molecular interactions. Molecular interactions hold the key. However, our goal is not only to put these together to obtain the cellular pathways. We need to figure out how these are regulated. The data are very noisy; this is inevitable when dealing with a problem of such huge dimensions. The difficulty is how to fish out the essential information toward this goal. To handle the vast amount of data, extremely efficient computational strategies are needed. Again, experimental verification will be crucial to test the computational predictions. This is the future mission of Computational Biology, leading toward the goal of personalized medicine. ACKNOWLEDGEMENTS We thank our Structural Bioinformatics Group in Tel Aviv and in Frederick. The research of R. Nussinov and H. J. Wolfson in Israel has been supported in part by the “Center of Excellence in Geometric Computing and its Applications” funded by the Israel Science Foundation (administered by the Israel Academy of Sciences) and by the Tel Aviv University Adams Brain Center. The research of H.J.W. is partially supported by the Hermann Minkowski-Minerva Center for Geometry at Tel Aviv University. This project has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under contract number NO1-CO-12400. The content of this publication does not necessarily reflect the view or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organization imply endorsement by the U.S. Government. REFERENCES    Gene Ontology Consortium, Nucl. Acids. Res. 2004, 32, 258-261. Koehl, P. Curr. Opin. Struct. Biol. 2001, 11, 348-353. Eidhammer, I.; Jonassen, I.; Taylor, W. R. J. Comp. Biol. 2001, 7, 685-716. 182                                         Current Protein and Peptide Science, 2005, Vol. 6, No. 2 Goldsmith-Fischman, S.; Honig, B. Protein Sci. 2003, 12(9), 18131821. Dietmann, S.; Park, J.; Notredame, C.; Heger, A.; Lappe, M.; Holm, L. Nucleic Acids Res. 2001, 29, 55-57. http://www2.ebi. ac.uk/dali/. Sauder, J.; Arthur, J.; Dunbrack Jr, R. Proteins 2000, 40, 6-22. D’Alfonso, G.; Tramontano, A.; Lahm, A. J. Struct. Biol. 2001, 134, 246-256. Russell, R. B.; Sasieni, P. D.; Sternberg, M. J. J. Mol. Biol. 1998, 282, 903-18. Needleman, S.; Wunsch, C. J. Mol. Biol. 1970, 48, 443-453. Nalefski, E.; Falke, J. Protein Sci. 1996, 5, 2375-2390. Holm, L.; Sander, C. J. Mol. Biol. 1993, 233(1), 123-138. Murzin, A.; Brenner, S.; Hubbard, T.; Chothia, C. J. Mol. Biol. 1995, 247, 536-540. Pearl, F.; Bennett, C.; Bray, J.; Harrison, A.; Martin, N.; Shepherd, A.; Sillitoe, I.; Thornton, J.; Orengo, C. Nucl. Acids Res. 2003, 31, 452-455. Holm, L.; Sander, C. Nucl. Acids Res. 1996, 24, 206-209. Hadley, C.; Jones, D. Structure 1999, 7, 1099-1112. Getz, G.; Vendruscolo, M.; Sachs, D.; Domany, E. Proteins 2002, 46, 405-415. Shindyalov, I.; Bourne, P. Protein Eng. 1998, 11(9), 739-747. http://cl.sdsc.edu/ce.html. Bachar, O.; Fischer, D.; Nussinov, R.; Wolfson, H. J. Protein Eng. 1993, 6(3), 279-288 http://bioinfo3d.cs.tau.ac.il/c_alpha_match/. Madej, T.; Gibrat, J.; Bryant, S. Proteins 1995, 23, 356-369. http://www.ncbi.nlm.nih.gov/Structure/VAST/vast.shtml. Gerstein, M.; Krebs, W. Nucl. Acids Res. 1998, 26, http://bioinfo. mbb.yale.edu/MolMovDB/. Lee, R.; Razaz, M.; Hayward, S. Bioinformatics 2003, 19, 12901291. Wriggers, W.; Schulten, K. Proteins 1997, 29(1), 1-14. Boutonnet, N.; Rooman, M.; Wodak, S. J. Mol. Biol. 1995, 253, 633-647. Ochagavia, M. E.; Richelle, J.; Wodak, S. J. Bioinformatics 2002, 18, 637-640. Verbitsky, G.; Nussinov, R.; Wolfson, H. J. Proteins 1999, 34, 232–254. Shatsky, M.; Fligelman, Z.; Nussinov, R.; Wolfson, H. J. Alignment of flexible protein structures. In 8th International Conference on Intelligent Systems for Molecular Biology; The AAAI press: 2000. Carugo, O. Proteins 2001, 42, 390-398. Jacobs, D.; Rader, A.; Kuhn, L.; Thorpe, M. Proteins 2001, 44, 150-165. Shatsky, M.; Nussinov, R.; Wolfson, H. J. Proteins 2002, 48, 242256. http://bioinfo3d.cs.tau.ac.il/FlexProt. Nicholas, H. B. J.; Ropelewski, A. J.; Deerfield, D. W. N. Biotechniques 2002, 32(3), 572-591. Gerstein, M.; Levitt, M. Using iterative dynamic programming to obtain accurate pairwise and multiple alignments of protein structures. In Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology; The AAAI press: 1996. Akutsu, T.; Sim, K. Genome Inform. 1999, 10, 23-29. Russell, R.; Barton, G. Proteins 1992, 14, 309-323. Yang, A.; Honig, B. J. Mol. Biol. 2000, 301, 691-711. Taylor, W.; Flores, T.; Orengo, C. Protein Sci. 1994, 3, 1858-1870. Escalier, V.; Pothier, J.; Soldano, H.; Viari, A. J. Comp. Biol. 1988, 5, 41-56. Leibowitz, N.; Nussinov, R.; Wolfson, H. J. J. Comp. Biol. 2001, 8, 93-121. Shatsky, M.; Nussinov, R.; Wolfson, H. J. MultiProt - a multiple protein structural alignment algorithm. In Workshop on Algorithms in Bioinformatics, Vol. 2452; Guigo, R.; Gusfield, D., Eds.; LNCS, Springer Verlag: 2002. Dror, O.; Benyamini, H.; Nussinov, R.; Wolfson, H. Protein Sci. 2003, 12, 2492-2507. Akutsu, T.; Halldorson, M. M. Theoretical Computer Science 2000, 233, 33-50. Ma, B.; Elkayam, T.; Wolfson, H. J.; Nussinov, R. Proc. Natl. Acad. Sci. USA 2003, 100, 5772-5777. Guda, C.; Scheeff, E.; Bourne, P.; Shindyalov, I. A new algorithm for the alignment of multiple protein structures using Monte Carlo optimization. In Proceedings of the Pacific Symposium on Biocomputing; 2001. O’Hearn, S.; Kusalik, A.; Angel, J. Protein Eng. 2003, 16, 169-178. Wolfson et al.                                          Nagano, N.; Orengo, C. A.; Thornton, J. M. J. Mol. Biol. 2002, 321, 741-765. Kuttner, Y. Y.; Sobolev, V.; Raskind, A.; Edelman, M. Proteins 2003, 52, 400-411. Denessiouk, K. A.; Rantanen, V.; Johnson, M. Proteins 2001, 44, 282-291. Phillips, C. L.; Ullman, B.; Brennan, R. G.; Hill, C. P. EMBO J. 1999, 18, 3533-3545. Wallace, A. C.; Laskowski, R. A.; Thornton, J. M. Protein Sci. 1996, 5, 1001-1013. Wallace, A. C.; Borkakoti, N.; Thornton, J. M. Protein Sci. 1997, 6, 2308-2323. Barker, J. A.; Thornton, J. M. Bioinformatics 2003, 19, 1644-1649. Artymiuk, P. J.; Poirrette, A. R.; Grindley, H. M.; Rice, D. W.; Willett, P. J. Mol. Biol. 1994, 243, 327-344. Spriggs, R. V.; Artymiuk, P. J.; Willett, P. J. Chem. Inf. Comput. Sci. 2003, 43, 412-421. Binkowski, T.; Adamian, L.; Liang, J. J. Mol. Biol. 2003, 232, 505526. Moodie, S. L.; Mitchell, J. B. O.; Thornton, J. M. J. Mol. Biol. 1996, 263, 486-500. Rosen, M.; Lin, S.; Wolfson, H. J.; Nussinov, R. Protein Eng. 1998, 11, 263-277. Schmitt, S.; Kuhn, D.; Klebe, G. J. Mol. Biol. 2002, 323, 387-406. Kinoshita, K.; Nakamura, H. Protein Sci. 2003, 12, 1589-1595. Brakoulias, A.; Jackson, R. M. Proteins 2004, 56, 250-60. Shulman-Peleg, A.; Nussinov, R.; Wolfson, H. J. J. Mol. Biol. 2004, 339(3), 607-633 http://bioinfo3d.cs.tau.ac.il/SiteEngine. Jones, S.; Thornton, J. M. Curr. Opin. Chem. Biol. 2004, 8, 3-7. Schneider, G.; Böhm, H. Drug Discov. Today 2002, 7, 64-70. Keseru, G. J. Comput. Aided Mol. Des. 2001, 15, 649-657. Schneidman-Duhovny, D.; Nussinov, R.; Wolfson, H.J. Curr. Med. Chem. 2004, 11, 91-107. Sternberg, M.; Gabb, H.; Jackson, R. Curr. Opin. Struct. Biol. 2002, 12, 28-35. Halperin, I.; Ma, B.; Wolfson, H. J.; Nussinov, R. Proteins 2002, 47, 409–443. Abagyan, R.; Totrov, M. Curr. Opin. Chem. Biol. 2001, 5, 375-382. Ewing, T.; Makino, S.; Skillman, A.; Kuntz, I. J. Computer-Aided Mol. Des. 2001, 15, 411-428. Rarey, M.; Kramer, B.; T., L. Time-Efficient Docking of Flexible Ligands into Active Sites of Proteins. In 3’rd Int. Conf. on Intelligent Systems for Molecular Biology (ISMB’95); AAAI Press: Cambridge, UK, 1995. Norel, R.; Petrey, D.; Wolfson, H. J.; Nussinov, R. Proteins 1999, 35, 403–419. Goldman, B.; Wipke, W. Proteins 2000, 38, 79–94. Schneidman-Duhovny, D.; Inbar, Y.; Polak, V.; Shatsky, M.; Halperin, I.; Benyamini, H.; Barzilai, A.; Dror, O.; Haspel, N.; Nussinov, R.; Wolfson, H. J. Proteins 2003, 52, 107-112. Duhovny, D.; Nussinov, R.; Wolfson, H. J. Efficient unbound docking of rigid molecules. In Workshop on Algorithms in Bioinformatics, Vol. 2452; Guigo, R.; Gusfield, D., Eds.; LNCS, Springer Verlag: 2002 http://bioinfo3d.cs.tau.ac.il/PatchDock. Abagyan, R.; Totrov, M.; Kuznetsov, D. J. Comput. Chem. 1994, 15(5), 488-506. Österberg, F.; Morris, G.; Sanner, M.; Olson, A.; Goodsell, D. Proteins 2002, 46, 34–40. Jones, G.; Willet, P.; Glen, R.; Leach, A. J. Mol. Biol. 1997, 267, 727–748. Gardiner, E.; Willett, P.; Artymiuk, P. Proteins 2001, 44, 44–56. Heifetz, A.; Katchalski-Katzir, E.; Eisenstein, M. Protein Sci. 2002, 11, 571-587. Chen, R.; Weng, Z. Proteins 2002, 47, 281–294. Gabb, H.; Jackson, R.; Sternberg, M. J. Mol. Biol. 1997, 272, 106120. Mandell, J.; Roberts, V.; Pique, M.; Kotlovyi, V.; Mitchell, J.; Nelson, E.; Tsigelny, I.; Ten Eyck, L. Protein Eng. 2001, 14, 105–113. Vakser, I. Protein Eng. 1995, 8, 371–377. Ritchie, D.; Kemp, G. Proteins 2000, 39, 178–194. Katchalski-Katzir, E.; Shariv, I.; Eisenstein, M.; Friesem, A.; Aflalo, C.; Vakser, I. Proc. Natl. Acad. Sci. USA 1992, 89, 21952199. Jones, G.; Willet, P.; Glen, R. J. Mol. Biol. 1995, 245, 43-53. From Structure to Function                 Fernandez-Recio, J.; Totrov, M.; Abagyan, R. Protein Sci. 2002, 11, 280-291. Schnecke, V.; Swanson, C.; Getzoff, E.; Tainer, J.; Kuhn, L. Proteins 1998, 33, 74-87. Clauβen, H.; Buning, C.; Rarey, M.; Lengauer, T. J. Mol. Biol. 2001, 308, 377-395. Sandak, B.; Nussinov, R.; Wolfson, H. J. J. Comp. Biol. 1998, 5(4), 631-654. Charifson, P.; Corkery, J.; Murcko, M.; Walters, W. J. Med. Chem. 1999, 42, 5100–5109. Doman, T.; McGovern, S.; Witherbee, B.; Kasten, T.; Kurumbail, R.; Stallings, W.; Connolly, D.; Shoichet, B. J. Med. Chem. 2002, 45, 2213–21. Hindle, S.; Rarey, M.; Buning, C.; Lengauer, T. J. Computer-Aided Mol. Des. 2002, 16, 129-149. Dror, O.; Shulman-Peleg, A.; Nussinov, R.; Wolfson, H. J. Curr. Med. Chem. 2004, 11, 71-90. Vajda, S.; Camacho, C. Trends Biotechnol. 2004, 22, 110-6. Giordanetto, F.; Kroemer, R. Protein Eng. 2003, 16, 115-124. O’Maille, P.; Bakhtina, M.; Tsai, M. J. Mol. Biol. 2002, 321, 677691. Pisabarro, M. T.; Serrano, L.; Wilmanns, M. J. Mol. Biol. 1998, 281, 513-521. Russell, R.; Barton, G. Proteins 1992, 14, 309-323. Sali, A.; Blundell, T. J. Mol. Biol. 1990, 212, 403-428. Branden, C.; Tooze, J. Introduction to Protein Structure, Second Edition; Garland Publishing, Inc.: New York, 1999. Rong, S.; Enyedy, I.; Qiao, L.; Zhao, L.; Ma, D.; Pearce, L.; Lorenzo, P.; Stone, J.; Blumberg, P.; Wang, S.; Kozikowski, A. J. Med. Chem. 2002, 45, 853-860. Current Protein and Peptide Science, 2005, Vol. 6, No. 2                 183 Pouplana, R.; Lozano, J.; Ruiz, J. J.Mol. Graph. Model 2002, 20, 329-343. Bissantz, C.; Bernard, P.; Hibert, M.; Rognan, D. Proteins 2003, 50, 5-25. Wojciechowski, M.; Skolnick, J. J. Comput. Chem. 2003, 23, 189197. Mazitsos, C.; Rigden, D.; Tsoungas, P.; Clonis, Y. Eur. J. Biochem. 2002, 269, 5391-5405. Schafferhans, A.; Klebe, G. J. Mol. Biol. 2001, 307, 407-427. Sacchi, S.; Lorenzi, S.; Molla, G.; Pilone, M.; Rossetti, C.; Pollegioni, L. J. Biol. Chem. 2002, 277, 27510-27516. Gidalevitz, T.; Biswas, C.; Ding, H.; Schneidman-Duhovny, D.; Wolfson, H.; Stevens, F.; Radford, S.; Argon, Y. J. Biol. Chem. 2004, 279(16), 16543-52. Looger, L.; Dwyer, M.; Smith, J.; Hellinga, H. Nature 2003, 423, 185-190. Baldwin, R.; Rose, G. Trends Biochem. Sci. 1999, 24, 26-33. Inbar, Y.; Benyamini, H.; Nussinov, R.; Wolfson, H.J. Bioinformatics 2003, 19 Suppl. 1, i158-i168. Haspel, N.; Tsai, C.; Wolfson, H.; Nussinov, R. Protein Sci. 2003, 12, 1177–1187. Xu, D.; Baburaj, K.; Peterson, C.; Xu, Y. Proteins 2001, 44, 312320. Berchanski, A.; Eisenstein, M. Proteins 2003, 53, 817-829. Inbar, Y.; Benyamini, H.; Nussinov, R.; Wolfson, H. 2004, submitted. Baker, D.; Sali, A. Science 2001, 294, 93-96. Jaramillo, A.; Wernisch, L.; Hery, S.; Wodak, S. J. Comb Chem High Throughput Screen 2001, 4, 643-659.