The Pharmacogenomics Journal (2001) 1, 38–47
© 2001 Nature Publishing Group All rights reserved 1470-269X/01 $15.00
www.nature.com/tpj

REVIEW

Site-specific molecular design and its relevance to pharmacogenomics and chemical biology

D Bailey, E Zanders, P Dean
De Novo Pharmaceuticals, Cambridge CB2 3DD, UK

Correspondence: David Bailey, De Novo Pharmaceuticals, St Andrew’s House, 59 St Andrew’s Street, Cambridge CB2 3DD, UK
Telephone: +44 1223 488888
Fax: +44 1223 488899
E-mail: david.bailey@denovopharma.com

ABSTRACT
The emergence of the new discipline of pharmacogenomics reflects the growing convergence of chemical and genomic space. The massive information-driven growth in both computational chemistry and structural biology is leading to unprecedented opportunities in both chemical and biological design. In this paper we relate current opinion in structural biology to recent developments in computational drug design. Sequence information now permits protein structure prediction and, together with experimental protein structure determination, a complete database of ligand-binding sites and protein–protein interactions can be assembled. When aligned with site exploration and virtual screening, this information provides a foundation for structure-based pharmacogenomics. In association with chemical genomics, structure-based design will allow major new insights into a compound’s biological and pharmaceutical properties.

Keywords: pharmacogenomics; drug design; structural genomics; computational chemistry; chemical genomics
Received: 14 February 2001 Accepted: 24 February 2001

INTRODUCTION
Genomics programmes are in the process of defining the entire gamut of protein targets available for drug discovery, and the ligand-binding domains that they encode,1 while chemoinformatics2 is providing an architecture in which to handle corresponding chemical information.
The developing interface between these two areas, known colloquially as pharmacogenomics,3 promises to provide a global platform through which to rationalise drug discovery at the molecular and cellular levels. A comprehensive pharmacogenomics platform has several elements, some of which are illustrated in Figure 1. In its simplest iteration, the objective of implementing such a platform is to produce a functional, small molecule ligand for every protein, or ligand-binding domain within a protein, encoded by the genome. If the protein is a therapeutic target, then the small molecule, if appropriately designed, may become a drug. If the protein’s utility is not immediately apparent, then the small molecule may become a chemical probe for function. In this paper we highlight the shift in emphasis away from genomics and proteomics as purely molecular biology disciplines, and examine their impact on the developing field of computational pharmacogenomics. The genetic basis of drug response and the direct observation of the effects of a compound on biological systems have recently been dubbed ‘chemical genetics’.4 In this article, we use the term ‘chemical genomics’ to emphasise the contribution of genomics-based technologies to an understanding of the detailed molecular interactions between specific proteins and their small molecule ligands.

Figure 1 The flow of pharmaceutical research from genomic information to population and personalised chemotherapies.

THE CONTRIBUTION OF GENE AND PROTEIN SEQUENCE INFORMATION
The force that has revolutionised the way we look at drug action is the Human Genome Project,5 and its progeny, global initiatives in structural biology.6 Once the complete set of expressed human genes and proteins is available, we will have at our disposal a powerful technology platform from which to survey the landscape of structural and functional biology.
Such programmes will also define the complete set of tractable drug discovery targets.7 As a foundation for such global initiatives, new bioinformatics approaches to classify genes and their resulting proteins will be required. Already, sophisticated multiple sequence alignment techniques, supplemented by protein homology modelling approaches, have led to the definition of specific structural ‘motifs’ through which to identify and classify new members of related gene families. A summary and extensive review of new ‘knowledge bases’ in this area have recently been published.8,9

SUPRAMOLECULAR ASSEMBLIES ANALYSED BY PROTEOMICS
While the importance of quaternary structure in protein function has been recognised for many years, recent attention has become focused on higher order structures of multiple protein complexes. This has been encouraged by the need to define the components of functional assemblies such as signalling pathways, and to infer the biological role of all the proteins with which they are associated. Analysis of protein complexes by traditional biochemical methods has been successful but inefficient. An improvement in throughput has been gained by the use of genetic screening techniques, such as the yeast two-hybrid system, to characterise interacting proteins.10 The genetic basis of this system makes it particularly amenable to high-throughput analysis, and the results of studies are emerging in which whole genome analyses of protein–protein interactions are possible.11,12 It should be noted, however, that the number of permutations of ‘bait’ and ‘prey’ (as interacting proteins are termed in this system) rapidly becomes unmanageable as the genome size increases. Newman et al13 have attempted to overcome this in the case of yeast (with a proteome size of 6000) by selecting for study only the subset of proteins (one out of eleven) that contain coiled-coil motifs, selected as more likely to form homotypic and heterotypic interactions.
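The growth in bait/prey combinations noted above can be made concrete with a little arithmetic. The sketch below assumes that each unordered pair of proteins, including a protein paired with itself (a homotypic interaction), is tested once; this counting convention and the function name are illustrative assumptions and are not taken from the cited studies.

```python
def pairwise_tests(n_proteins: int) -> int:
    """Number of unordered protein pairs, self-pairs (homotypic) included."""
    return n_proteins * (n_proteins + 1) // 2


proteome = 6000                 # yeast proteome size quoted in the text
subset = round(proteome / 11)   # "one out of eleven" proteins, i.e. 545

full = pairwise_tests(proteome)
reduced = pairwise_tests(subset)

print(f"whole proteome: {full:,} pairs")
print(f"1-in-11 subset: {reduced:,} pairs")
print(f"reduction factor: about {full / reduced:.0f}x")
```

Because the number of pairs grows with the square of the number of proteins, restricting the screen to one protein in eleven reduces the number of combinations by roughly a factor of 11 squared.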
Despite these advances, the nature of the two-hybrid system makes it prone to false positives and negatives. In the former case, this may be due to non-specific interactions between bait and prey, and in the latter, to the inability to detect interactions caused by post-translational modification, eg phosphorylation. More recent techniques of protein identification using mass spectrometry of constituent interacting peptides have overcome some of these problems by using native protein complexes.14 The detection of protein interactions that result from phosphorylation (eg through activation of signalling pathways by growth factor-receptor interactions) is possible by applying secondary selection techniques such as affinity purification of complexes by antibodies to phosphotyrosine and mass spectrometry of the proteins after resolution by electrophoresis. This is well illustrated by the work of Pandey et al,15 who have identified a novel target of EGF receptor signalling using this approach. A new generation of genomic databases is being developed to store such information on protein–protein interactions. Apart from those available commercially, which concentrate on yeast two-hybrid data (for example the databases being developed by Curagen and Hybrigenics), the recently described Biomolecular Interaction Network Database (BIND)16 anticipates the influx of data describing different interactions between proteins and other molecules to rapidly highlight networks of biological or pharmaceutical interest.

ACCELERATING THE PACE OF STRUCTURE DETERMINATION
Figure 2 illustrates an intriguing fact: the number of new structures deposited into the public PDB database fell between 1999 and 2000, despite the emergence of structural proteomics as the key area underpinning drug design.
This might in part be due to the increased commercial activity in this field, with the formation of companies dedicated to high-throughput determination of proprietary protein structures for application in pharmaceutical discovery. Like other genomics-based activities, these are expensive and require large investments of capital, or the formation of consortia of interested parties from the public or private sectors. Also, the technical issues for the most highly used method of structure determination, X-ray crystallography, remain formidable. However, there have been improvements in each stage of the process, from protein purification and crystallisation to X-ray bombardment and data reduction.6

Figure 2 PDB entries by year from 1990 to 2000.

The use of dedicated synchrotron sources also allows a significant increase in data collection using smaller crystals, although obtaining the latter is still a major rate-limiting step in the process. The selection of targets for structural biology programmes is currently biased towards those which are easy to purify and crystallise, at the expense of more challenging targets such as membrane proteins, which comprise a vast family of considerable interest as drug targets. Nevertheless, the recent structure determination of rhodopsin17 has shown that it is possible to determine the structures of membrane-bound targets, and the solution of the structures of related proteins of pharmaceutical relevance, such as G-protein coupled receptors (GPCRs), must soon become routine. However, X-ray structures of proteins only provide a single conformational snapshot of the protein in a highly restricted crystal environment.
NMR determinations in solution are more representative of the variety of conformations expected to be available for ligand interaction within purified proteins, although they too probably reflect a relatively small subset of the protein structures adopted in biological systems. Far from being a neat collection of fully folded proteins, the cellular proteome is now thought to comprise a wide range of partially-folded intermediates existing in dynamic equilibrium. Many of the proteins in the cell appear to be unfolded most of the time.18 Folding has been pictured as a journey through an ensemble of partially folded states, each of which is characterised by specific reaction co-ordinates or order parameters. In fact, a relatively unstructured protein molecule can have a greater capture radius for a specific binding site than the folded state with its restricted conformational freedom.19 Within biological systems, the dynamic nature of protein structures must have a major impact on the way in which both small and large ligands might interact with a target protein.

HOMOLOGY MODELLING: THE WAY FORWARD?
X-ray crystallography is a slow process compared with gene sequencing, and NMR is limited by the size of protein that can be examined effectively. The only efficient way to process sequence data is by large-scale homology modelling. Sanchez & Sali, in an important paper, have attempted the global creation of protein structure models from the yeast genome.20 They used 1071 sequences and employed ALIGN and MODELLER to create homology models automatically. This procedure provides clues to the function of the proteins by identifying the folds and 3-D motifs known to bind to specific ligands, eg the identification of SH3 domains. Steady progress is being made, with 3D computational models covering perhaps half the human proteome becoming available by 2003.21 It has recently been shown that GPCRs dimerize and heterodimerize.
Gouldson et al22 have studied this in detail in the absence of ligand and in the presence of an agonist or inverse agonist using a combination of computational techniques; these were molecular dynamics, correlated mutation analysis, and evolutionary trace analysis. The evidence points to transmembrane helices 5 and 6 forming the primary interface for dimerization. A second site for dimerization was identified by evolutionary trace analysis on helices 2 and 3. These principal functional sites appear to govern domain swapping in subfamilies of GPCRs and may extend molecular modelling from isolated proteins to chimeric assemblies. Homology modelling has its limitations, however. A crystal structure of a near homologue has to be available, and the unpredictable nature of any errors in the model has to be recognised. Despite these caveats, computational prediction shows steadily increasing accuracy when compared with corresponding crystal data.23 Every protein contains surfaces that interact both with other elements of the same protein and/or with ligand molecules. Can function be inferred from structure alone? Although a comprehensive virtual analysis of the human proteome is not feasible at present, rules for such global predictions are being generated, based on structural data accumulated from specific examples.24 Despite the large number of proteins of varied function, there are limited numbers of protein families and folds. To complicate matters, many folds are conserved between proteins of widely differing function, and conversely, proteins with an identical fold pattern can perform multiple biochemical functions. Thus there are currently no hard-and-fast rules for prediction of function from folds alone. The hope is that this will become feasible as structural databases reach critical mass. From the perspective of drug discovery, however, it is already possible to relate individual folds and motifs to binding sites for specific ligands and substrates. 
For example, the essential features of the aspartyl protease active site are represented by a configuration of just eight atoms that are preserved in every member of that family.24 These observations define a starting point for the next phase of pharmacogenomics: the complete definition of the molecular interactions between target proteins and their cognate ligands.

THE PRESENCE OF LIGANDS
The ‘nucleating’ effect of specific ligands in stabilising particular protein structures can be seen in the substantial reorganisation in protein structure observed between some apoenzymes and their corresponding liganded structures (eg HIV protease +/− inhibitors).25 The experimental determination of a sufficient number of representative liganded structures to form a data set, for example in ReLiBase,26 is a natural extension of the structural biology initiatives referred to above. This database has been developed for use with a variety of structural analysis and query tools to focus attention on the molecular rules governing protein–ligand interactions. Site plasticity has a marked impact on ligand design within sites. A conformationally labile site, when approached by one inhibitor, may adopt a subtly different, and possibly less energetically favourable, structure than when addressed with a different ligand. All these factors affect de novo design. They also have important consequences for design to sites that differ as a consequence of SNP mutation.

LIGAND-BINDING SITES
Site Discovery by Computational Methods
Protein sequence searching using bioinformatic methods such as PROSITE27 can be used to identify characteristic ligand-binding sites by local sequence correlation with template motifs. However, sites can only be unambiguously identified by structural determination.
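As a minimal illustration of the sequence-motif matching just described, the sketch below translates a PROSITE-style pattern into a regular expression and scans a sequence for matches. It handles only a subset of the pattern syntax (single residues, `x`, `[..]` allowed sets, `{..}` forbidden sets and repeat counts), it does not report overlapping matches, and the function names and test sequence are hypothetical. The pattern used, `N-{P}-[ST]-{P}`, is the familiar N-glycosylation site motif and is chosen only as an example; it is not one discussed in this review.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split off an optional repeat count such as x(2) or x(2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # forbidden residues
        # [..] sets and single letters are already valid regex syntax.
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)


def find_sites(pattern: str, sequence: str) -> list:
    """Return (start index, matched residues) for each non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


# Hypothetical sequence: one site matches, one is excluded by the proline.
hits = find_sites("N-{P}-[ST]-{P}", "MKNGSAANPSL")
print(hits)
```

A hit of this kind identifies only a candidate site from local sequence; as the text notes, confirmation requires structural determination.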
Bioinformatics tools using sequence alone lack the ability to search for 3-dimensional structural information within complex site structures where significant folding creates the site architecture. Ligand molecules are generally much smaller than the proteins to which they bind. The interaction energy between the protein and its ligand has to be sufficiently strong for the complex to form and to exist for a significant time. Van der Waals interactions can be increased by maximising the contact area between the ligand and the site on the protein surface, and this can be achieved by docking the ligand into a cavity or cleft. Therefore, algorithms that search for surface cavities can be used to identify putative binding sites. Once these cavities are found, local site maps can be computed to identify important molecular determinants of ligand binding, such as hydrogen bonding and hydrophobic site points.28

Defining Sites Biochemically
Traditional methods of defining sites on proteins employ small molecule probes that have been labelled with radioactivity or fluorescence. These approaches have recently been automated using microarray formats.29 However, introducing radioisotopes and chemical modifications within ligands may affect binding, and this has spurred the development of direct biophysical methods for site definition. Differential scanning microcalorimetry is one technique that has been used to probe protein structure for some time (for a recent application see Zhang et al30). The method has been converted into a screening format to detect the perturbation of proteins resulting from small molecule binding.31 Transient changes in refractive index caused by binding of molecules to protein surfaces have been exploited in the Biacore system, where surface plasmon resonance (SPR) is used to probe similar molecular interactions.
Instrumentation is also commercially available for detecting protein–ligand interactions in a solid phase format (eg the SELDI ‘chip’ from Ciphergen). SPR methods for solid phase binding assays are reviewed in Weinberger et al.32 A completely different approach for detecting and characterising small molecules bound to protein targets is mass spectrometry. Methods for binding small molecules to proteins which are immobilised on resins have been described in which the input compound mixture and the unbound components can be characterised by mass spectrometry. The difference in mass spectrum between the two allows identification of the bound components.33 Ideally one would like to rank the binding affinities of a number of compounds; the frontal affinity chromatography system described by Schriemer is designed to achieve this by characterising the molecules that elute at different rates from the protein matrix using in-line mass spectrometry.34 In summary, we can expect these techniques to elucidate the structural features characteristic of protein–protein and protein–small molecule interactions, and to provide a fundamental data set from which to extrapolate key descriptors.

Site Mapping in Silico
If the complete structure of the target protein is known, it is straightforward to map the site for both de novo drug design and for docking. Hydrogen-bond maps (‘site points’) can be created on the surface of the site using a variety of algorithms.35 Ligand design can be directed to appropriately chosen sets of site points with complementary hydrogen bonds built into appropriate scaffolds at optimal positions to add to the interaction energy. The case of hydrophobic interactions is less well understood because of the entropic components involved in interactions in solution. Hydrophobic interactions are determined by summation of a number of small component interactions; thus hydrophobic regions rather than points are defined.
A typical site may contain 30 site points, although small drug-like molecules usually contain only a small number of corresponding ligand points. This fact has important consequences for drug design; it creates a combinatorial choice of subsets of site points that could be used to design novel scaffolds with the potential to fit the site. For example, taking five site points at a time, there are 140 000 possible subsets of site points available for design.36 A single ligand may exhibit promiscuous binding modes due to similarities in the spatial positioning of some subsets; in effect the small molecule has too many good binding choices. The identification of the subsets of site points where the choice is minimized is therefore crucial for drug design.

CONFORMATIONAL FLEXIBILITY OF INDIVIDUAL SITES
NMR studies show that target sites can adopt many different conformations. This plasticity raises the question: how effective is drug design closely tailored to a single conformation of the site? Plasticity can be observed in two forms. Firstly, large-scale shifts in the protein backbone atoms may be encountered. Secondly, small changes in the equilibrium structure of the protein may affect the conformations of the residues rather than perturbing the protein backbone. Most de novo design algorithms have been based on a single fixed structure for the site. Large-scale domain movements would have to be handled by using numerous models of the site as inputs for design. However, where flexibility in the site is encountered, building flexibility into the ligand design that can match the flexibility of the site offers a way round the problem.

Site Exploration 1
If the objective is to find at least one functional small molecule ligand for every protein, and the human proteome contains at least 50 000 proteins,5 then the scale of the task becomes immediately apparent.
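To give a feel for the numbers involved, the site-point subset count quoted earlier (five site points chosen from a typical site of 30) can be reproduced as a binomial coefficient. The short sketch below is a worked check only; the other subset sizes are included for comparison and are not figures from the source.

```python
import math

SITE_POINTS = 30  # typical number of site points quoted in the text

# Number of distinct subsets of k site points available for design.
subset_counts = {k: math.comb(SITE_POINTS, k) for k in (3, 4, 5, 6)}

for k, count in subset_counts.items():
    print(f"{k} site points at a time: {count:,} subsets")
```

For five site points the exact value is 142 506, consistent with the rounded figure of 140 000 given above.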
Highly-parallel screens at high throughput are clearly necessary, but at what cost? It is widely acknowledged in the pharmaceutical industry that high throughput screening (HTS) has not met the early expectations that many leads would be found rapidly.37 There are sound theoretical reasons why hopes for HTS have not been borne out:
• Chemical space for drug-like molecules is vast: greater than 10^60 drug-like compounds could exist in principle.
• The number of different structures made to date is only 10^7–10^8 and most of the molecules are based on conserved parental structures.
• The amount of structural diversity available is limited.
• Ligand binding is often sterically very specific. Tight binding pockets impose severe restrictions on the size of molecular fragments that can be fitted within them.
In-house compound collections commonly range from 50 000 to 1 000 000 different compounds. Clearly this number is minuscule compared with the potential size of the set of drug-like molecules. One way of increasing diversity is through in silico screening of large libraries (real and virtual). Virtual screening also has the advantage of addressing the other points highlighted above. Virtual screening methods attempt to overcome some of the deficiencies in the HTS procedure by dramatically increasing the size of the screening set. This extension also enables large numbers of virtual molecules (existing only in silico) to be screened. Virtual screening is conceptually achieved in two ways, either by computational docking into a 3-dimensional representation of the site, or by similarity with a known pharmacophore of an active molecule where site data are unavailable. Virtual screening offers large financial savings by cutting down the number of compounds to be screened in vitro, since only those compounds that show acceptable docking results need be screened in the laboratory.
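The scale argument in the list above can be put into numbers. The sketch below uses only the figures quoted in the text (10^60 possible drug-like compounds, 10^7 to 10^8 structures made to date, and in-house collections of 50 000 to 1 000 000 compounds) and works in logarithms to keep the fractions readable; the constant and function names are illustrative.

```python
import math

CHEMICAL_SPACE = 10**60          # drug-like compounds that could exist in principle
MADE_TO_DATE = (10**7, 10**8)    # structures synthesised so far
IN_HOUSE = (50_000, 1_000_000)   # typical in-house collection sizes


def log10_coverage(n_compounds: int, space: int = CHEMICAL_SPACE) -> float:
    """Return log10 of the fraction of chemical space covered by n_compounds."""
    return math.log10(n_compounds) - math.log10(space)


for label, (low, high) in {"made to date": MADE_TO_DATE, "in-house": IN_HOUSE}.items():
    print(f"{label}: 10^{log10_coverage(low):.1f} to 10^{log10_coverage(high):.1f} "
          "of drug-like chemical space")
```

Even the full set of compounds synthesised to date covers no more than about one part in 10^52 of the estimated space, which is the sense in which any screening collection is minuscule.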
Docking algorithms require a set of coordinates that may be derived from crystallographic studies, NMR or homology models of the binding site. Three approaches are possible: rigid docking, flexible ligand docking and a combination of flexible site with flexible ligand docking. These procedures have been discussed in detail elsewhere.38 Flexible ligand docking is currently feasible for million-compound libraries and is receiving the most attention. Docking algorithms are highly dependent on the scoring function, a statistical measure of the interaction energy between the ligand and the site. These parameters are derived from ligand co-crystal data and binding measurements. In a sense, a universal scoring function is the ‘holy grail’ of automated computational drug design. Perhaps more realistically it will be possible to develop scoring functions for separate protein classes, thus facilitating a more targeted approach to ligand docking.

Site Exploration 2
We have discussed the fitting of existing (synthesized or virtual) molecules into sites using docking techniques. We now consider specific design methods that are capable of creating molecules de novo. Drug binding to specific sites is usually reversible and achieved through non-covalent interactions such as hydrogen bonding between site and ligand, hydrophobic interactions, electrostatic interactions, steric interactions and those mediated by water molecules in the site. If the site is an enzyme, there may be more specific interactions such as those between the ligand and metal ions in the site, or interactions between cations and the π clouds of aromatic rings. All of these features need to be incorporated into scoring functions for de novo design. The scoring functions for drug design and docking show many similarities, the aim being to prioritise a list of structures that can be compared for further assessment as candidates for synthesis.
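The idea of prioritising candidate structures with a scoring function can be illustrated with a deliberately simple additive score. Everything in the sketch below is hypothetical: the interaction terms are a small selection of those listed above, and the weights, candidate names and counts are invented for illustration. It is not the scoring function of any published docking or de novo design program; real functions are parameterised from co-crystal data and binding measurements.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    h_bonds: int               # hydrogen bonds to site points
    hydrophobic_contacts: int  # contacts with hydrophobic regions
    salt_bridges: int          # favourable electrostatic pairs
    steric_clashes: int        # overlaps with site atoms


# Hypothetical weights; higher total score means a more favourable fit.
WEIGHTS = {
    "h_bonds": 1.0,
    "hydrophobic_contacts": 0.3,
    "salt_bridges": 1.5,
    "steric_clashes": -2.0,
}


def score(candidate: Candidate) -> float:
    """Weighted sum of the interaction counts for one candidate."""
    return sum(weight * getattr(candidate, term) for term, weight in WEIGHTS.items())


candidates = [
    Candidate("A", h_bonds=3, hydrophobic_contacts=5, salt_bridges=0, steric_clashes=0),
    Candidate("B", h_bonds=2, hydrophobic_contacts=8, salt_bridges=1, steric_clashes=1),
    Candidate("C", h_bonds=4, hydrophobic_contacts=2, salt_bridges=1, steric_clashes=0),
]

# Rank from most to least favourable to obtain a prioritised list.
ranked = sorted(candidates, key=score, reverse=True)
for c in ranked:
    print(f"{c.name}: {score(c):.1f}")
```

The additive form makes it easy to see how a class-specific scoring function might differ from a universal one: only the weights, and perhaps the set of terms, would need to change.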
A commonly used approach is to identify an initial inhibitor through screening, fit it into the site computationally, and make modifications to the structure in order to increase the interaction energy. Successful modification is often confirmed by X-ray crystallography. The following example of rhinovirus protease illustrates this. Rhinoviruses are causative agents of the common cold. Maturation of the virus involves cleavage of a precursor polyprotein by rhinovirus 3C protease; inactivation of the protease therefore provides a target for therapeutic intervention. The enzyme is a cysteine protease with important residues Cys147, His40 and Glu71, and it attacks the Gln-Gly junction in small peptides. However, there are strong similarities between the active site and that of trypsin-like serine proteases, enabling Matthews et al to use a combination of approaches to the development of strong inhibitors.39 This was achieved by modifications of the N-terminus of small peptides by adding an aldehyde functionality (structure [1] in Figure 3). This inhibitor binds in an extended conformation with the residues in shallow binding pockets. The terminal benzyloxycarbonyl group lies in the S2 pocket. Despite many attempts, it did not prove possible to identify noncovalent inhibitors with more drug-like properties. Matthews et al therefore tried to design a mechanism-based covalent irreversible inhibitor. Replacement of the aldehyde by an α,β-unsaturated ethyl ester yielded a potent irreversible inhibitor (structure [2] in Figure 3) of the 3C protease. This example has predominantly focused on the use of crystal structure determinations to obtain the alignments of the inhibitors in the site and placed less reliance on computer-aided algorithmic approaches.

Figure 3 Structures of rhinovirus protease inhibitors.

Design to Related Sites (eg Gene Families)
The problem of design to closely related members of gene families is well exemplified by studies of the caspase family. Caspases are proteases that are major effectors of apoptosis and inflammatory cytokine processing, and thus are potential therapeutic targets for a number of diseases in which these processes are dysregulated. Twelve human caspases have been described and assigned to two subfamilies on the basis of sequence homology (Figure 4).

Figure 4 Dendrogram of amino acid sequence homologies in the caspase family. Subfamily 1: cytokine processing (IL-1/IL-18); subfamily 2: apoptosis.

Selective inhibition of caspases 3 and 7 (which share close homology at the amino acid level) has been achieved by Lee et al,40 who also showed that other caspases are not critical for apoptosis in the cellular assays used. This is a good example of a ‘chemical genomics’ approach to selectively target members of a gene family using small molecule probes to determine their function in vivo. Their starting point was a nitro-isatin that had been identified through high-throughput screening (structure [1] in Figure 5). The nitro group was replaced by a sulphonamide (structure [2] in Figure 5) that increased selectivity for caspases 1, 3, and 7 without compromising inhibitory activity. Further exploration of the sulphonamide functionality produced nanomolar active compounds with high specificity for caspases 3 and 7 (structures [3]–[5] in Figure 5). Molecular modelling of the compounds within the caspase sites suggested that the specificity was due to Tyr204, Trp206 and Phe256 in the S2 pocket. A logical extension of this approach would be the discovery of inhibitors that discriminate between caspases 3 and 7, but this is more of a challenge for medicinal chemistry and computational design.

Figure 5 Structures of caspase 3/7 inhibitors based on nitro-isatin.

DESIGN TO COMPUTATIONALLY INFERRED SITES (LIGAND-BASED DESIGN)
Where the structure of the site is unknown but a number of active compounds for the site are available, it is possible to use molecular similarity computations to infer the structure of the site. This does not yield a complete picture of the site but identifies key hydrogen-bonding site points and lipophilic regions. An algorithm, SLATE,41 has been written to optimise the match between partially similar flexible molecules. Optimum superpositions can be obtained for the site points projected away from the ligand surfaces. These site points provide a minimal map of the putative receptor site. Drug design can then be carried out in a way that is analogous to site-directed design. The SLATE algorithm takes a set of small molecules in random conformations, identifies points of maximum hydrogen bonding interaction towards a site and computes the distance matrix for the points on each molecule. The difference distance matrix for all molecular pairs is computed. The sum of the difference distance matrix is minimised as the molecules are allowed to flex independently. Partial similarity is obtained by using the null correspondence method.36 Molecular site point projections are then superposed by the MATFIT algorithm. An example of this is the design of novel histamine H3 antagonists using information on the ligand alone without any information on the receptor. A selection of different H3 ligand classes is shown in Figure 6. The SLATE algorithm was used to optimise the match between the projected site maps of 12 ligands showing H3 antagonist activity using full flexibility round all the torsion angles. The resulting superposition of the 12 ligands is shown in Figure 7: the hydrogen-bonding points are superposed and two separate regions are identified for the lipophilic tails. Novel molecules, one of which is shown in Figure 8, have been designed and synthesised within the molecular supersurface of the superposed set of ligands displayed in Figure 7. The molecule shown in Figure 8 has all the required interactions of the four hydrogen-bonding points together with both lipophilic tails. The affinity of the designed compound is given by pKi = 9.3.

Figure 6 Histamine analogues as templates for ligand-based design of H3 antagonists.

Figure 7 Overlay of 12 H3 ligands from the SLATE algorithm. Magenta circles: hydrogen-bonding acceptor regions on the putative receptor; yellow circle: hydrogen-bonding donor region.

Figure 8 A de novo designed H3 antagonist, pKi = 9.3.

EXPLOITING SMALL MOLECULE SIMILARITY AND DIVERSITY
Molecular similarity procedures are now reasonably mature, both in terms of defining similarity between molecules in a set and in the utility of the similarity concept to search for like molecules in a database of compound structures.42 Similarity work can now be extended to the development of virtual chemical libraries before synthesis. These libraries can be used for virtual screening to enlarge a company’s compound collection. This approach is particularly valuable in lead optimisation and makes use of the neighbourhood principle in molecular diversity. Molecules close to a hit or lead compound need to be synthesised to explore the similarity space, and by implication molecular interaction space between the lead and its binding site, even though the structure of the site may be unknown.43 The neighbourhood procedure can be applied to a small selection of hits, and the descriptors of the activity are determined. Molecules containing these descriptors are searched for from a virtual database within a defined neighbourhood radius and where possible these are synthesised in a combinatorial format.
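The neighbourhood search described above can be sketched with a simple set-based similarity measure. The example below assumes that each molecule is represented by a set of descriptor identifiers and that distance is defined as one minus the Tanimoto coefficient; both are common choices but are assumptions here, since the text does not specify the descriptors or the metric. The molecule names and descriptor sets are invented.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient: shared descriptors divided by all descriptors."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def neighbours(hit: frozenset, library: dict, radius: float) -> list:
    """Return (name, distance) for library members within radius of the hit,
    nearest first, where distance = 1 - Tanimoto similarity."""
    found = []
    for name, descriptors in library.items():
        distance = 1.0 - tanimoto(hit, descriptors)
        if radius >= distance:
            found.append((name, distance))
    return sorted(found, key=lambda item: item[1])


# Hypothetical hit and virtual library, each molecule a set of descriptor ids.
hit = frozenset({1, 2, 3, 4})
virtual_library = {
    "m1": frozenset({1, 2, 3, 4, 5}),
    "m2": frozenset({1, 2, 7, 8}),
    "m3": frozenset({1, 2, 3}),
    "m4": frozenset({9, 10}),
}

result = neighbours(hit, virtual_library, radius=0.3)
for name, distance in result:
    print(f"{name}: distance {distance:.2f}")
```

Widening the radius admits progressively less similar molecules, so the radius acts as the control on how far from the hit the synthesis effort is allowed to explore.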
Molecular diversity within virtual libraries is also important for drug design; a judicious selection of representatives enables the library to cover a large amount of diversity space with a small set of chemical reactions.44,45 These molecular libraries can be built with ‘drug-like’ skeletons from a selection of representatives obtained by optimising the dissimilarity. Another development in combichem design methods is RECAP, a retrosynthetic combinatorial analysis procedure,46 which takes the Derwent World Drug Index and fragments it: firstly, so that fragments can be assembled by combichem at a future date and, secondly, so that structural building blocks can be identified which are correlated with therapeutic effects. The fragmentation is performed along eleven bond cleavage types and structural motifs are correlated with therapeutic activity. Thus it should be possible to build combichem libraries around these motifs to search for new lead compounds with distinct biological effects.

LINKING SITE EXPLORATION TO CHEMICAL GENOMICS

The arrival of techniques, such as global gene expression analysis, with which to probe the dynamic effects of compounds on living cells has created the new field of pharmacogenomics,47 or, as it is increasingly becoming known, ‘chemical genomics’. Powerful synergies are to be found by linking the worlds of structural and chemical genomics, as shown in Figure 9.
Chemical intervention within cellular signalling systems can be effected at several specific levels, ranging from external receptors, via signalling enzyme cascades, to the transcription factors regulating gene expression.48

Figure 9 Synergies between structural and chemical genomics.

Targeting Transcription Factors

Six percent of the genes encoded by the human genome are transcription factors.1 Amongst these, the nuclear hormone receptor family has proven a treasure trove for small molecule discovery.49 The combination of bioinformatics to identify new nuclear hormone receptors, and high-throughput screening to identify potential ligands, has led to the identification of a number of important new targets for chronic diseases such as diabetes and inflammation. Amongst these targets, the peroxisome proliferator-activated receptors (PPARs) alpha, delta and gamma are perhaps among the most intriguing.50 The structure of human PPAR gamma, and that of the agonist binding site, was first reported by Nolte et al,51 who also identified the key interactions made by the drug rosiglitazone. A key requisite for clinical utility is the absence of PPAR delta activity, since this has been associated with the development of colorectal carcinoma in animal and cell models (reviewed in Gupta et al52). In contrast, activity at PPAR alpha can be beneficial in diabetic conditions,53 so the best profile of a development candidate might be gamma (+++)/alpha (+)/delta (−). Although not currently in the public domain, the structures of the ligand-binding sites of all three PPARs have been experimentally determined, providing an excellent basis for rational design of selective agonists. Together with the identification of further points of intervention, provided by downstream analysis of the genes induced by such compounds (eg resistin),54 considerable advances in understanding the molecular basis of insulin resistance and its amelioration are to be expected.
This is clearly a rich vein of discovery for chemical genomics.

Targeting Intracellular Signalling Systems

A similar story can be told in the case of intracellular signalling kinases, whose discovery and targeting have led to the production of a range of experimental inhibitors, such as the tyrphostins.55 In a classic study using the mating pathway of Saccharomyces cerevisiae as a model system, Roberts et al56 used microarrays to detect mRNA expression as a measure of pathway activation. A significant number of genes were up- and down-regulated upon activation of specific MAP kinase pathways by the yeast mating hormones. Similarly, when individual components of this pathway were deleted using traditional genetic approaches, specific changes in the pattern of gene expression occurred. The authors inferred the relative importance of branch points within the pathway from these data, demonstrating the power of gene expression as a readout of complex cellular events. These studies have been extended to include the use of chemicals, in addition to genetic mutants, to provide a ‘compendium of expression profiles’ to delineate specific signalling pathways.57

Cellular Screens as a Measure of Chemotype Specificity

Perhaps the most powerful chemical genomics discovery tool of all is the use of global gene expression analysis within human cells. A ground-breaking study by Weinstein et al58 laid the foundations of understanding drug action at the cellular and molecular level using multivariate statistics. The work was based on the growth inhibition of 60 human tumour cell lines by a number of anti-cancer agents. Multiple correlations were made between potency of inhibition, molecular descriptors of the structures of the individual compounds, and molecular targets of different drug classes.
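The kind of correlation involved can be sketched as follows. This is an illustration only and not a reconstruction of the published analysis: it assumes that each compound has a potency profile across the same ordered panel of cell lines and that each molecular target has a measured level in each of those lines, and it uses a plain Pearson coefficient. The function names `pearson` and `drug_target_correlations`, and all values in the example, are hypothetical.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant profile carries no correlation information.
        return 0.0
    return sxy / sqrt(sxx * syy)


def drug_target_correlations(
    activity: Dict[str, Sequence[float]],
    targets: Dict[str, Sequence[float]],
) -> Dict[str, Dict[str, float]]:
    """Correlate every drug potency profile with every target profile.

    Both inputs map a name to values over the same ordered cell-line panel.
    """
    return {
        drug: {
            target: pearson(potency, level)
            for target, level in targets.items()
        }
        for drug, potency in activity.items()
    }


if __name__ == "__main__":
    # Made-up values over a four-line panel, for illustration only.
    activity = {"compound_1": [5.1, 6.0, 7.2, 8.0]}
    targets = {
        "target_a": [0.2, 0.4, 0.7, 0.9],
        "target_b": [0.9, 0.6, 0.5, 0.1],
    }
    for drug, row in drug_target_correlations(activity, targets).items():
        for target, r in row.items():
            print(drug, target, round(r, 2))
```

A strongly positive coefficient would suggest that cell lines rich in the target are the more sensitive ones, and a strongly negative coefficient the reverse; the resulting drug-by-target matrix is the kind of table that a clustered display can then organise.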
Using appropriate data visualisation tools (in this case, cluster image maps), it was possible to identify positive or negative correlations between drugs and their potential targets, and guide the search for novel compounds towards, or away from, particular molecular mechanisms. This study was subsequently extended by including gene expression data from mRNA profiling experiments using the same 60 cell lines.59 This added dimension of information, used with cluster image maps, allowed the identification of novel putative markers of 5-fluorouracil and L-asparaginase action along with many other correlations, demonstrating the power of this approach as a new method of data mining.

CONCLUSIONS

Pharmacogenomics, in the ‘chemical genomics’ sense, provides an operational link between medicinal chemistry and biology, and enables structure-activity relationships between drug candidates to be determined on an objective and informed basis. The implications for both drug discovery and drug development, especially drug mode of action studies and side effect profiling, are clear: the provision of detailed information on the way in which drugs work within a biological setting can only enhance the productivity of the industry. A particularly interesting aspect of progress in this area is the relationship between pharmacogenomics and de novo drug design, where rapid, detailed feedback on the biological properties of novel structures is much in demand. The flood of sequence data from the HGP, together with the structural information being assembled for all proteins in the human proteome, will provide the foundation for an exploitable pharmacogenomics industry. These essential biomolecular data, when added to the mass of chemoinformatic data currently being generated by the pharmaceutical industry, provide a resource for pharmacogenomics to make a major impact on the production of new therapies and novel molecular probes for dissecting biochemical function.
Since much of this work will have a large in silico component, the relationship between pharmacogenomics and de novo drug design will be a natural marriage.

ACKNOWLEDGEMENTS

We would like to acknowledge Dr Iwan de Esch for providing the information for Figures 6, 7 and 8.

DUALITY OF INTEREST

None declared.

REFERENCES

1 Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG et al. The sequence of the human genome. Science 2001; 291: 1304–1351.
2 Blake JF. Chemoinformatics—predicting the physicochemical properties of ‘drug-like’ molecules. Curr Opin Biotechnol 2000; 1: 104–107.
3 Bailey DS, Bondar A, Furness ML. Pharmacogenomics—it’s not just pharmacogenetics. Curr Opin Biotech 1998; 9: 595–601.
4 Stockwell BR. Chemical genetics: ligand-based discovery of gene function. Nature Rev 2000; 1: 116–125.
5 International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 2001; 409: 860–921.
6 Nature Structural Biology 2000; 7 (Suppl) 927–994.
7 Drews J. Drug discovery: a historical perspective. Science 2000; 287: 1960–1964.
8 Bottomley S. Value-added databases. Drug Discovery Today 1999; 4: 42–44.
9 Baxevanis AD. The Molecular Biology Database Collection: an updated compilation of biological database resources. Nucleic Acids Res 2001; 29: 1–10, and following articles.
10 Cagney G, Uetz P, Fields S. High-throughput screening for protein–protein interactions using two-hybrid assay. Meth Enzymol 2000; 328: 3–14.
11 Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR et al. A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae. Nature 2000; 403: 623–627.
12 Rain J-C, Selig L, De Reuse H, Battaglia V, Reverdy C, Simon S et al. The protein–protein interaction map of Helicobacter pylori. Nature 2001; 409: 211–215.
13 Newman JR, Wolf E, Kim PS. A computationally directed screen identifying interacting coiled coils from Saccharomyces cerevisiae. Proc Natl Acad Sci USA 2000; 97: 13203–13208.
14 Pandey A, Mann M. Proteomics to study genes and genomes. Nature 2000; 405: 837–846.
15 Pandey A, Podtelejnikov AV, Blagoev B, Bustelo XR, Mann M, Lodish HF. Analysis of receptor signalling pathways by mass spectrometry: identification of vav-2 as a substrate of the epidermal and platelet-derived growth factor receptors. Proc Natl Acad Sci USA 2000; 97: 179–184.
16 Bader GD, Donaldson I, Wolting C, Ouellette BF, Pawson T, Hogue CW. BIND—the biomolecular interaction network database. Nucleic Acids Res 2001; 29: 242–245.
17 Palczewski K, Kumasaka T, Hori T, Behnke CA, Motoshima H, Fox BA et al. Crystal structure of rhodopsin: a G protein-coupled receptor. Science 2000; 289: 739–745.
18 Wright PE, Dyson HJ. Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm. J Mol Biol 1999; 293: 321–331.
19 Shoemaker BA, Portman JJ, Wolynes PG. Speeding molecular recognition by using the folding funnel: the fly-casting mechanism. Proc Natl Acad Sci USA 2000; 97: 8868–8873.
20 Sanchez R, Sali A. Large-scale protein structure modeling of the Saccharomyces cerevisiae genome. Proc Natl Acad Sci USA 1998; 95: 13597–13602.
21 Sanchez R, Pieper U, Melo F, Eswar N, Marti-Renom MA, Madhusudhan MS et al. Protein structure modeling for structural genomics. Nat Struct Biol 2000; 7 (Suppl): 986–990.
22 Gouldson PR, Higgs C, Smith RE, Dean MK, Gkoutos GV, Reynolds CA. Dimerization and domain swapping in G-protein-coupled receptors, a computational study. Neuropsychopharmacology 2000; 23 (Suppl): 60–77.
23 Sternberg MJ, Bates PA, Kelley LA, MacCallum RM. Progress in protein structure prediction: assessment of CASP3. Curr Opin Struct Biol 1999; 9: 368–373.
24 Thornton JM, Todd AE, Milburn D, Borkakoti N, Orengo CA. From structure to function: approaches and limitations. Nat Struct Biol 2000; 7 (Suppl): 991–994.
25 Appelt K. Crystal structures of HIV-1 protease-inhibitor complexes. Perspect Drug Discov Design 1993; 1: 23–48.
26 Hendlich M. Databases for protein-ligand complexes. Acta Crystallogr D Biol Crystallogr 1998; 54: 1178–1182.
27 Hofmann K, Bucher P, Falquet L, Bairoch A. The PROSITE database, its status in 1999. Nucleic Acids Res 1999; 27: 215–219.
28 Danziger DJ, Dean PM. Automated site-directed drug design: the prediction and observation of ligand point positions at hydrogen-bonding regions on protein surfaces. Proc R Soc Lond B 1989; 236: 115–124.
29 MacBeath G, Schreiber SL. Printing proteins as microarrays for high-throughput function determination. Science 2000; 289: 1760–1763.
30 Zhang YP, Lewis RN, Hodges RS, McElhaney RN. Peptide models of the helical hydrophobic transmembrane segments of membrane proteins: interactions of acetyl-K(2)-(LA)(12)-K(2)-amide with phosphatidylethanolamine bilayer membranes. Biochemistry 2001; 40: 474–482.
31 Pantoliano MW, Rhind AW, Salemme FR. Microplate thermal shift assay for ligand development and multivariable protein chemistry optimization. US Patent 1997; 6 020 141.
32 Weinberger SR, Morris TS, Pawlak M. Recent trends in protein biochip technology. Pharmacogenomics 2000; 1: 395–416.
33 Cancilla MT, Leavell MD, Chow J, Leary JA. Mass spectrometry and immobilized enzymes for the screening of inhibitor libraries. Proc Natl Acad Sci USA 2000; 97: 12008–12013.
34 Schriemer C, Bundle DR, Li L, Hindsgaul O. Micro-scale frontal affinity chromatography with mass spectrometric detection: a new method for the screening of compound libraries. Angew Chem Int Ed 1998; 37: 3383–3387.
35 Danziger DJ, Dean M. Automated site-directed drug design: a general algorithm for knowledge acquisition about hydrogen-bonding regions at protein surfaces. Proc R Soc Lond B 1989; 236: 101–113.
36 Dean PM. Defining and using molecular similarity or complementarity for drug design. In: Dean PM (ed). Molecular Similarity in Drug Design. Blackie Academic and Professional: Glasgow, 1995, pp 1–23.
37 Bailey D, Brown D. High-throughput chemistry and structure-based design: survival of the smartest. Drug Discovery Today 2001; 6: 57–59.
38 Klebe G (ed). Virtual Screening: an Alternative or Complement to High Throughput Screening? Kluwer/ESCOM: Deventer, 2000.
39 Matthews DA, Dragovich PS, Webber SE, Fuhrman SA, Patick AK, Zalman LS et al. Structure-assisted design of mechanism-based irreversible inhibitors of human rhinovirus 3C protease with potent antiviral activity against multiple rhinovirus serotypes. Proc Natl Acad Sci USA 1999; 96: 11000–11007.
40 Lee D, Long SA, Adams JL, Chan G, Vaidya KS, Francis TA et al. Potent and selective nonpeptide inhibitors of caspases 3 and 7 inhibit apoptosis and maintain cell functionality. J Biol Chem 2000; 275: 16007–16014.
41 Mills JEJ, De Esch IJP, Perkins TDJ, Dean PM. SLATE: a method for the superposition of flexible ligands. J Comput-Aided Mol Design 2001; 15: 81–96.
42 Willett P. Chemical similarity searching. J Chem Inf Comput Sci 1998; 38: 983–996.
43 Cramer RD, Patterson DE, Clark RD, Soltanshahi F, Lawless MS. Virtual compound libraries: a new approach to decision making in molecular discovery research. J Chem Inf Comput Sci 1998; 38: 1010–1023.
44 Clark RD, Langton WJ. Balancing representativeness against diversity using optimizable K-dissimilarity and hierarchical clustering. J Chem Inf Comput Sci 1998; 38: 1079–1086.
45 Van Drie JH, Lajiness MJ. Approaches to virtual drug design. Drug Discovery Today 1998; 3: 274–283.
46 Lewell XQ, Judd DB, Watson SP, Hann MM. RECAP—retrosynthetic combinatorial analysis procedure: a powerful new technique for identifying privileged molecular fragments with useful applications in combinatorial chemistry. J Chem Inf Comput Sci 1998; 38: 511–522.
47 Bailey DS, Dean PM. Pharmacogenomics and its impact on drug design and optimisation. Ann Rep Med Chem 1999; 34: 339–348.
48 Zanders ED. Gene expression analysis as an aid to the identification of drug targets. Pharmacogenomics 2000; 1: 375–384.
49 Kliewer SA, Lehmann JM, Willson TM. Orphan nuclear receptors: shifting endocrinology into reverse. Science 1999; 284: 757–760.
50 Willson TM, Brown PJ, Sternbach DD, Henke BR. The PPARs: from orphan receptors to drug discovery. J Med Chem 2000; 43: 527–550.
51 Nolte RT, Wisely GB, Westin S, Cobb JE, Lambert MH, Kurokawa R et al. Ligand binding and co-activator assembly of the peroxisome proliferator-activated receptor-gamma. Nature 1998; 395: 137–143.
52 Gupta RA, Tan J, Krause WF, Geraci MW, Willson TM, Dey SK et al. Prostacyclin-mediated activation of peroxisome proliferator-activated receptor δ in colorectal cancer. Proc Natl Acad Sci USA 2000; 97: 13275–13280.
53 Guerre-Millo M, Gervois P, Raspe E, Madsen L, Poulain P, Derudas B et al. Peroxisome proliferator-activated receptor alpha activators improve insulin sensitivity and reduce adiposity. J Biol Chem 2000; 275: 16638–16642.
54 Steppan CM, Bailey ST, Bhat S, Brown EJ, Banerjee RR, Wright CM et al. The hormone resistin links obesity to diabetes. Nature 2001; 409: 307–312.
55 Levitzki A. Protein tyrosine kinase inhibitors as novel therapeutic agents. Pharmacol Ther 1999; 82: 231–239.
56 Roberts CJ, Nelson B, Marton MJ, Stoughton RS, Meyer MR, Bennett HA et al. Signaling and circuitry of multiple MAPK pathways revealed by a matrix of global gene expression profiles. Science 2000; 287: 873–880.
57 Hughes TR, Marton MJ, Jones AR, Roberts CJ, Stoughton R, Armour CD. Functional discovery via a compendium of expression profiles. Cell 2000; 102: 109–126.
58 Weinstein JN, Myers TG, O’Connor PM, Friend SH, Fornace AJ, Kohn KW et al. An information-intensive approach to the molecular pharmacology of cancer. Science 1997; 275: 343–349.
59 Scherf U, Ross DT, Waltham M, Smith LH, Lee JK, Tanabe L et al. A gene expression database for the molecular pharmacology of cancer. Nature Genet 2000; 24: 236–244.