Download Site-specific molecular design and its relevance to

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Discovery and development of integrase inhibitors wikipedia , lookup

Discovery and development of antiandrogens wikipedia , lookup

Drug interaction wikipedia , lookup

Magnesium transporter wikipedia , lookup

DNA-encoded chemical library wikipedia , lookup

Neuropharmacology wikipedia , lookup

Neuropsychopharmacology wikipedia , lookup

Pharmacogenomics wikipedia , lookup

Drug discovery wikipedia , lookup

Drug design wikipedia , lookup

Transcript
The Pharmacogenomics Journal (2001) 1, 38–47
 2001 Nature Publishing Group All rights reserved 1470-269X/01 $15.00
www.nature.com/tpj
REVIEW
Site-specific molecular design and its relevance
to pharmacogenomics and chemical biology
D Bailey
E Zanders
P Dean
De Novo Pharmaceuticals, Cambridge CB2
3DD, UK
Correspondence:
David Bailey, De Novo Pharmaceuticals, St
Andrew’s House, 59 St Andrew’s Street,
Cambridge CB2 3DD, UK
Telephone: +44 1223 488888
Fax: +44 1223 488899
E-mail: david.bailey얀denovopharma.com
ABSTRACT
The emergence of the new discipline of pharmacogenomics reflects the
growing convergence of chemical and genomic space. The massive information-driven growth in both computational chemistry and structural
biology is leading to unprecedented opportunities in both chemical and biological design. In this paper we relate current opinion in structural biology
to recent developments in computational drug design. Sequence information
now permits protein structure prediction and, together with experimental
protein structure determination, a complete database of ligand-binding sites
and protein–protein interactions can be assembled. When aligned with site
exploration and virtual screening, this information provides a foundation for
structure-based pharmacogenomics. In association with chemical genomics,
structure-based design will allow major new insights into a compound’s biological and pharmaceutical properties. The Pharmacogenomics Journal (2001)
1, 38–47.
Keywords: pharmacogenomics; drug design; structural genomics; computational
chemistry; chemical genomics
Received: 14 February 2001
Accepted: 24 February 2001
INTRODUCTION
Genomics programmes are in the process of defining the entire gamut of protein
targets available for drug discovery, and the ligand-binding domains that they
encode,1 while chemoinformatics2 is providing an architecture in which to
handle corresponding chemical information. The developing interface between
these two areas, known colloquially as pharmacogenomics,3 promises to provide
a global platform through which to rationalise drug discovery at the molecular
and cellular levels.
A comprehensive pharmacogenomics platform has several elements, some of
which are illustrated in Figure 1. In its simplest iteration, the objective of
implementing such a platform is to produce a functional, small molecule ligand
for every protein, or ligand-binding domain within a protein, encoded by the
genome. If the protein is a therapeutic target, then the small molecule, if appropriately designed, may become a drug. If the protein’s utility is not immediately
apparent, then the small molecule may become a chemical probe for function.
In this paper we highlight the shift in emphasis away from genomics and proteomics as purely molecular biology disciplines, and examine their impact on
the developing field of computational pharmacogenomics.
The genetic basis of drug response and the direct observation of the effects of
a compound on biological systems, has been recently dubbed ‘chemical genetics’.4 In this article, we use the term ‘chemical genomics’ to emphasise the
contribution of genomics-based technologies to an understanding of the detailed
molecular interactions between specific proteins and their small molecule
ligands.
Pharmacogenomics and molecular design
D Bailey et al
39
Figure 1 The flow of pharmaceutical research from genomic information to population and personalised chemotherapies.
THE CONTRIBUTION OF GENE AND PROTEIN
SEQUENCE INFORMATION
The force that has revolutionised the way we look at drug
action is the Human Genome Project,5 and its progeny, global initiatives in structural biology.6 Once the complete set
of expressed human genes and proteins is available, we will
have at our disposal a powerful technology platform from
which to survey the landscape of structural and functional
biology. Such programmes will also define the complete set
of tractable drug discovery targets.7
As a foundation for such global initiatives, new bioinformatics approaches to classify genes and their resulting proteins will be required. Already, sophisticated multiple
sequence alignment techniques, supplemented by protein
homology modelling approaches, have led to the definition
of specific structural ‘motifs’ through which to identify and
classify new members of related gene families. A summary
and extensive review of new ‘knowledge bases’ in this area
have recently been published.8,9
SUPRAMOLECULAR ASSEMBLIES ANALYSED BY
PROTEOMICS
While the importance of quaternary structure in protein
function has been recognised for many years, recent attention has become focused on higher order structures of multiple protein complexes. This has been encouraged by the
need to define the components of functional assemblies
such as signalling pathways, and to infer the biological role
of all the proteins with which they are associated.
Analysis of protein complexes by traditional biochemical
methods has been a successful process but is inefficient. An
improvement in throughput has been gained by the use of
genetic screening techniques, such as the yeast two-hybrid
system to characterise interacting proteins.10 The genetic
basis of this system makes it particularly amenable to highthroughput analysis, and the results of studies are emerging
in which whole genome analyses of protein–protein interactions are possible.11,12 It should be noted, however, that the
number of permutations of ‘bait’ and ‘prey’ (as interacting
proteins are termed in this system) rapidly becomes unmanageable as the genome size increases. Newman et al13 have
attempted to overcome this in the case of yeast (with a proteome size of 6000) by selecting for study only the subset of
proteins (one out of eleven) that contain coil–coil motifs,
selected as more likely to form homotypic and heterotypic
interactions. Despite these advances, the nature of the twohybrid system makes it prone to false positives and negatives. In the former case, this may be due to non-specific
interactions between bait and prey, and in the latter, to the
inability to detect interactions caused by post-translational
modification, eg phosphorylation.
More recent techniques of protein identification using
mass spectrometry of constituent interacting peptides have
overcome some of these problems by using native protein
complexes.14 The detection of protein interactions that
result from phosphorylation, (eg through activation of signalling pathways by growth factor-receptor interactions) is
possible by applying secondary selection techniques such as
affinity purification of complexes by antibodies to phosphotyrosine and mass spectrometry of the proteins after resolution by electrophoresis. This is well illustrated by the work
of Pandey et al15 who have identified a novel target of EGF
receptor signalling using this approach.
A new generation of genomic databases is being
developed to store such information on protein–protein
interactions. Apart from those available commercially which
concentrate on yeast two hybrid data (for example the databases being developed by Curagen and Hybrigenics), the
recently described Biomolecular Interaction Network Database (BIND)16 anticipates the influx of data describing different interactions between proteins and other molecules to
rapidly highlight networks of biological or pharmaceutical
interest.
ACCELERATING THE PACE OF STRUCTURE
DETERMINATION
Figure 2 illustrates an intriguing fact: the number of new
structures deposited into the public PDB database has fallen
off between 1999 and 2000 despite the emergence of structural proteomics as the key area underpinning drug design.
This might in part be due to the increased commercial
activity in this field, with the formation of companies dedicated to high-throughput determination of proprietary protein structures for application in pharmaceutical discovery.
Like other genomics-based activities, these are expensive
and require large investments of capital, or the formation
of consortia of interested parties from the public or private
sectors. Also, the technical issues for the most highly used
method of structure determination, X ray crystallography,
remain formidable. However, there have been improvements in each stage of the process, from protein purification
and crystallisation to X-ray bombardment and data
www.nature.com/tpj
Pharmacogenomics and molecular design
D Bailey et al
40
Figure 2 PDB entries by year from 1990 to 2000.
reduction.6 The use of dedicated synchrotron sources also
allows a significant increase in data collection using smaller
crystals, although obtaining the latter is still a major ratelimiting step in the process.
The selection of targets for structural biology programmes
is currently biased towards those which are easy to purify
and crystallise, at the expense of more challenging targets
such as membrane proteins, which comprise a vast family
of considerable interest as drug targets. Nevertheless, the
recent structure determination of rhodopsin17 has shown
that it is possible to determine the structures of membranebound targets, and the solution of the structures of related
proteins of pharmaceutical relevance, such as G-protein
coupled receptors (GPCRs), must soon become routine.
However, X-ray structures of proteins only provide a single conformational snapshot of the protein in a highly
restricted crystal environment. NMR determinations in solution are more representative of the variety of conformations expected to be available for ligand interaction
within purified proteins, although they too probably reflect
a relatively small subset of the protein structures adopted in
biological systems. Far from being a neat collection of fully
folded proteins, the cellular proteome is now thought to
comprise a wide range of partially-folded intermediates
existing in dynamic equilibrium. Many of the proteins in
the cell appear to be unfolded most of the time.18 Folding
has been pictured as a journey through an ensemble of partially folded states each of which is characterised by specific
reaction co-ordinates or order parameters. In fact, a relatively unstructured protein molecule can have a greater capture radius for a specific binding site than the folded state
with its restricted conformational freedom.19 Within biological systems, the dynamic nature of protein structures
must have a major impact on the way in which both small
and large ligands might interact with a target protein.
HOMOLOGY MODELLING: THE WAY FORWARD?
X-ray crystallography is a slow process compared with gene
sequencing, and NMR is limited by the size of protein that
can be examined effectively. The only efficient way to process sequence data is by large-scale homology modelling.
Sanchez & Sali, in an important paper, have attempted the
The Pharmacogenomics Journal
global creation of protein structure models from the yeast
genome.20 They used 1071 sequences and employed ALIGN
and MODELLER to create homology models automatically.
This procedure provides clues to the function of the proteins
by identifying the folds and 3-D motifs known to bind to
specific ligands eg the identification of SH3 domains. Steady
progress is being made, with perhaps 3D-computational
models covering half the human proteome becoming available by 2003.21
It has recently been shown that GPCRs dimerize and heterodimerize. Gouldson et al22 have studied this in detail in
the absence of ligand and in the presence of an agonist or
inverse agonist using a combination of computational techniques; these were molecular dynamics, correlated mutation
analysis, and evolutionary trace analysis. The evidence
points to transmembrane helices 5 and 6 forming the primary interface for dimerization. A second site for dimerization was identified by evolutionary trace analysis on helices
2 and 3. These principal functional sites appear to govern
domain swapping in subfamilies of GPCRs and may extend
molecular modelling from isolated proteins to chimeric
assemblies.
Homology modelling has its limitations, however. A crystal structure of a near homologue has to be available, and
the unpredictable nature of any errors in the model has to
be recognised. Despite these caveats, computational prediction shows steadily increasing accuracy when compared
with corresponding crystal data.23
Every protein contains surfaces that interact both with
other elements of the same protein and/or with ligand molecules. Can function be inferred from structure alone?
Although a comprehensive virtual analysis of the human
proteome is not feasible at present, rules for such global predictions are being generated, based on structural data
accumulated from specific examples.24 Despite the large
number of proteins of varied function, there are limited
numbers of protein families and folds. To complicate matters, many folds are conserved between proteins of widely
differing function, and conversely, proteins with an identical fold pattern can perform multiple biochemical functions. Thus there are currently no hard-and-fast rules for prediction of function from folds alone. The hope is that this
will become feasible as structural databases reach critical
mass.
From the perspective of drug discovery, however, it is
already possible to relate individual folds and motifs to
binding sites for specific ligands and substrates. For example,
the essential features of the aspartyl protease active site are
represented by a configuration of just eight atoms that are
preserved in every member of that family.24
These observations define a starting point for the next
phase of pharmacogenomics: the complete definition of the
molecular interactions between target proteins and their
cognate ligands.
THE PRESENCE OF LIGANDS
The ‘nucleating’ effect of specific ligands in stabilising particular protein structures can be seen in the substantial reor-
Pharmacogenomics and molecular design
D Bailey et al
41
ganisation in protein structure observed between some apoenzymes and their corresponding liganded structures (eg
HIV protease +/− inhibitors).25 The experimental determination of a sufficient number of representative liganded
structures to form a data set, for example in ReLiBase,26 is a
natural extension of the structural biology initiatives
referred to above. This database has been developed for use
with a variety of structural analysis and query tools to focus
attention on the molecular rules governing protein–ligand
interactions.
Site plasticity has a marked impact on ligand design
within sites. A conformationally labile site, when
approached by one inhibitor may adopt a subtly different,
and possibly less energetically favourable structure than
when addressed with a different ligand. All these factors
affect de novo design. They also have important consequences for design to sites that differ as a consequence of
SNP mutation.
LIGAND-BINDING SITES
Site Discovery by Computational Methods
Protein sequence searching using bioinformatic methods
such as PROSITE,27 can be used to identify characteristic
ligand-binding sites by local sequence correlation with template motifs. However, sites can only be unambiguously
identified by structural determination. Bioinformatics tools
using sequence alone lack the ability to search for 3-dimensional structural information within complex site structures
where significant folding creates the site architecture.
Ligand molecules are generally much smaller than the
proteins to which they bind. The interaction energy
between the protein and its ligand has to be sufficiently
strong for the complex to form and to exist for a significant
time. Van der Waals interactions can be increased by maximising the contact area between the ligand and the site on
the protein surface, and this can be achieved by docking the
ligand into a cavity or cleft. Therefore algorithms that search
for surface cavities can be used to identify putative binding
sites. Once these cavities are found, local site maps can be
computed to identify important molecular determinants of
ligand binding, such as hydrogen bonding and hydrophobic
site points.28
Defining Sites Biochemically
Traditional methods of defining sites on proteins employ
small molecule probes that have been labelled with radioactivity or fluorescence. These approaches have recently been
automated using microarray formats.29 However, introducing radioisotopes and chemical modifications within ligands
may affect binding, and this has spurred the development
of direct biophysical methods for site definition.
Differential scanning microcalorimetry is one technique
that has been used to probe protein structure for some time
(for a recent application see Zhang et al30). The method has
been converted into a screening format to detect the perturbation of proteins resulting from small molecule binding.31
Transient changes in refractive index caused by binding
of molecules to protein surfaces has been exploited in the
Biacore system, where surface plasmon resonance (SPR) is
used to probe similar molecular interactions. Instrumentation is also commercially available for detecting protein–
ligand interactions in a solid phase format (eg the SELDI
‘chip’ from Ciphergen). SPR methods for solid phase binding assays are reviewed in Weinberger et al.32
A completely different approach for detecting and characterising small molecules bound to protein targets is mass
spectrometry. Methods for binding small molecules to proteins which are immobilised on resins have been described
in which the input compound mixture and the unbound
components can be characterised by mass spectrometry. The
difference in mass spectrum between the two allows identification of the bound components.33 Ideally one would like
to rank the binding affinities of a number of compounds;
the frontal affinity chromatography system described by
Schreimer is designed to achieve this by characterising the
molecules that elute at different rates from the protein
matrix using in-line mass spectrometry.34
In summary, we can expect these techniques to elucidate
the structural features characteristic of protein–protein and
protein–small molecule interactions, and to provide a fundamental data set from which to extrapolate key descriptors.
Site Mapping in Silico
If the complete structure of the target protein is known, it
is straightforward to map the site for both de novo drug
design and for docking. Hydrogen-bond maps (‘site points’)
can be created on the surface of the site using a variety of
algorithms.35 Ligand design can be directed to appropriately
chosen sets of site points with complementary hydrogen
bonds built into appropriate scaffolds at optimal positions
to add to the interaction energy. The case of hydrophobic
interactions is less well understood because of the entropic
components involved in interactions in solution. Hydrophobic interactions are determined by summation of a number of component small interactions, thus hydrophobic
regions rather than points are defined. A typical site may
contain 30 site points, although small drug-like molecules
usually contain only a small number of corresponding
ligand points. This fact has important consequences for drug
design; it creates a combinatorial choice of subsets of site
points that could be used to design novel scaffolds with the
potential to fit the site. For example, taking five site points
at a time, there are 140 000 possible subsets of site points
available for design.36 A single ligand may exhibit promiscuous binding modes due to similarities in the spatial positioning of some subsets; in effect the small molecule has
too many good binding choices. The identification of the
subsets of site points where the choice is minimized is therefore crucial for drug design.
CONFORMATIONAL FLEXIBILITY OF INDIVIDUAL
SITES
NMR studies show that target sites can adopt many different
conformations. This plasticity raises the question: how effective is drug design closely tailored to a single conformation
of the site? Plasticity can be observed in two forms. Firstly,
www.nature.com/tpj
Pharmacogenomics and molecular design
D Bailey et al
42
large-scale shift in the protein backbone atoms may be
encountered. Secondly, small changes in the equilibrium
structure of the protein may affect the conformations of the
residues rather than perturbing the protein backbone. Most
de novo design algorithms have been based on a single fixed
structure for the site. Large-scale domain movements would
have to be handled by using numerous models of the site
as inputs for design. However, where flexibility in the site is
encountered, building flexibility into the ligand design that
can match the flexibility of the site offers a way round the
problem.
Site Exploration 1
If the objective is to find at least one functional small molecule ligand for every protein, and the human proteome
contains at least 50 000 proteins,5 then the scale of the task
becomes immediately apparent. Highly-parallel screens at
high throughput are clearly necessary, but at what cost? It
is widely acknowledged in the pharmaceutical industry that
high throughput screening (HTS) has not met the early
expectations that many leads would be found rapidly.37
There are sound theoretical reasons why hopes for HTS
have not been born out:
쐌 Chemical space for drug-like molecules is vast: greater
than 1060 drug-like compounds could exist in principle.
쐌 The number of different structures made to date is only
107–108 and most of the molecules are based on conserved
parental structures.
쐌 The amount of structural diversity available is limited.
쐌 Ligand binding is often sterically very specific. Tight binding pockets impose severe restrictions on the size of molecular fragments that can be fitted within them.
In-house compound collections commonly range from
50 000 to 1 000 000 different compounds. Clearly this number is minuscule compared with the potential size of the set
of drug-like molecules. One way of increasing diversity is
through in silico screening of large libraries (real and virtual).
Virtual screening also has the advantage of addressing the
other points highlighted above.
Virtual screening methods attempt to overcome some of
the deficiencies in the HTS procedure by dramatically
increasing the size of the screening set. This extension also
enables large numbers of virtual molecules (existing only in
silico) to be screened as well. Virtual screening is conceptually achieved in two ways, either by computational docking into a 3-dimensional representation of the site, or by
similarity with a known pharmacophore of an active molecule where site data are unavailable. Virtual screening
offers large financial savings by cutting down the number
of compounds to be screened in vitro since only those compounds that show acceptable docking results need be
screened in the laboratory.
Docking algorithms require a set of coordinates that may
be derived from crystallographic studies, NMR or homology
models of the binding site. Three approaches are possible:
rigid docking, flexible ligand docking and a combination of
flexible site with flexible ligand docking. These procedures
The Pharmacogenomics Journal
have been discussed in detail elsewhere.38 Flexible ligand
docking is currently feasible for a million compound
libraries and is currently receiving the most attention.
Docking algorithms are highly dependent on the scoring
function, a statistical measure of the interaction energy
between the ligand and the site. These parameters are
derived from ligand co-crystal data and binding measurements. In a sense, a universal scoring function is the ‘holy
grail’ of automated computational drug design. Perhaps
more realistically it will be possible to develop scoring functions for separate protein classes, thus facilitating a more targeted approach to ligand docking.
Site Exploration 2
We have discussed the fitting of existing (synthesized or
virtual) molecules into sites using the docking techniques.
We now consider specific design methods that are capable
of creating molecules de novo.
Drug binding to specific sites is usually reversible and achieved through non-covalent interactions such as hydrogen
bonding between site and ligand, hydrophobic interactions,
electrostatic interactions, steric interactions and those
mediated by water molecules in the site. If the site is an
enzyme, there may be more specific interactions such as
those between the ligand and metal ions in the site, or interactions between cations and the ␲ clouds of aromatic rings.
All of these features need to be incorporated into scoring
functions for de novo design. The scoring functions for drug
design and docking show many similarities, the aim being
to prioritise a list of structures that can be compared for
further assessment as candidates for synthesis.
A commonly used approach is to identify an initial inhibitor through screening, fit it into the site computationally,
and make modifications to the structure in order to increase
the interaction energy. Successful modification is often confirmed by X-ray crystallography. The following example of
rhinovirus protease illustrates this.
Rhinoviruses are causative agents of the common cold.
Maturation of the virus involves cleavage of a precursor
polyprotein by rhinovirus 3C protease therefore inactivation
of the protease provides a target for therapeutic intervention. The enzyme is a cysteine protease with important
residues Cys147, His40 and Glu71 and attacks the Gln-Gly
junction in small peptides. However there are strong similarities between the active site and that of trypsin-like serine
proteases, enabling Matthews et al to use a combination of
approaches to the development of strong inhibitors.39 This
was achieved by modifications of the N-terminus of small
peptides by adding an aldehyde functionality (structure [1]
in Figure 3). This inhibitor binds in an extended conformation with the residues in shallow binding pockets. The
terminal benzyloxycarbonyl group lies in the S2 pocket.
Despite many attempts, it did not prove possible to identify
noncovalent inhibitors with more drug-like properties. Matthews et al therefore tried to design a mechanism-based
covalent irreversible inhibitor. Replacement of the aldehyde
by an ␣␤-unsaturated ethyl ester yielded a potent irreversible
inhibitor (structure [2] in Figure 3) of the 3C protease. This
Pharmacogenomics and molecular design
D Bailey et al
43
Figure 3 Structures of rhinovirus protease inhibitors.
Figure 4 Dendrogram of amino acid sequence homologies in the
caspase family. Subfamily 1: cytokine processing (IL-1/IL-18);
subfamily 2: apoptosis.
example has predominantly focused on the use of crystal
structure determinations to obtain the alignments of the
inhibitors in the site and placed less reliance on computeraided algorithmic approaches.
Design to Related Sites (eg Gene Families)
The problem of design to closely related members of gene
families is well exemplified by studies of the caspase family.
Caspases are proteases that are major effectors of apoptosis
and inflammatory cytokine processing, and thus are potential therapeutic targets for a number of diseases in which
these processes are disregulated. Twelve human caspases
have been described and assigned to two subfamilies on the
basis of sequence homology (Figure 4).
Selective inhibition of caspases 3 and 7 (which share close
homology at the amino acid level) has been achieved by Lee
et al,40 who also showed that other caspases are not critical
for apoptosis in the cellular assays used. This is a good
example of a ‘chemical genomics’ approach to selectively
target members of a gene family using small molecule
probes to determine their function in vivo. Their starting
point was a nitro-isatin that had been identified through
high-throughput screening (Compound [1] in Figure 5).
The nitro group was replaced by a sulphonamide
(structure [2] in Figure 5) that increased selectivity for caspases 1, 3, and 7 without compromising inhibitory activity.
Further exploration of the sulphonamide functionality produced nanomolar active compounds with high specificity
for caspases 3 and 7 (structures [3]–[5] in Figure 5). Molecular modelling of the compounds within the caspase sites
Figure 5
Structures of caspase 3/7 inhibitors based on nitro-isatin.
suggested that the specificity was due to Tyr204, Trp206 and
Phe256 in the S2 pocket.
A logical extension of this approach would be the discovery of inhibitors that discriminate between caspases 3
and 7, but this is more of a challenge for medicinal chemistry and computational design.
www.nature.com/tpj
Pharmacogenomics and molecular design
D Bailey et al
44
DESIGN TO COMPUTATIONALLY INFERRED SITES
(LIGAND-BASED DESIGN)
Where the structure of the site is unknown but a number of
active compounds for the site are available, it is possible to
use molecular similarity computations to infer the structure
of the site. This does not yield a complete picture of the site
but identifies key hydrogen-bonding site points and lipophilic regions. An algorithm, SLATE,41 has been written to
optimise the match between partially similar flexible molecules. Optimum superpositions can be obtained for the site
points projected away from the ligand surfaces. These site
points provide a minimal map of the putative receptor site.
Drug design can then be carried out in a way that is analogous to site-directed design. The SLATE algorithm takes a
set of small molecules in random conformations, identifies
points of maximum hydrogen bonding interaction towards
a site and computes the distance matrix for the points on
each molecule. The difference distance matrix for all molecular pairs is computed. The sum of the difference distance
matrix is minimised as the molecules are allowed to flex
independently. Partial similarity is obtained by using the
null correspondence method.36 Molecular site point projections are then superposed by the MATFIT algorithm.
An example of this is the design of novel histamine H3
antagonists using information on the ligand alone without
any information on the receptor. A selection of different H3
ligand classes is shown in Figure 6.
The SLATE algorithm was used to optimise the match
between the projected site maps of 12 ligands showing H3
antagonist activity using full flexibility round all the torsion
angles. The resulting superposition of the 12 ligands is
Figure 6 Histamine analogues as templates for ligand-based
design of H3 antagonists.
The Pharmacogenomics Journal
Figure 7 Overlay of 12 H3 ligands from the SLATE algorithm.
Magenta circles, hydrogen-bonding acceptor regions on the putative receptor, yellow circle, hydrogen-bonding donor region.
shown in Figure 7: the hydrogen-bonding points are superposed and two separate regions are identified for the lipophilic tails.
Novel molecules, one of which is shown in Figure 8, have
been designed and synthesised within the molecular supersurface of the superposed set of ligands displayed in Figure 7.
The molecule shown in Figure 8 has all the required interactions of the four hydrogen-bonding points together with
both lipophilic tails. The affinity of the designed compound
is given by pKi = 9.3.
EXPLOITING SMALL MOLECULE SIMILARITY AND
DIVERSITY
Molecular similarity procedures are now reasonably mature,
both in terms of defining similarity between molecules in a
set and in the utility of the similarity concept to search for
like molecules in a database of compound structures.42 Similarity work can now be extended to the development of virtual chemical libraries before synthesis. These libraries can
be used for virtual screening to enlarge a company’s compound collection. This approach is particularly valuable in
lead optimisation and makes use of the neighbourhood
principle in molecular diversity. Molecules close to a hit or
lead compound need to be synthesised to explore the similarity space, and by implication molecular interaction space
between the lead and its binding site even though the structure of the site may be unknown.43 The neighbourhood procedure can be applied to a small selection of hits, and the
descriptors of the activity are determined. Molecules con-
Figure 8
A de novo designed H3 antagonist, pKi = 9.3.
Pharmacogenomics and molecular design
D Bailey et al
45
taining these descriptors are searched for from a virtual database within a defined neighbourhood radius and where
possible these are synthesised in a combinatorial format.
Molecular diversity within virtual libraries is also
important for drug design; a judicious selection of representatives enables the library to cover a large amount of diversity space with a small set of chemical reactions.44,45 These
molecular libraries can be built with ‘drug-like’ skeletons
from a selection of representatives obtained by optimizing
the dissimilarity. Another development in combichem
design methods—RECAP—a retrosynthetic combinatorial
analysis procedure46 takes the Derwent World Drug Index
and fragments it; firstly so that fragments can be assembled
by combichem at a future date and, secondly so that structural building blocks can be identified which are correlated
with therapeutic effects. The fragmentation is performed
along eleven bond cleavage types and structural motifs are
correlated with therapeutic activity. Thus it should be possible to build combichem libraries around these motifs to
search for new lead compounds with distinct biological
effects.
LINKING SITE EXPLORATION TO CHEMICAL
GENOMICS
The arrival of techniques, such as global gene expression
analysis, with which to probe the dynamic effects of compounds on living cells has created the new field of pharmacogenomics,47 or, as it is increasingly becoming known,
‘chemical genomics’. Powerful synergies are to be found by
linking the worlds of structural and chemical genomics as
shown in Figure 9.
Chemical intervention within cellular signalling systems
can be effected at several specific levels, ranging from external receptors, via signalling enzyme cascades, to the transcription factors regulating gene expression.48
Targeting Transcription Factors
Six percent of the genes encoded by the human genome are
transcription factors.1 Amongst these, the nuclear hormone
receptor family has proven a treasure trove for small molecule discovery.49 The combination of bioinformatics to
identify new nuclear hormone receptors, and high-throughput screening to identify potential ligands, has led to the
identification of a number of important new targets for
chronic diseases such as diabetes and inflammation.
Amongst these targets, the peroxisome proliferator agonist
receptors (PPARs) alpha, delta and gamma are perhaps
among the most intriguing.50
The structure of human PPAR gamma, and that of the
agonist binding site, was first reported by Nolte et al,51 who
also identified the key interactions made by the drug rosiglitazone. A key requisite for clinical utility is the absence of
PPAR delta activity, since this has been associated with the
development of colorectal carcinoma in animal and cell
models (reviewed in Gupta et al52). In contrast, activity at
PPAR alpha can be beneficial in diabetic conditions,53 so the
best profile of a development candidate might be gamma
(+++)/alpha (+)/delta (−). Although not currently in the pub-
Figure 9
Synergies between structural and chemical genomics.
lic domain, the structures of the ligand-binding sites of all
three PPARs have been experimentally determined, providing an excellent basis for rational design of selective agonists. Together with the identification of further points of
intervention, provided by downstream analysis of the genes
induced by such compounds (eg resistin),54 considerable
advances in understanding the molecular basis of insulin
resistance and its amelioration are to be expected. This is
clearly a rich vein of discovery for chemical genomics.
Targeting Intracellular Signalling Systems
A similar story can be told in the case of intracellular signalling kinases, whose discovery and targeting has led to the
production of a range of experimental inhibitors, such as
the tyrphostins.55 In a classical study using the mating pathway of Saccharomyces cerevisiae as a model system, Roberts et
al56 used microarrays to detect mRNA expression as a measure of pathway activation. A significant number of genes
were up- and down-regulated upon activation of specific
MAP kinase pathways by the yeast mating hormones. Similarly, when individual components of this pathway were
deleted using traditional genetic approaches, specific
www.nature.com/tpj
Pharmacogenomics and molecular design
D Bailey et al
46
changes in the pattern of gene expression occurred. The
authors inferred the relative importance of branch points
within the pathway from these data, demonstrating the
power of gene expression as a readout of complex cellular
events. These studies have been extended to include the use
of chemicals, in addition to genetic mutants, to provide a
‘compendium of expression profiles’ to delineate specific
signalling pathways.57
Cellular Screens as a Measure of Chemotype Specificity
Perhaps the most powerful chemical genomics discovery
tool of all is the use of global gene expression analysis
within human cells. A ground-breaking study by Weinstein
et al,58 laid the foundations of understanding drug action at
the cellular and molecular level using multivariate statistics.
The work was based on the growth inhibition of 60 human
tumor cell lines by a number of anti-cancer agents. Multiple
correlations were made between potency of inhibition, molecular descriptors of the structures of the individual compounds, and molecular targets of different drug classes.
Using appropriate data visualisation tools (in this case, cluster image maps), it was possible to identify positive or negative correlations between drugs and their potential targets,
and guide the search for novel compounds towards, or away
from, particular molecular mechanisms. This study was subsequently extended by including gene expression data from
mRNA profiling experiments using the same 60 cell lines.59
This added dimension of information used with cluster
image maps allowed the identification of novel putative
markers of 5-fluorouracil and L-asparaginase action along
with many other correlations, demonstrating the power of
this approach as a new way of data mining.
CONCLUSIONS
Pharmacogenomics, in the ‘chemical genomics’ sense, provides an operational link between medicinal chemistry and
biology, and enables structure-activity relationships
between drug candidates to be determined on an objective
and informed basis. The implications for both drug discovery and drug development, especially drug mode of
action studies and side effect profiling, are clear: the provision of detailed information on the way in which drugs
work within a biological setting can only enhance the productivity of the industry.
A particularly interesting aspect of progress in this area is
the relationship between pharmacogenomics and de novo
drug design, where rapid, detailed feedback on the biological
properties of novel structures is much in demand.
The flood of sequence data from the HGP, together with
the structural information being assembled for all proteins
in the human proteome, will provide the foundation for an
exploitable pharmacogenomics industry. These essential
biomolecular data, when added to the mass of chemoinformatic data currently being generated by the pharmaceutical
industry, provides a resource for pharmacogenomics to
make a major impact on the production of new therapies
and novel molecular probes for dissecting biochemical function. Since much of this work will have a large in silico
The Pharmacogenomics Journal
component, the relationship between pharmacogenomics
and de novo drug design will be a natural marriage.
ACKNOWLEDGEMENTS
We would like to acknowledge Dr Iwan de Esch for providing the information for Figures 6, 7 and 8.
DUALITY OF INTEREST
None declared.
REFERENCES
1 Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG et al.
The sequence of the human genome. Science 2001; 291: 1304 –1351.
2 Blake JF. Chemoinformatics—predicting the physicochemical properties of ‘drug-like’ molecules. Curr Opin Biotechnol 2000; 1: 104 –107.
3 Bailey DS, Bondar A, Furness ML. Pharmacogenomics—it’s not just
pharmacogenetics. Curr Opin Biotech 1998; 9: 595–601.
4 Stockwell BR. Chemical genetics: ligand-based discovery of gene function. Nature Rev 2000; 1: 116–125.
5 International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 2001; 409: 860–921.
6 Nature Structural Biology 2000; 7 (Suppl) 927–994.
7 Drews J. Drug discovery: a historical perspective. Science 2000; 287:
1960–1964.
8 Bottomley S. Value-added databases. Drug Discovery Today 1999; 4:
42–44.
9 Baxevanis AD. The Molecular Biology Database Collection: an updated
compilation of biological database resources. Nucleic Acids Res 2001;
29: 1–10, and following articles.
10 Cagney G, Uetz P, Fields S. High-throughput screening for protein–
protein interactions using two-hybrid assay. Meth Enzymol 2000; 328:
3–14.
11 Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR et al. A
comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae. Nature 2000; 403: 623–627.
12 Rain J-C, Selig L, De Reuse H, Battaglia V, Reverdy C, Simon S et al.
The protein–protein interaction map of Helicobacter pylori. Nature
2001; 409: 211–215.
13 Newman JR, Wolf E, Kim PS. A computationally directed screen identifying interacting coiled coils from Saccharomyces cerevisiae. Proc Natl
Acad Sci USA 2000; 97: 13203–13208.
14 Pandey A, Mann M. Proteomics to study genes and genomes. Nature
2000; 405: 837–846.
15 Pandey A, Podtelejnikov AV, Blagoev B, Bustelo XR, Mann M, Lodish
HF. Analysis of receptor signalling pathways by mass spectrometry:
identification of vav-2 as a substrate of the epidermal and plateletderived growth factor receptors. Proc Natl Acad Sci USA 2000; 97:
179–184.
16 Bader GD, Donaldson I, Wolting C, Ouellette BF, Pawson T, Hogue
CW. BIND—the biomolecular interaction network database. Nucleic
Acids Res 2001; 29: 242–245.
17 Palczewski K, Kumasaka T, Hori T, Behnke CA, Motoshima H, Fox BA
et al. Crystal structure of rhodopsin: a G protein-coupled receptor.
Science 2000; 289: 739–745.
18 Wright PE, Dyson HJ. Intrinsically unstructured proteins: re-assessing
the protein structure-function paradigm. J Mol Biol 1999; 293: 321–
331.
19 Shoemaker BA, Portman JJ, Wolynes PG. Speeding molecular recognition by using the folding funnel: the fly-casting mechanism. Proc Natl
Acad Sci USA 2000; 97: 8868–8873.
20 Sanchez R, Sali A. Large-scale protein structure modeling of the Saccharomyces cerevisiae genome. Proc Natl Acad Sci USA 1998; 95:
13597–13602.
21 Sanchez R, Pieper U, Melo F, Eswar N, Marti-Renom MA, Madhusudhan MS et al. Protein structure modeling for structural genomics. Nat
Struct Biol 2000; 7 (Suppl): 986–990.
22 Gouldson PR, Higgs C, Smith RE, Dean MK, Gkoutos GV, Reynolds CA.
Dimerization and domain swapping in G-protein-coupled receptors,
a computational study. Neuropsychopharmacology 2000; 23 (Suppl):
60–77.
Pharmacogenomics and molecular design
D Bailey et al
47
23 Sternberg MJ, Bates PA, Kelley LA, MacCallum RM. Progress in protein
structure prediction: assessment of CASP3. Curr Opin Struct Biol 1999;
9: 368–373.
24 Thornton JM, Todd AE, Milburn D, Borkakoti N, Orengo CA. From
structure to function: approaches and limitations. Nat Struct Biol 2000;
7 (Suppl): 991–994.
25 Appelt K. Crystal structures of HIV-1 protease-inhibitor complexes. Perspect Drug Discov Design 1993; 1: 23–48.
26 Hendlich M. Databases for protein-ligand complexes. Acta Crystallogr
D Biol Crystallogr 1998; 54: 1178–1182.
27 Hofmann K, Bucher P, Falquet L, Bairoch A. The PROSITE database, its
status in 1999. Nucleic Acids Res 1999; 27: 215–219.
28 Danziger DJ, Dean PM. Automated site-directed drug design: the prediction and observation of ligand point positions at hydrogen-bonding
regions on protein surfaces. Proc R Soc Lond B 1989; 236: 115–124.
29 MacBeath G, Schreiber SL. Printing proteins as microarrays for highthroughput function determination. Science 2000; 289: 1760–1763.
30 Zhang YP, Lewis RN, Hodges RS, McElhaney RN. Peptide models of the
helical hydrophobic transmembrane segments of membrane proteins:
interactions of acetyl-K(2)-(LA)(12)-K(2)-amide with phosphatidylethanolamine bilayer membranes. Biochemistry 2001; 40: 474 – 482.
31 Pantoliano MW, Rhind AW, Salemme FR. Microplate thermal shift assay
for ligand development and multivariable protein chemistry optimization. US Patent 1997; 6 020 141.
32 Weinberger SR, Morris TS, Pawlak M. Recent trends in protein biochip
technology. Pharmacogenomics 2000; 1: 395–416.
33 Cancilla MT, Leavell MD, Chow J, Leary JA. Mass spectrometry and
immobilized enzymes for the screening of inhibitor libraries. Proc Natl
Acad Sci USA 2000; 97: 12008–12013.
34 Schriemer C, Bundle DR, Li L, Hindsgaul O. Micro-scale frontal affinity
chromatography with mass spectrometric detection: a new method
for the screening of compound libraries. Agnew Chem Int Ed 1998; 37:
3383–3387.
35 Danziger DJ, Dean M. Automated site-directed drug design: a general
algorithm for knowledge acquisition about hydrogen-bonding regions
at protein surfaces. Proc R Soc Lond B 1989; 236: 101–113.
36 Dean PM. Defining and using molecular similarity or complementarity
for drug design. In: Dean PM (ed). Molecular Similarity in Drug Design.
Blackie Academic and Professional: Glasgow, 1995, pp 1–23.
37 Bailey D, Brown D. High-throughput chemistry and structure-based
design: survival of the smartest. Drug Discovery Today 2001; 6: 57–59.
38 Klebe G (ed). Virtual Screening: an Alternative or Complement to High
Throughput Screening? Kluwer/ESCOM: Deventer, 2000.
39 Matthews DA, Dragovich PS, Webber SE, Fuhrman SA, Patick AK, Zalman LS et al. Structure-assisted design of mechanism-based irreversible
inhibitors of human rhinovirus 3C protease with potent antiviral
activity against multiple rhinovirus serotypes. Proc Natl Acad Sci USA
1999; 96: 11000–11007.
40 Lee D, Long SA, Adams JL, Chan G, Vaidya KS, Francis TA et al. Potent
and selective nonpeptide inhibitors of caspases 3 and 7 inhibit
apoptosis and maintain cell functionality. J Biol Chem 2000; 275:
16007–16014.
41 Mills JEJ, De Esch IJP, Perkins TDJ, Dean PM. SLATE: a method for the
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
superposition of flexible ligands. J Comput-Aided Mol Design 2001; 15:
81–96.
Willett P. Chemical similarity searching. J Chem Inf Comput Sci 1998;
38: 983–996.
Cramer RD, Patterson DE, Clark RD, Soltanshahi F, Lawless MS. Virtual
compound libraries: a new approach to decision making in molecular
discovery research. J Chem Inf Comput Sci 1998; 38: 1010–1023.
Clark RD, Langton WJ. Balancing representativeness against diversity
using optimizable K-dissimilarity and hierarchical clustering. J Chem Inf
Comput Sci 1998; 38: 1079–1086.
Van Drie JH, Lajiness MJ. Approaches to virtual drug design. Drug Discovery Today 1998; 3: 274 –283.
Lewell XQ, Judd DB, Watson SP, Hann MM. RECAP—retrosynthetic
combinatorial analysis procedure: a powerful new technique for identifying privileged molecular fragments with useful applications in combinatorial chemistry. J Chem Inf Comput Sci 1998; 38: 511–522.
Bailey DS, Dean PM. Pharmacogenomics and its impact on drug
design and optimisation. Ann Rep Med Chem 1999; 34: 339–348.
Zanders ED. Gene expression analysis as an aid to the identification of
drug targets. Pharmacogenomics 2000; 1: 375–384.
Kliewer SA, Lehmann JM, Willson TM. Orphan nuclear receptors: shifting endocrinology into reverse. Science 1999; 284: 757–760.
Willson TM, Brown PJ, Sternbach DD, Henke BR. The PPARs: from
orphan receptors to drug discovery. J Med Chem 2000; 43: 527–550.
Nolte RT, Wisely GB, Westin S, Cobb JE, Lambert MH, Kurokawa R et
al. Ligand binding and co-activator assembly of the peroxisome proliferator-activated receptor-gamma. Nature 1998; 395: 137–143.
Gupta RA, Tan J, Krause WF, Geraci MW, Willson TM, Dey SK et al.
Prostacyclin-mediated activation of peroxisome proliferator-activated
receptor ␦ in colorectal cancer. Proc Natl Acad Sci USA 2000; 97:
13275–13280.
Guerre-Millo M, Gervois P, Raspe E, Madsen L, Poulain P, Derudas B et
al. Peroxisome proliferator-activated receptor alpha activators improve
insulin sensitivity and reduce adiposity. J Biol Chem 2000; 275:
16638–16642.
Steppan CM, Bailey ST, Bhat S, Brown EJ, Banerjee RR, Wright CM et
al. The hormone resistin links obesity to diabetes. Nature 2001; 409:
307–312.
Levitzki A. Protein tyrosine kinase inhibitors as novel therapeutic
agents. Pharmacol Ther 1999; 82: 231–239.
Roberts CJ, Nelson B, Marton MJ, Stoughton RS, Meyer MR, Bennett
HA et al. Signaling and circuitry of multiple MAPK pathways revealed
by a matrix of global gene expression profiles. Science 2000; 287:
873–880.
Hughes TR, Marton MJ, Jones AR, Roberts CJ, Stoughton R, Armour
CD. Functional discovery via a compendium of expression profiles. Cell
2000; 102: 109–126.
Weinstein JN, Myers TG, O’Connor PM, Friend SH, Fornace AJ, Kohn
KW et al. An information-intensive approach to the molecular pharmacology of cancer. Science 1997; 275: 343–349.
Scherf U, Ross DT, Waltham M, Smith LH, Lee JK, Tanabe L et al. A
gene expression database for the molecular pharmacology of cancer.
Nature Genet 2000; 24: 236–244.
www.nature.com/tpj