Reliscript User Guide
About this User Guide
This user guide is a practical guide to using Reliscript, a command-line interface which allows access
to PDB data and Relibase+ search methods from within the Python scripting language environment. It
includes information on how to access data from the Relibase+ database and provides example scripts
and tutorial scripts to illustrate how to set up a search.
Use the < and > navigational buttons above to move between pages of the user guide and the TOC and
Index buttons to access the full table of contents and index. Additional on-line Relibase+ resources
can be accessed by clicking on the links on the right hand side of any page.
A set of tutorials is available for Reliscript. Tutorials can be accessed by clicking on the Tutorials link
on the right hand side of any page.
1 How to Use This Manual
If you are completely new to Reliscript and Relibase+, start by reading the following sections in the
order given:
2 Basic Introduction (see page 2)
3 Reliscript Overview (see page 6)
If you already know about Relibase+, you can get familiar with Reliscript by doing the tutorial (see
Appendix C: Reliscript Tutorials, page 101). Alternatively, just read through some of the example
scripts:
9 Example Scripts (see page 75)
If you have already used Reliscript and want to look up particular details of objects and functions, the
key reference sections are:
4 Accessing Protein Data: Data Objects (see page 23)
5 Storing and Manipulating Collections of Objects: Container Objects (see page 54)
6 Doing Searches and Other Calculations: Operation Objects (see page 61)
7 Global Utility Functions (see page 72)
If you are an advanced Reliscript user and want to extend the functionality of the language, e.g. by
writing your own objects for searching Relibase+ data, the key section is:
8 Extending the Functionality of Reliscript (see page 73)
2 Basic Introduction
2.1 What is Relibase+?
Relibase+ (http://www.ccdc.cam.ac.uk/products/life_sciences/relibase/) is a tool for searching and
analysing protein-ligand structures. It features:
• A browser-based graphical user interface
• A fast database-search engine
• 3D visualisation using AstexViewer (embedded) or Hermes (client)
• Local installation for confidential searching
• The ability to search both the PDB and proprietary databases of protein-ligand complexes
• Text searching
• 2D substructure searching
• 3D substructure searching
• 3D searching for protein-ligand interactions
• Similarity searching for ligands
• Sequence searching
• Automatic superimposing of related binding sites
• Logical combination of hitlists
• Exploration of protein crystal packing
• A water structure information module containing detailed information about the water structure
in each entry
• A cavity information module for detecting similarities (unexpected or otherwise) amongst
protein cavities (e.g. active sites) that share little or no sequence homology
• A secondary structure information module for searching and displaying secondary structure
2.2 What is Reliscript?
Reliscript is a command-line interface to Relibase+ (see Section 2.1, page 2). It allows access to the
Relibase+ enhanced PDB data and search methods from within the Python scripting language
environment (see Section 2.3, page 3). It can be used to construct more complex queries than are
available through the Relibase+ web-browser interface. Hits from Reliscript searches can be saved as
Relibase+ hitlists for subsequent viewing in the Relibase+ web interface. Conversely, hitlists from
Relibase+ searches can be read into Reliscript for further manipulation. Reliscript can be used in
conjunction with many other libraries and applications using powerful interface facilities provided by
Python (see Section 2.3, page 3).
The Reliscript Overview (see Section 3, page 6) gives more details of the objects, functions and mode
of use of Reliscript.
2.3 What is Python?
The following is quoted from the Python web site (http://www.python.org):
Python is an interpreted, interactive, object-oriented programming language. It is often compared to
Tcl, Perl, Scheme or Java.
Python combines remarkable power with very clear syntax. It has modules, classes, exceptions, very
high level dynamic data types, and dynamic typing. There are interfaces to many system calls and
libraries, as well as to various windowing systems (X11, Motif, Tk, Mac, MFC). New built-in modules
are easily written in C or C++. Python is also usable as an extension language for applications that
need a programmable interface.
The Appendix A: Glossary (see page 77) includes, amongst other things, a brief overview of basic
Python features and terminology. Beyond that, an excellent Python tutorial can be found at the
following web address:
• http://www.python.org/doc/current/tut/tut.html
2.4 Quick Python Primer
The following is intended as a quick primer to Python. Open up an interactive session by typing
python at the operating system command-line prompt.
The code below illustrates how to create a simple 'Hello world' program in Python.
>>> print 'Hello world'
Hello world
The code below illustrates the data types integer, float and string.
>>> a = 1 # an integer
>>> b = 3 # another integer
>>> a + b
4
>>> a / b # Careful! integer division!
0
>>> c = 3.0 # a float
>>> a / c
0.33333333333333331
>>> d = 'string'
>>> d[0] # access first letter (as if it was a list, see below)
's'
>>> e = "another string"
>>> d + e
'stringanother string'
>>> print a, c, e # note that spaces are added automatically
1 3.0 another string
The code below illustrates the data structures list, tuple and dictionary.
>>> l = [1, 3.3, 't'] # a list can hold different data types
>>> l.append('s') # append another value to the list
>>> l[0] # first value
1
>>> l[-1] # last value
's'
>>> l[0] = 4.5 # change first value from 1 to 4.5
>>> for value in l: print value
...
4.5
3.3
t
s
>>>
>>> t = (1, 3.3, 't') # a tuple can hold different data types
>>> t[0]
1
>>> t[0] = 4.5 # Error! tuples are not mutable
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: object does not support item assignment
>>>
>>> d = {'a': 1, 'b': 2.2} # a dictionary
>>> d['c'] = 't' # add (or update) the value of entry 'c'
>>> d['a']
1
>>> d['a'] = 3.3
>>> for key, value in d.iteritems(): print key, value
...
a 3.3
c t
b 2.2
See the script python_primer.py for a quick introduction to functions, for loops, and writing to and
reading from files. Please note that Python is very sensitive to indentation, as that is how it determines
where functions, for loops and conditional statements begin and end. It is therefore a good habit to
indent with spaces (preferably four per indentation level) instead of tabs.
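As a minimal sketch of the indentation rule, the following defines a function containing a for loop and a conditional statement. The function name and the data are made up for illustration and are not taken from python_primer.py. The single-argument print(...) form is used so that the code runs unchanged under both Python 2 and Python 3.

```python
def count_long_words(words, min_length):
    """Return how many strings in words have at least min_length characters."""
    count = 0
    for word in words:               # loop body is indented four spaces
        if len(word) >= min_length:  # conditional body is indented four more
            count = count + 1
    return count                     # function level again: runs after the loop


words = ['serine', 'his', 'aspartate']
n_long = count_long_words(words, 6)
print(n_long)
```

Moving the return statement four spaces to the right would place it inside the loop, so the function would return after examining only the first word.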
2.5 A Simple Reliscript Example
A very simple Reliscript example is as follows:
import reliscript
pdb_set = reliscript.set('pdb')
ser_search = reliscript.text_search('SERINE', field='header')
ser_search(pdb_set)
pdb_set.save_to_hitlist('serine search')
This searches all PDB entries to find those containing the string SERINE in the header record. The
resulting list of entries is stored persistently (i.e. on disk) as the Relibase+ hitlist serine search and
hence can be viewed in the Relibase+ web-browser interface. A more detailed explanation of this
script is given elsewhere (see Section 3.2, page 12).
3 Reliscript Overview
Reliscript is based on Python (see Section 2.3, page 3), which is an object-oriented language.
Consequently, Reliscript itself is object-oriented. Although it provides some global utility functions
(see Section 7, page 72), virtually all data-access and search-and-analysis functionality is presented in
the form of objects. Doing things in Reliscript basically involves creating the appropriate objects,
manipulating them as desired, and writing out information contained in the manipulated objects.
3.1 Running Reliscript and Writing and Debugging Scripts
Python in general, and Reliscript in particular, can be used either interactively or by running a Python
(.py) script file in batch mode. You will probably want to run Reliscript interactively, using a small
test hitlist, when writing and debugging a new script (see Section 3.1.8, page 11). Reliscript jobs on
the full PDB database can take quite a long time to run (several hours), so once a script has been
debugged, it is usually better to run it in batch mode to get the final results.
3.1.1 Setting up the Reliscript Environment
To run Reliscript, you first have to set up a Reliscript environment. Start by moving
into the Relibase install directory:
cd <Relibase install directory>
Source the setup script in the bin directory (relibase.setup.sh for sh/bash, relibase.setup for csh/tcsh):
# for sh/bash users:
. bin/relibase.setup.sh
# for csh/tcsh users:
source bin/relibase.setup
This sets up a number of Relibase+ environment variables and defines the setup_reliscript alias. Run the
setup_reliscript command:
setup_reliscript
This sets up the environment variables required for Reliscript.
3.1.2 Setting up a Reliscript Client
If you intend to make use of Reliscript, it is worth considering creating a Reliscript client installation
on a different machine from your Relibase server. This has the advantage of keeping the (memory- and
CPU-hungry) Python process that Reliscript uses away from other Relibase processes such as the
main server and the database search engine. It also means that potential users do not
need to be able to log in to the Relibase server.
To use a Reliscript client, the security of the database must be relaxed so
that remote Reliscript clients can connect. To do this, edit the
$RELIBASE_ROOT/derby/derby.properties file on the server and remove the '#' from the front of the first line so it reads:
derby.drda.host=0.0.0.0
For this to take effect the database needs to be restarted:
relibase -database stop
relibase -database start
To create the Reliscript client, log in to the Relibase server and move into the Relibase root directory:
cd $RELIBASE_ROOT
Read the documentation of the reliscript_client.sh script:
./bin/reliscript_client.sh -h
Run the reliscript_client.sh script:
./bin/reliscript_client.sh
This creates a file called reliscript_client.tar.gz. Copy this file to the target system.
Log in to the target system and unpack the reliscript_client.tar.gz file:
tar -zxvf reliscript_client.tar.gz
Move into the reliscript_client directory:
cd reliscript_client
Read the README file:
cat README
To set up the Reliscript client environment, run the commands:
env RELIBASE_ROOT=$PWD bin/update_config.sh -reliscript
# For bash users
. bin/relibase.setup.sh
# For csh/tcsh users
source bin/relibase.setup
setup_reliscript
The Reliscript client is now installed and Reliscript Python scripts can be executed on the target machine.
The first time the Reliscript client is installed it is worth creating a fast lookup table (see Section
3.1.5, page 9):
python python/reliscript/create_fast_lookup.py
3.1.3 Starting Python in Interactive Mode
To run interactively, type python at the operating system command-line prompt (or, depending on
your installation, you may need to type an alias instead):
bash-3.1.17$ python
Python 2.5.2 (r252:60911, Jul 23 2008, 17:11:49)
[GCC 3.2.3 20030502 (Red Hat Linux 3.2.3-59)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
You must then import the Reliscript module by entering the command import reliscript (see
Section 3.1.4, page 8); Reliscript commands may then be typed and executed.
Typing the following:
python -i my_script.py
at the Unix prompt will open the Python interpreter, run the commands in my_script.py, and
leave you in interactive mode as if you had just typed the commands manually.
3.1.4 Importing the Reliscript Module and Other Python Modules
To import the Reliscript module, enter the command import reliscript. This will produce
output such as:
>>> import reliscript
Starting the JVM
-Xms128m
-Xmx512m
-Xmn64m
Imported psyco for python speed optimization
>>>
The command sets up a “namespace” called reliscript which allows access to all the Reliscript
functionality, e.g. commands such as the following may then be executed:
import reliscript
reliscript.use_workspace('fred')
lig_set = reliscript.set('ligand')
The namespace can be aliased to something else, e.g.:
import reliscript as rs
rs.use_workspace('fred')
lig_set = rs.set('ligand')
Other Python modules (see page 82) may be similarly loaded, e.g.
import re
will load the Python module for handling regular expressions.
3.1.5 Reliscript Fast Lookup Script
When Reliscript is started it needs to create an internal fast lookup table that is used when obtaining
one type of Reliscript object from another, for example, getting the ligand objects associated with a
PDB object. Creating this fast lookup information can take quite a while.
To speed up Reliscript startup and also reduce its memory requirements, it is possible to precalculate
this lookup information.
To do this, go to the location of the reliscript.py file in the Reliscript hierarchy and locate the file
create_fast_lookup.py. Execute this Python script using the same version of Python as used for
normal Reliscript.
This script duplicates the fast lookup creation in Reliscript but saves the information to a Python file
so it can be used in preference to recalculating the lookup information. The name of the file created
will be of the form:
reliscript_fast_lookup_xxxxxxxxxx.py
where x will be numeric digits. These digits represent the total size of all the Relibase+ databases.
This total size is used by reliscript.py to check that the correct fast lookup information is available
and can be loaded. When you add updates provided by the CCDC, update your own in-house
databases or change which set of database files is to be used, you will need to re-run the
create_fast_lookup.py script to create a lookup file to match the new database configuration.
3.1.6 Using Alternative Databases
By default all databases are used by Reliscript, i.e. the PDB database reli as well as any in-house
databases. To use only an in-house database, first create a fast lookup table for that
database:
create_fast_lookup.py mydb1
The database can then be set using the command:
>>> reliscript.set_database('mydb1')
To use multiple in-house databases one first needs to generate a fast lookup table for the databases of
interest:
create_fast_lookup.py mydb1:mydb2
The databases of interest can then be set using the command:
>>> reliscript.set_database('mydb1:mydb2')
3.1.7 Useful Interactive Aids; Browsing and Autocompletion
When using Reliscript interactively, the up and down arrow keys can be used to browse the command
history list. In addition the <TAB> key can be used to auto-complete a command. For example:
>>> consensus_search = reliscript.con<TAB pressed>
produces:
>>> consensus_search = reliscript.consensus_search
and:
>>> pdb_1qs4 = reliscript.create('1qs4')
>>> print pdb_1qs4.<TAB pressed>
lists all the attributes of pdb_1qs4, while:
>>> print pdb_1qs4.a<TAB pressed>
lists all the attributes starting with the letter "a".
Note: This feature should be activated by default in the Reliscript Python initialization file, identified
by the PYTHONSTARTUP environment variable in
$RELIBASE_ROOT/python/reliscript/reliscript.setup. To deactivate this feature, comment out the
appropriate line.
3.1.8 Creating and Loading Hit Lists
It is easy to create and load hitlists representing small subsets of the entire database. This is useful
when writing and debugging scripts, since calculations on the full database can take several hours.
The command for loading a PDB hitlist is:
pdb_set = reliscript.set('pdb', 'name_of_hitlist')
Similarly, hitlists can be created from sets:
pdb_set.save_to_hitlist('name_of_hitlist')
The command above will raise an exception if the name of the hitlist is already used. A PDB set of
interest can be created by performing a search (see Section 6, page 61). Alternatively, a set of PDB
codes can be read in from a text file where each line contains one PDB code, using the script
create_hitlist_from_text_file.py.
Note that hitlists created using the Relibase+ GUI can also be loaded in Reliscript.
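The text-file format described above (one PDB code per line) can be read with plain Python. The following is only a sketch: the function name read_pdb_codes and the file name are hypothetical, and this is not the code of create_hitlist_from_text_file.py. It skips blank lines and writes its own small input file so that it can be run as it stands.

```python
import os
import tempfile


def read_pdb_codes(path):
    """Return the non-blank lines of a text file, stripped of whitespace."""
    codes = []
    in_file = open(path)
    try:
        for line in in_file:
            code = line.strip()
            if code:                 # ignore empty lines
                codes.append(code)
    finally:
        in_file.close()
    return codes


# Write a small example file (hypothetical name) and read it back.
example_path = os.path.join(tempfile.mkdtemp(), 'pdb_codes.txt')
out_file = open(example_path, 'w')
out_file.write('1a01\n1qs4\n\n1xp0\n')
out_file.close()

pdb_codes = read_pdb_codes(example_path)
print(pdb_codes)
```

Each code in the resulting list could then be turned into a PDB object with reliscript.create (see Section 4.1.1, page 23).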
3.1.9 Running Reliscript Jobs in Batch Mode
A Reliscript Python script called myscript.py can be run in the foreground by typing the
following at the operating system command-line prompt:
python myscript.py
(You may need to replace python by an alias if one has been set in your local installation.) To run in
background mode, type:
python myscript.py > myscript.out &
Your Python script must import the reliscript module in order to use Reliscript functionality
(see Section 3.1.4, page 8).
3.2 Walking Through a Simple Reliscript Example
Consider the following script (in the code listings in this guide, lines beginning with # are comments):
import reliscript
pdb_set = reliscript.set('pdb')
ser_search = reliscript.text_search('SERINE', field='header')
ser_search(pdb_set)
pdb_set.save_to_hitlist('serine search')
Line 1:
Imports the reliscript Python module (see Section 3.1.4, page 8), which provides access to all
the classes and functions in Reliscript.
Line 2:
The reliscript.set command creates a set object (called, in this case, pdb_set), which is a
container for holding data objects (see Section 5.2, page 55). Specifically, the argument 'pdb'
instructs Reliscript to create a set containing all the PDB entries in the Relibase+ database. Each PDB
entry will be held as a PDB data object (see Section 4.1, page 23). For example (though it is not
relevant for the above script) pdb_set[0] (the first object in the pdb_set container) would be the
object holding the data for the first PDB entry in the database (index numbers in Python begin at zero,
not one).
Line 3:
The reliscript.text_search command creates an operation object (called, in this case,
ser_search) for performing a text search. The arguments specify that the operation object is to
search for the text string SERINE in the header data of PDB objects (equivalent to the HEADER
record in a PDB file). There are several other types of operation objects (see Section 6, page 61), e.g.
reliscript.sequence_search would create an operation object for searching protein
sequences.
Line 4:
This line applies the operation object ser_search to the set object pdb_set. It could equally
well be written as:
pdb_set(ser_search)
The practical outcome of the command is that each PDB object in pdb_set is subjected to the
search defined in the ser_search operation object, i.e. the HEADER record is searched for the
presence of the string SERINE. Only those PDB objects passing this test will be retained in the set
pdb_set; the others will be eliminated.
Line 5:
pdb_set now contains all PDB entries in the Relibase+ database that contain SERINE in the
HEADER record. The command pdb_set.save_to_hitlist calls the save_to_hitlist
function, which is a function available to any set object. In this example, it writes out the hits from the
search as a Relibase+ hitlist called serine search. This hitlist may then be loaded into Relibase+
for viewing (see Section 3.7.3, page 18).
3.3 Introduction to Objects in Reliscript
Objects in Reliscript fall into three main categories:
• Data objects, which provide access to all the data that you would expect to find in a database of
3D protein structures, e.g. atomic coordinates, experimental conditions, ligand chemical
structures, chain sequences, etc. (see Section 3.3.1, page 13).
• Container objects, for storing collections of data objects, e.g. the hits from a search (see Section
3.3.2, page 14).
• Operation objects, for performing searches, superimpositions, etc. (see Section 3.3.3, page 15).
3.3.1 Introduction to Data Objects
Available data objects are PDB, Chain, NucleicAcid, Ligand, Solvent, Residue, Atom, Bond,
BindingSite and PackBindingSite (see Section 4, page 23). The PDB object is the most important,
since it contains within it objects of the other types (i.e. a PDB object will contain Chain,
NucleicAcid, Ligand and Solvent objects).
Data objects allow access to all the data stored in Relibase+. This access is either provided directly
via attributes (see page 77) of the data object or indirectly by providing access to other, related data
objects that themselves have access to the required information. For example, a PDB object would
provide direct access to the temperature of the structure determination, which is an attribute of the
PDB class:
# Load structure using the PDB code
pdb_object = reliscript.create('1xp0')
# Access the temperature
temperature = pdb_object.temp
However, access to the compound name of a ligand contained within the PDB entry would require the
Ligand object to be created, and it would be this latter object that would provide access to the name:
# Access a list of all the ligands
ligand_list = pdb_object.ligands
# Access the first ligand
ligand_object = ligand_list[0]
# Access the ligand name
ligand_name = ligand_object.compound_name
The values of some data-object attributes can be over-ridden by the user; the most obvious case where
this would be useful is if you wished to overwrite the default (Sybyl) atom type of an atom with some
other atom type.
3.3.2 Introduction to Container Objects
A container object is an object that holds within it other objects: for example, a list of Ligand objects
(list = container object, Ligand = data object). Python provides several types of containers, such as
tuples (see page 83), lists (see page 80) and dictionaries (see page 78), and Reliscript makes extensive
use of these for returning data. Lists are in some programming languages referred to as arrays and
dictionaries are sometimes called “associative arrays” or “hashes”.
For more information on how to interact with the different containers of Reliscript objects, see the
script container_example.py.
There is also a Reliscript-specific container object called a set which provides extra functionality (see
Section 5.2, page 55). Only some types of data objects can be stored in sets (viz. PDB, Chain,
NucleicAcid, Ligand, Solvent). A set containing one type of object can be converted to a set
containing another type of object, e.g. a set of PDB objects can be transformed to a set of Ligand
objects:
# Create a set containing all PDB objects in database
pdb_set = reliscript.set('pdb')
# Create a set containing all ligands in the PDB entries
# in pdb_set
lig_set = reliscript.set('ligand', pdb_set)
The main use of sets is to store and manipulate collections of data objects on which some sort of
search or geometrical transformation is to be applied using an operation object (see Section 3.3.3,
page 15).
3.3.3 Introduction to Operation Objects
These objects (see Section 6, page 61) are used to perform tests or geometrical transformations on
data objects such as protein chains or ligands. There are different types of operation objects for, e.g.:
• performing text searches;
• performing substructure searches using a SMILES string;
• superimposing protein chains.
It is also possible to write customised operation objects for performing specialist tasks (see Section 8,
page 73).
An operation object is applied to a set of data objects. The result is the same container holding a
modified collection of data objects, e.g. data objects satisfying a particular test. In addition, the
operation object may add extra attributes (see page 77) to the data objects that it processes, e.g.
# Create a set containing all ligands in database(s)
lig_set = reliscript.set('ligand')
# Print number of ligands in lig_set (len returns "length" of
# set, i.e. number of items it contains)
print len(lig_set)
26544
# Create an operation object called sim_lig_search to do a
# similar-ligand search, comparing each ligand in lig_set
# with a previously-created ligand object called a_ligand_obj
sim_lig_search = reliscript.similar_ligand_search(a_ligand_obj)
# Apply operation object to ligand set
sim_lig_search(lig_set)
# lig_set now contains only those ligands whose similarity
# to the reference ligand exceeds the default threshold value
print len(lig_set)
242
# The sim_lig_search operation object added a new attribute to
# each ligand, viz. its similarity with the reference ligand
print lig_set[0].ligand_similarity['value']
0.95
Operation objects can be applied in two ways. Specifically, if search_object is an operation
object for performing a particular search and a_set is a set of data objects, then both of the
following will produce identical results:
search_object(a_set)
and
a_set(search_object)
In both cases, a_set will end up containing just those data objects that satisfy the search defined by
search_object.
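The calling convention described above can be pictured with a small stand-in written in plain Python. The classes below (SimpleSet and ContainsText) are hypothetical illustrations and are not Reliscript's implementation. They only mimic the behaviour described in this section: an operation filters the set in place, and either object may be called with the other.

```python
class ContainsText(object):
    """Stand-in operation: keeps items whose text contains a given string."""

    def __init__(self, text):
        self.text = text

    def test(self, item):
        return self.text in item

    def __call__(self, a_set):
        # Filter the set in place: items failing the test are eliminated.
        a_set.items[:] = [item for item in a_set.items if self.test(item)]
        return a_set


class SimpleSet(object):
    """Stand-in container holding a list of items."""

    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        return self.items[index]

    def __call__(self, operation):
        # a_set(operation) simply delegates to operation(a_set).
        return operation(self)


headers = ['SERINE PROTEASE', 'HYDROLASE', 'SERINE/THREONINE KINASE']
search_object = ContainsText('SERINE')

first_set = SimpleSet(headers)
search_object(first_set)           # first form

second_set = SimpleSet(headers)
second_set(search_object)          # second form

print(len(first_set))
print(first_set.items == second_set.items)
```

In this sketch both forms work because each class defines __call__. How Reliscript achieves the same effect internally is not covered here.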
3.4 Introduction to Functions in Reliscript
Functions in Reliscript fall into two categories:
• Functions that are global, i.e. available throughout Reliscript (see Section 7, page 72), e.g.
d = reliscript.distance(atom1, atom2)
(computes distance between two atoms, or minimum distance between two groups of atoms).
• Functions that are only available to particular types of objects, e.g.
atom_list = pdb_object.pdb_atoms(include_pack=0)
pdb_atoms is a function available to PDB objects that returns the atoms as a list of string objects.
Most object-specific functions are for writing out structural data (see Section 3.7.4, page 19) or for
applying or clearing geometrical transformations (see Section 3.5, page 17).
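To make the two behaviours of the distance function concrete, here is a plain-Python sketch of the underlying geometry. It is not the Reliscript implementation: atoms are represented here simply as (x, y, z) tuples, and the helper names point_distance and min_group_distance are hypothetical.

```python
import math


def point_distance(p, q):
    """Euclidean distance between two (x, y, z) points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def min_group_distance(group1, group2):
    """Smallest distance between any point in group1 and any point in group2."""
    return min(point_distance(p, q) for p in group1 for q in group2)


atom1 = (0.0, 0.0, 0.0)
atom2 = (3.0, 4.0, 0.0)
group1 = [atom1, (1.0, 0.0, 0.0)]
group2 = [atom2, (1.0, 2.0, 0.0)]

print(point_distance(atom1, atom2))
print(min_group_distance(group1, group2))
```

The minimum is taken over every pair of points, so the cost grows with the product of the two group sizes.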
3.5 Geometrical Transformations
The chain superimposition object (see Section 6.7, page 71) can be used to superimpose each of a set
of protein chains onto a reference chain. This implicitly involves applying a rotation-translation
operation to each chain. If chains are transformed in this way, then the same transformation will
automatically be applied to ligands retrieved by use of the adjacent_ligands attribute of Chain
objects (see Section 4.2.3, page 28). However, other objects (e.g. the solvent around the transformed
chain) will not have the geometric transformation applied by default when they are retrieved from the
Relibase+ database. A function (transform) is therefore provided which will allow the
transformation to be applied explicitly. More details can be found in the sections on individual data
objects (see Section 4, page 23).
The function clear_transform can be used to re-set objects back to their original orientation
(i.e. as stored in the Relibase+ database).
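A rotation-translation operation of the kind mentioned above maps each coordinate x to Rx + t, and it can be undone by applying the inverse mapping. The sketch below illustrates this with plain Python lists. It is not how Reliscript stores or applies transformations, and the function names apply_transform and undo_transform are hypothetical.

```python
def apply_transform(rotation, translation, point):
    """Return rotation * point + translation for a 3D point."""
    rotated = [sum(rotation[i][j] * point[j] for j in range(3))
               for i in range(3)]
    return [rotated[i] + translation[i] for i in range(3)]


def undo_transform(rotation, translation, point):
    """Invert apply_transform; the inverse of a rotation is its transpose."""
    shifted = [point[i] - translation[i] for i in range(3)]
    return [sum(rotation[j][i] * shifted[j] for j in range(3))
            for i in range(3)]


# Rotation by 90 degrees about the z axis, then a shift of 10 along x.
rotation = [[0.0, -1.0, 0.0],
            [1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0]]
translation = [10.0, 0.0, 0.0]

original = [1.0, 2.0, 3.0]
moved = apply_transform(rotation, translation, original)
restored = undo_transform(rotation, translation, moved)

print(moved)
print(restored)
```

Applying the same rotation and translation to every atom of a chain and of its associated ligands keeps their relative positions unchanged, which is why related objects need the same transformation.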
3.6 Search Databases and Database Identifiers
Relibase+ and Reliscript give access to structural data of protein-ligand complexes in the Protein
Data Bank. This data is stored in the database named reli. Entries derived from the reli database are
referred to by the database identifier pdb. In addition, Relibase+ system administrators can set up
in-house databases containing proprietary protein structures. Each such database will be assigned its
own identifier, e.g. mydb. These identifiers are used, for instance, when printing out Reliscript objects; for
example, printing out a Chain object might result in output like:
Chain<pdb:1a01:A>
or
Chain<mydb:1a01:A>
depending on which database the chain came from. By default, Relibase+ and Reliscript will access and
search all available databases, i.e. reli and any in-house databases.
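The colon-separated form shown above is easy to take apart with Python's re module. The sketch below is an illustration only: the function name parse_object_label is hypothetical, and the pattern assumes the layout seen in the examples in this guide (an object type followed by colon-separated components in angle brackets).

```python
import re

# Object type, then everything between the angle brackets.
LABEL_PATTERN = re.compile(r'^(\w+)<([^<>]+)>$')


def parse_object_label(label):
    """Split e.g. 'Chain<pdb:1a01:A>' into ('Chain', ['pdb', '1a01', 'A'])."""
    match = LABEL_PATTERN.match(label)
    if match is None:
        raise ValueError('unrecognised label: %r' % label)
    return match.group(1), match.group(2).split(':')


object_type, components = parse_object_label('Chain<mydb:1a01:A>')
print(object_type)
print(components[0])    # database identifier
```

The first component is always the database identifier, so a helper like this can be used to group printed results by the database they came from.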
3.7 Writing Output from Reliscript
Reliscript provides facilities for printing results to standard output or a file, transferring data to and
from Relibase+, and writing structure files.
3.7.1 Printing to Standard Output
Results from a Python job can be printed to standard output, e.g.
print pdb_object.year
1997
3.7.2 Opening Files for Saving Results
Reliscript data objects provide functions for exporting structural data in a number of formats (see
Section 3.7.4, page 19). In addition, Python itself provides options for opening files to which results
may then be written, e.g.
# Open file for writing
out_file = open('tmp_out.txt', 'w')
# Write something to file
out_file.write('something')
# Close file
out_file.close()
See the script ouput_example.py for more information on how to write a pdb object to a file.
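Building on the file-handling calls above, the following sketch writes one line per result and then reads the file back. The rows, the temporary directory and the variable names are placeholders made up for illustration. try/finally is used so that the file is closed even if a write fails.

```python
import os
import tempfile

# Placeholder (name, value) pairs standing in for data taken from objects.
rows = [('entry_a', 1.5), ('entry_b', 2.0)]

out_path = os.path.join(tempfile.mkdtemp(), 'tmp_out.txt')

out_file = open(out_path, 'w')
try:
    for name, value in rows:
        # One tab-separated line per row.
        out_file.write('%s\t%.2f\n' % (name, value))
finally:
    out_file.close()

in_file = open(out_path)
try:
    lines = in_file.read().splitlines()
finally:
    in_file.close()

print(lines)
```

Tab-separated lines of this kind can be opened directly in a spreadsheet or split again with the string method split.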
3.7.3 Communicating with the Relibase+ Graphical User Interface
The key method for communicating results between Reliscript and Relibase+ (e.g. to view hits from
Reliscript jobs in 3D) is to convert Reliscript sets (see Section 5.2, page 55) to Relibase+ hitlists, or
vice versa. PDB and ligand sets can be saved as Relibase+ hitlists by commands such as:
pdb_set.save_to_hitlist('my_search')
(see Section 5.2.4, page 56).
Hitlists saved in a Relibase+ session can be read into Reliscript as sets by commands such as:
hit_list_lig_set = reliscript.set('ligand', 'my_hitlistname')
(see Section 5.2.2, page 55).
If the type specified for the set (e.g. 'ligand' above) is not the same as the type of the hitlist, an
automatic conversion will occur. For example, if 'my_hitlistname' in the example above is a
PDB hitlist, the resulting Reliscript set will contain all the Ligand objects in the PDB entries
contained in the hitlist.
3.7.4 Exporting Structural Data
Most types of data objects have functions that enable the structural data they contain (atom
coordinates, etc.) to be written out to file or as a list of string objects. Availability of these functions is
as follows:
pdb_atoms    returns ATOM records as string objects
pdb_line     returns one ATOM record as a string object
save_pdb     writes object in pdb format
save_mol2    writes object in mol2 format

Object           Details                           pdb_atoms  pdb_line  save_pdb  save_mol2
PDB              see Section 4.1.4, page 26        yes        no        yes       no
Chain            see Section 4.2.4, page 29        yes        no        yes       no
NucleicAcid      see Section 4.3.4, page 32        yes        no        yes       no
Ligand           see Section 4.4.4, page 36        yes        no        yes       yes
Solvent          see Section 4.5.4, page 39        yes        no        yes       no
Residue          see Section 4.6.4, page 42        yes        no        yes       no
Atom             see Section 4.7.4, page 46        no         yes       no        no
BindingSite      see Section 4.9.4, page 50        yes        no        yes       yes
PackBindingSite  see Section 4.10.4, page 53       yes        no        yes       no
If a data object has been modified in a Reliscript job (e.g. had an attribute added or changed, or been
subjected to a geometrical transformation), the data that will be written out by save_pdb will, by
default, be that of the modified object. The data corresponding to the original object, as retrieved from
the Relibase+ database, can usually be written out by setting modified = 0 in the save_pdb
parameter list (this facility is not available for a small number of data objects).
4 Accessing Protein Data: Data Objects
The above data objects (see Section 3.3.1, page 13) are available.
4.1 PDB Objects
A PDB object holds information about a complete PDB entry (or an entry in an in-house database of
protein-ligand structures), e.g. author names, the experimental conditions of the structure
determination (if available), etc. It also allows access to the 3D results of the structure determination
by returning lists of the Chain, NucleicAcid, Ligand, Solvent and Atom objects that it contains.
4.1.1 Creation of PDB Objects
In most cases, PDB objects will be created as members of a container object (see Section 5, page 54),
e.g.
# Create a set containing all PDB objects in database
# and print first member
pdb_set = reliscript.set('pdb')
print pdb_set[0]
PDB<pdb:1a01>
It is also possible to create a particular, individual PDB object, e.g.
# Create a PDB object for Protein Data Bank entry 1A01
pdb_object = reliscript.create('1A01')
# Create a PDB object for the entry 1XYZ in the in-house
# database DBID
pdb_object = reliscript.create('DBID:1XYZ')
Note that the entries in the reli database (entries from the Protein Data Bank) are stored in lower case.
The PDB code argument of the create function is case-insensitive for these entries. For entries from
in-house databases, however, the PDB code argument is case-sensitive.
4.1.2 Textual Representation of PDB Objects
Print operations, etc., on PDB objects (e.g. print pdb_object) will produce output such as:
PDB<pdb:1a01>
Colons separate the contents of the angle brackets into components. The first component is the
database identifier (see Section 3.6, page 17), followed by the PDB code. If the PDB object is from an
in-house database (i.e. not derived from the main Protein Data Bank) the output will be, e.g.:
PDB<dbid:1ax1>
where dbid is the identifier of the in-house database.
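Because the representation has a regular form, it can be split back into its components, for example when the printed output of a script is post-processed. The following is a minimal sketch in plain Python that does not call Reliscript. The name parse_representation is a hypothetical helper, not a Reliscript function, and the sketch assumes that no component itself contains a colon.

```python
import re

# Hypothetical helper (not part of Reliscript): split a printed
# representation such as 'PDB<pdb:1a01>' into its type name and components.
_REPR_PATTERN = re.compile(r'^(\w+)<(.*)>$')

def parse_representation(text):
    match = _REPR_PATTERN.match(text.strip())
    if match is None:
        raise ValueError('not a recognised representation: %r' % text)
    type_name, contents = match.groups()
    # Colons separate the components inside the angle brackets.
    return type_name, contents.split(':')

type_name, parts = parse_representation('PDB<dbid:1ax1>')
print(type_name)   # PDB
print(parts[0])    # dbid  (database identifier)
print(parts[1])    # 1ax1  (PDB code)
```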
4.1.3 Attributes of PDB Objects
The attributes (see page 77) of a PDB object are:
Name (Type): Description
• a (float): Cell length a, in Å.
• alpha (float): Cell angle alpha, in degrees.
• atoms (list of Atoms): List of Atom objects, one for each atom in the entry, in the order: ligand
atoms, chain atoms, solvent atoms.
• author (string): Author field as a single string containing all authors delimited by commas or
spaces, e.g.
P.J.B.PEREIRA,A.BERGNER,S.MACEDO-RIBEIRO,R.HUBER
• authors (list of strings): List of authors, each author stored as a separate string, e.g.
['P.J.B.PEREIRA', 'A.BERGNER', 'S.MACEDO-RIBEIRO', 'R.HUBER']
• b (float): Cell length b, in Å.
• beta (float): Cell angle beta, in degrees.
• binding_sites (list of BindingSites): List of BindingSite objects, one for each bound ligand.
• bonds (list of Bonds): List of Bond objects, one for each bond in the entry.
• c (float): Cell length c, in Å.
• chains (list of Chains): List of Chain objects, one for each unique protein chain in the entry.
• compound (string): Contents of the PDB compound record (COMPND).
• crystal (dictionary): Dictionary containing crystallographic information (this information is also
accessible via other attributes). Dictionary is of the form, e.g.
{'space_group': 'P41', 'z_value': 16,
'cell': (82.93, 82.93, 172.86),
'angles': (90.0, 90.0, 90.0)}
where 'cell' refers to the a, b and c cell lengths and 'angles' refers to the alpha, beta and gamma cell
angles. Cell lengths of 1.0, 1.0, 1.0 and angles of 90.0, 90.0, 90.0 will be returned for structures
determined by NMR.
• date (string): Deposition date as stored in the PDB file, e.g. 12. 3. 97
• exptl_method (string): String describing the method used to determine the protein structure, as
taken from the EXPDTA record of the PDB file, e.g. X-RAY DIFFRACTION
• gamma (float): Cell angle gamma, in degrees.
• header (string): Contents of the PDB header record, e.g. SERINE PROTEINASE
• ligands (list of Ligands): List of Ligand objects, one for each bound ligand.
• nucleic_acids (list of NucleicAcids): List of NucleicAcid objects, one for each unique nucleic acid
chain in the entry.
• pack_binding_sites (list of PackBindingSites): List of PackBindingSite objects, one for each bound
ligand.
• ph (float): pH value (returned as -1.0 if no pH value available).
• r_value (float): Crystallographic R value (returned as -1.0 if no R-value available).
• resolution (float): Crystallographic resolution, in Å (returned as -1.0 if no resolution value
available).
• solvent (list containing one Solvent object): List containing one Solvent object. This single object
will contain information on all the solvent atoms in the entry. Relibase+ solvent data refers only to
water molecules.
• source (string): Contents of the source field, e.g.
MOL_ID: 1 ORGANISM_SCIENTIFIC: HOMO SAPIENS
ORGANISM_COMMON: HUMAN
ORGAN: LUNG
CELL: MAST CELL
• space_group (string): Crystallographic space group, e.g. P41
• temp (float): Temperature of the study (Kelvin; returned as -1.0 if no temperature available).
• title (string): Contents of the title field, e.g.
HUMAN BETA-TRYPTASE: A RING-LIKE TETRAMER
WITH ACTIVE SITES FACING A CENTRAL PORE
• year (integer): Year of the study, e.g. 1997
• z_value (integer): Number of polymeric chains in unit cell.
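Several of these attributes use marker values rather than raising an error when no data are available: -1.0 for ph, r_value, resolution and temp, and a 1.0/90.0 placeholder cell for NMR structures. A script that filters on such values may want to recognise the markers first. The sketch below shows one way this could be done. ToyPDB is a stand-in defined only for this illustration (it is not a Reliscript class), the helper names are hypothetical, and the numerical values other than the crystal dictionary are invented.

```python
from collections import namedtuple

# Stand-in for a PDB object, carrying only the attributes used here.
# In a Reliscript session these values would be read from a real PDB object.
ToyPDB = namedtuple('ToyPDB', ['resolution', 'r_value', 'ph', 'temp', 'crystal'])

def known(value):
    """Return None for the -1.0 'not available' marker, else the value."""
    if value == -1.0:
        return None
    return value

def has_placeholder_cell(crystal):
    """True if the cell is the 1.0/90.0 placeholder described for NMR entries."""
    return (tuple(crystal['cell']) == (1.0, 1.0, 1.0) and
            tuple(crystal['angles']) == (90.0, 90.0, 90.0))

# Invented example values; only the crystal dictionary is taken from the table.
xray = ToyPDB(resolution=2.1, r_value=-1.0, ph=7.5, temp=-1.0,
              crystal={'space_group': 'P41', 'z_value': 16,
                       'cell': (82.93, 82.93, 172.86),
                       'angles': (90.0, 90.0, 90.0)})

print(known(xray.resolution))              # 2.1
print(known(xray.r_value))                 # None
print(has_placeholder_cell(xray.crystal))  # False
```

Converting the marker to None avoids accidentally treating -1.0 as a very good resolution or R value in a numerical comparison.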
4.1.4 Functions of PDB Objects
The functions (see page 80) of a PDB object are:
pdb_atoms(include_pack=0)
Returns a list of string objects containing the ATOM records of the PDB entry.
Arguments:
• include_pack (integer): By default (i.e. if pdb_atoms() is called with no argument), the
list will not contain atoms generated by crystallographic symmetry, i.e. the atoms in the
PackBindingSite objects (see Section 4.10, page 51) that would be returned by the PDB
pack_binding_sites attribute. However, these can be included by passing a positive
include_pack value.
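The strings returned by pdb_atoms() can be processed with ordinary Python string handling. The sketch below assumes that each string follows the standard fixed-column layout of a PDB ATOM record; the guide states only that the list contains the ATOM records, so this should be checked against real output. The function parse_atom_record and the sample record are hypothetical and are not part of Reliscript.

```python
# Fixed-column layout of a PDB ATOM record (columns are 1-based in the PDB
# format description, so the Python slices below start one lower).
ATOM_FORMAT = 'ATOM  %5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f'

def parse_atom_record(record):
    """Extract a few fields from one ATOM record string."""
    return {
        'atom_name': record[12:16].strip(),
        'residue_name': record[17:20].strip(),
        'chain_id': record[21],
        'residue_label': record[22:27].strip(),  # number plus insertion code
        'xyz': (float(record[30:38]), float(record[38:46]), float(record[46:54])),
    }

# An invented record, built with the format above so that the columns line up.
sample = ATOM_FORMAT % (1, ' N', '', 'ILE', 'A', 16, 'B',
                        26.465, 27.452, -2.49, 1.0, 25.18)
fields = parse_atom_record(sample)
print(fields['residue_label'])  # 16B
print(fields['xyz'])            # (26.465, 27.452, -2.49)
```

In a Reliscript session, the same function could be applied to each element of the list returned by pdb_object.pdb_atoms().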
save_pdb(filename, include_pack=0, modified=1)
Saves the PDB entry as a file in pdb format.
Arguments:
• filename (string): Filename to be used for the output pdb file.
• include_pack (integer): By default, the file will not contain atoms generated by
crystallographic symmetry, i.e. the atoms in the PackBindingSite objects (see Section 4.10, page
51) that would be returned by the PDB pack_binding_sites attribute. However, these can
be included by passing a positive include_pack value.
• modified (integer): By default, the output file will include any changes (geometrical
transformations, changed or added attributes) that may have been made to the object by
Reliscript. To write out the original, unmodified object, set modified=0.
transform(object)
Applies the same (rotation + translation) geometrical transformation to the PDB object as has already
been applied to the object passed in as an argument.
Arguments:
• object (Reliscript data object): For example, this could be a Chain object that has been
subjected to a geometrical transformation in order to superimpose it on another chain.
clear_transform()
Clears any geometrical transformation that has been applied to the PDB object so that all atom
coordinates return to their original values.
4.2 Chain Objects
A Chain object holds information about a protein chain, e.g. its sequence. It also allows access to the
Residue objects that make up the chain, the PDB object to which it belongs, etc.
4.2.1 Creation of Chain Objects
Chain objects can be created either by accessing a member of a chain container object, e.g.
# Create set containing all chains in database
chain_set = reliscript.set('chain')
# Get first member
chain_obj0 = chain_set[0]
or from a PDB object, e.g.
pdb_object = reliscript.create('1qs4')
# Get second chain in PDB entry 1qs4
chain_obj1 = pdb_object.chains[1]
These two steps may be combined in a single line, e.g.
chain_obj1 = reliscript.create('1qs4').chains[1]
4.2.2 Textual Representation of Chain Objects
Print operations, etc., on Chain objects (e.g. print chain_obj) will produce output such as:
Chain<pdb:1a01:A>
Colons separate the contents of the angle brackets into components. The first component is the
database identifier (see Section 3.6, page 17), followed by the PDB code; the final component is the
chain identifier. If the Chain object is from an in-house database (i.e. not derived from the main PDB)
the output will be, e.g.:
Chain<dbid:1ax1:A>
where dbid is the identifier of the in-house database.
4.2.3 Attributes of Chain Objects
The attributes (see page 77) of a Chain object are:
Name (Type): Description
• adjacent_ligands (list of Ligands): List of ligands whose BindingSites include at least one residue
from this chain. The list is ordered by the size of the chain/ligand interaction, i.e. when there are
two or more ligands in the list, the first will have more of this chain’s residues involved in its
binding site than will the second. If the chain has been subjected to a geometrical transformation,
e.g. using the superimpose_chain operation object, then all ligands in the list will be transformed in
the same way.
• atoms (list of Atoms): The Atom objects in the chain.
• bonds (list of Bonds): The Bond objects in the chain.
• chain_id (string): Chain identifier, e.g. A (returns the single-character string “-” if no identifier
available).
• n_atom (integer): Number of atoms in the chain.
• n_unit (integer): Number of units (i.e. residues) in chain; equivalent to len(chain).
• pdb (PDB): The PDB object that contains the chain.
• residues (list of Residues): The Residue objects in the chain. Residue index numbers in this list
may not be the same as their numbers in the protein SEQRES sequence (see Section 4.2.5, page 30).
• sequence (string): String containing the amino-acid sequence as one-letter codes, e.g.
IVGTRVTYLDWIHHYVPKK. The sequence is that given in the PDB SEQRES records. When the chain
has been created from a BindingSite or PackBindingSite object, this attribute will return an empty
string.
• sequence_3d (string): String containing the amino-acid sequence as one-letter codes, e.g.
IVGTRVTYLDWIHHYVPKK. The sequence returned is that determined from the residues in the PDB
ATOM records, not the sequence defined in PDB SEQRES records. These may differ, e.g. the
experimental sequence would not include residues whose 3D atomic positions were not determined
because of crystallographic disorder.
• type (string): String that identifies this object type. For a Chain object, this will be the string
protein_chain
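Since sequence and sequence_3d may differ, it can be useful to find which stretches of the SEQRES sequence have no counterpart in the observed sequence. The sketch below is one possible approach using the difflib module from the Python standard library. It is not a Reliscript facility, the function name unobserved_segments is hypothetical, and the matching is heuristic, so the result may be ambiguous for repetitive sequences. The observed sequence in the example is invented.

```python
from difflib import SequenceMatcher

def unobserved_segments(sequence, sequence_3d):
    """Return (start, residues) for stretches of `sequence` absent from `sequence_3d`."""
    matcher = SequenceMatcher(None, sequence, sequence_3d, autojunk=False)
    missing = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'delete' and 'replace' both mark parts of `sequence` that were not matched.
        if tag in ('delete', 'replace'):
            missing.append((i1, sequence[i1:i2]))
    return missing

seqres = 'IVGTRVTYLDWIHHYVPKK'
observed = 'IVGTRVTYLDWIHHYV'      # invented: last three residues not located
print(unobserved_segments(seqres, observed))  # [(16, 'PKK')]
```

The start positions are zero-based offsets into the SEQRES string, not PDB residue labels.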
4.2.4 Functions of Chain Objects
The functions (see page 80) of a Chain object are:
pdb_atoms()
Returns a list of string objects containing the ATOM records of the chain.
save_pdb(filename, modified=1)
Saves the chain as a file in pdb format.
Arguments:
• filename (string): Filename to be used for the output pdb file.
• modified (integer): By default, the output file will include any changes (geometrical
transformations, changed or added attributes) that may have been made to the object by
Reliscript. To write out the original, unmodified object, set modified=0.
transform(object, on_original=0)
Applies the same (rotation + translation) geometrical transformation to the Chain object as has
already been applied to the object passed in as an argument.
Arguments:
• object (Reliscript data object): A data object (e.g. a PDB, Chain, NucleicAcid, Ligand,
Solvent, or Residue object) that has been subjected to a geometrical transformation.
• on_original (integer): By default, the transformation will be applied to the Chain object in
its current orientation, which may already be the result of a previous transformation. To
transform the original, untransformed atomic positions, set the on_original flag to a nonzero value.
clear_transform()
Clears any geometrical transformation that has been applied to the Chain object so that all atom
coordinates return to their original values.
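The effect of the on_original flag and of clear_transform() can be pictured with a small toy model. The sketch below is not the Reliscript implementation; it is one reading of the descriptions above, in which an object keeps its original coordinates together with a single rotation and translation. ToyObject, set_transform and coords are hypothetical names introduced only for this illustration.

```python
IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

def rotate(matrix, vector):
    return tuple(sum(matrix[i][k] * vector[k] for k in range(3)) for i in range(3))

def multiply(left, right):
    return tuple(tuple(sum(left[i][k] * right[k][j] for k in range(3))
                       for j in range(3)) for i in range(3))

class ToyObject(object):
    """Toy stand-in for a data object: original coordinates plus one transformation."""

    def __init__(self, coords):
        self.original = [tuple(c) for c in coords]
        self.clear_transform()

    def clear_transform(self):
        self.rotation, self.translation = IDENTITY, (0.0, 0.0, 0.0)

    def set_transform(self, rotation, translation):
        # Hypothetical: in Reliscript a transformation would result from an
        # operation such as a superimposition, not be set by hand.
        self.rotation, self.translation = rotation, tuple(translation)

    def transform(self, other, on_original=0):
        if on_original:
            # Start again from the original positions.
            self.rotation, self.translation = other.rotation, other.translation
        else:
            # Apply the other object's transformation on top of the current one.
            moved = rotate(other.rotation, self.translation)
            self.translation = tuple(m + t for m, t in zip(moved, other.translation))
            self.rotation = multiply(other.rotation, self.rotation)

    def coords(self):
        return [tuple(r + t for r, t in zip(rotate(self.rotation, c), self.translation))
                for c in self.original]

# A quarter turn about z followed by a shift of one unit along x.
quarter_turn = ((0.0, -1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
reference = ToyObject([(0.0, 0.0, 0.0)])
reference.set_transform(quarter_turn, (1.0, 0.0, 0.0))

chain = ToyObject([(1.0, 0.0, 0.0)])
chain.transform(reference)                 # atom moves to (1, 1, 0)
chain.transform(reference)                 # applied on top: atom moves to (0, 1, 0)
chain.transform(reference, on_original=1)  # from the original position: (1, 1, 0)
chain.clear_transform()                    # back to (1, 0, 0)
```

Storing the original coordinates separately from the transformation is what makes both on_original and clear_transform() straightforward in this model.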
4.2.5 Accessing the Residues in a Chain; Residue Numbering
Chain objects have internal functions that allow them to act both like Python lists (see page 80) and
Python dictionaries (see page 78) for the purposes of accessing the Residue objects that they contain.
The script accessing_residues.py shows how these access functions can be used; in each case,
the comment indicates how the equivalent access could be made by using the residues attribute
(see Section 4.2.3, page 28).
The crucial point is that, for loops and numerical indexing, the residues are considered to run from
residue 0 to residue N-1 (where N is the number of residues in the chain). Thus, the third example
would return the 5th to the 21st residues in the chain, not the 4th to the 20th. However, within a chain
residues have string labels associated with them from the PDB, such as 16, -1 or 27B; normally
(though not invariably) these will be the residue sequence numbers as used conventionally by a
protein chemist. The Chain object provides access to residues via this label by allowing the index into
the chain to be a string “key”, as in the final example above. This method of access is not possible via
the residues attribute, which is a pure Python list and therefore does not support key access.
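The difference between positional access and access by label can be illustrated without a database connection. The sketch below is not the script accessing_residues.py; it uses a hypothetical ToyChain class in which each residue is represented only by its PDB label. Raising KeyError for an unknown label is an assumption of this sketch, not documented Reliscript behaviour.

```python
class ToyChain(object):
    """Toy stand-in for a Chain: residues are represented by their PDB labels only."""

    def __init__(self, labels):
        self.residues = list(labels)       # plain list: positional access only

    def __len__(self):
        return len(self.residues)

    def __getitem__(self, index):
        if isinstance(index, str):         # string key: look up by residue label
            for label in self.residues:
                if label == index:
                    return label
            raise KeyError(index)
        return self.residues[index]        # integer or slice: position in the chain

chain = ToyChain(['16', '17', '18', '27', '27A', '27B', '28'])
print(chain[0])        # 16   (first residue, position 0)
print(chain[2:4])      # ['18', '27']  (third and fourth residues)
print(chain['27B'])    # 27B  (access by label)
for label in chain:    # loops run from position 0 to len(chain) - 1
    pass
```

Note that chain.residues['27B'] would fail in this model, just as described above for the residues attribute, because a plain list accepts only integer positions and slices.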
4.3 NucleicAcid Objects
4.3.1 Creation of NucleicAcid Objects
NucleicAcid objects can be created either by accessing a member of a nucleic_acid container object,
e.g.
# Create set containing all nucleic acids in database
nucleic_acid_set = reliscript.set('nucleic_acid')
# Get first member
nucleic_acid_obj0 = nucleic_acid_set[0]
or from a PDB object, e.g.
pdb_object = reliscript.create('100d')
# Get first nucleic acid chain in PDB entry 100d
nucleic_acid_obj1 = pdb_object.nucleic_acids[0]
These two steps may be combined in a single line, e.g.
nucleic_acid_obj1 = reliscript.create('100d').nucleic_acids[0]
4.3.2 Textual Representation of NucleicAcid Objects
Print operations, etc., on NucleicAcid objects (e.g. print nucleic_acid_obj) will produce
output such as:
NucleicAcid<pdb:100d:A>
Colons separate the contents of the angle brackets into components. The first component is the
database identifier (see Section 3.6, page 17), followed by the PDB code; the final component is the
nucleic acid identifier. If the NucleicAcid object is from an in-house database (i.e. not derived from
the main PDB) the output will be, e.g.:
NucleicAcid<dbid:1xxx:A>
where dbid is the identifier of the in-house database.
4.3.3 Attributes of NucleicAcid Objects
The attributes (see page 77) of a NucleicAcid object are:
Name (Type): Description
• adjacent_ligands (list of Ligands): List of ligands whose BindingSites include at least one residue
from this nucleic acid. The list is ordered by the size of the nucleic acid/ligand interaction, i.e.
when there are two or more ligands in the list, the first will have more of this nucleic acid’s
residues involved in its binding site than will the second. If the nucleic acid has been subjected to
a geometrical transformation then all ligands in the list will be transformed in the same way.
• atoms (list of Atoms): The Atom objects in the nucleic acid.
• bonds (list of Bonds): The Bond objects in the nucleic acid.
• chain_id (string): Nucleic acid chain identifier, e.g. B (returns the single-character string “-” if
no identifier available).
• n_atom (integer): Number of atoms in the nucleic acid.
• n_unit (integer): Number of units (i.e. residues) in the nucleic acid chain; equivalent to
len(nucleic_acid).
• pdb (PDB): The PDB object that contains the nucleic acid.
• residues (list of Residues): The Residue objects in the nucleic acid.
• sequence_3d (string): String containing the nucleic acid sequence as one-letter codes, e.g.
ATTAGTA. The sequence returned is that determined from the residues in the PDB ATOM records, not
the sequence defined in PDB SEQRES records. These may differ, e.g. the experimental sequence would
not include residues whose 3D atomic positions were not determined because of crystallographic
disorder.
• type (string): String that identifies this object type. For a NucleicAcid object, this will be the
string nucleic_acid
4.3.4 Functions of NucleicAcid Objects
The functions (see page 80) of a NucleicAcid object are:
pdb_atoms()
Returns a list of string objects containing the ATOM records of the nucleic acid.
save_pdb(filename, modified=1)
Saves the nucleic acid as a file in pdb format.
Arguments:
• filename (string): Filename to be used for the output pdb file.
• modified (integer): By default, the output file will include any changes (geometrical
transformations, changed or added attributes) that may have been made to the object by
Reliscript. To write out the original, unmodified object, set modified=0.
transform(object, on_original=0)
Applies the same (rotation + translation) geometrical transformation to the NucleicAcid object as has
already been applied to the object passed in as an argument.
Arguments:
• object (Reliscript data object): A data object (e.g. a PDB, Chain, NucleicAcid, Ligand,
Solvent, or Residue object) that has been subjected to a geometrical transformation.
• on_original (integer): By default, the transformation will be applied to the NucleicAcid
object in its current orientation, which may already be the result of a previous transformation. To
transform the original, untransformed atomic positions, set the on_original flag to a nonzero value.
clear_transform()
Clears any geometrical transformation that has been applied to the NucleicAcid object so that all
atom coordinates return to their original values.
4.3.5 Looping around the Contents of a NucleicAcid Object
Like Chain objects, NucleicAcid objects can simulate certain list and dictionary operations (see
Section 4.2.5, page 30). Looping around the contents of a NucleicAcid object will produce Residue
objects. For an example see the script looping_around_nucleic_acids.py.
4.4 Ligand Objects
A Ligand object holds information about a protein-bound ligand, e.g. compound name, molecular
weight. It also allows access to the binding site to which it is bound, other nearby chains generated by
crystallographic symmetry, the PDB object to which it belongs, etc. Each ligand is divided up into a
number of units. Often there will be only one unit, but some ligands - for example small peptides - are
divided into multiple units. For consistency with protein chains, each unit of a ligand is stored as a
Residue object.
4.4.1 Creation of Ligand Objects
Ligand objects can be created either by accessing a member of a ligand container object, e.g.
# Create set containing all ligands in database
ligand_set = reliscript.set('ligand')
# Get second member
ligand_obj1 = ligand_set[1]
or from a PDB object, e.g.
pdb_object = reliscript.create('1qs4')
# Get first ligand in PDB entry 1qs4
ligand_obj0 = pdb_object.ligands[0]
These two steps may be combined in a single line, e.g.
ligand_obj0 = reliscript.create('1qs4').ligands[0]
4.4.2 Textual Representation of Ligand Objects
Print operations, etc., on Ligand objects (e.g. print lig_object) will produce output such as:
Ligand<pdb:1a01:APA_301-A>
Colons separate the contents of the angle brackets into components. The first component is the
database identifier (see Section 3.6, page 17), followed by the PDB code; the final component is the
internal Relibase+ ligand identifier, which is based on the nomenclature of the ligands in the original
PDB file. If the Ligand object is from an in-house database (i.e. not derived from the main PDB) the
output will be, e.g.:
Ligand<dbid:1ax1:ALA_ARG_VAL_50>
where dbid is the identifier of the in-house database.
4.4.3 Attributes of Ligand Objects
The attributes (see page 77) of a Ligand object are:
Name (Type): Description
• adjacent_chains (list of Chains): List of all chains that have at least one residue in the ligand’s
BindingSite. The list is ordered by the size of the chain/ligand interaction, i.e. when there are two
or more chains in the list, the first will have more of its residues involved in the ligand binding
site than will the second.
• adjacent_nucleic_acids (list of NucleicAcids): List of all nucleic acids that have at least one
residue in the ligand’s BindingSite. The list is ordered by the size of the nucleic acid/ligand
interaction, i.e. when there are two or more nucleic acids in the list, the first will have more of its
residues involved in the ligand binding site than will the second.
• atoms (list of Atoms): The Atom objects in the ligand.
• binding_site (BindingSite): The BindingSite object associated with the ligand.
• bonds (list of Bonds): The Bond objects in the ligand.
• cofactor (Boolean): Returns 1 if the ligand is a cofactor or comprises cofactor building blocks. The
method is based on the ligand full_name attribute and checks for the following building blocks:
ADP, AMP, ATP, B12, BTN, COA, FAD, FMN, FS3, FS4, HEM, NAD, NAP, PLP, TPP
• compound_name (string): The compound name of the ligand as given in the PDB file.
• covalently_bound (Boolean): Returns 1 if the ligand is covalently bound to the protein, otherwise 0.
• full_name (string): A list of the ligand building blocks, as defined in the PDB file, which can be
one or more than one, e.g. MQI or NAS-GLY-PAP-PIP
• mol_wt (float): Molecular weight.
• n_atom (integer): Number of atoms in the ligand.
• n_unit (integer): Number of units (i.e. residues) in the ligand; equivalent to len(ligand).
• pack_binding_site (PackBindingSite): The PackBindingSite associated with the ligand (i.e. nearby
atoms in the crystal-packing environment).
• pdb (PDB): The PDB object containing the ligand.
• peptide (Boolean): Returns 1 if the ligand contains at least one natural amino acid building block.
The method checks the ligand full_name attribute, e.g. NAS-GLY-PAP-PIP returns 1.
• pure_peptide (Boolean): Returns 1 if the ligand contains only natural amino acid building blocks.
The method checks the ligand full_name attribute, e.g. GLY-ARG-PHE returns 1.
• residues (list of Residues): A list of the Residue objects in the ligand. The term residue is used
for consistency with Chain objects. In reality, they are simply the component sections of the ligand,
which may or may not be peptide units.
• sugar (Boolean): Returns 1 if the ligand comprises carbohydrate building blocks. The method is
based on the ligand full_name attribute and checks for the following building blocks:
ARA, ARB, FUC, GAL, GLU, MAN
• type (string): String that identifies this object type. For a Ligand object, this will be the string
ligand
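The cofactor, peptide, pure_peptide and sugar attributes are all described as checks on the building blocks in full_name. The sketch below re-creates that kind of check in plain Python so that the logic can be seen in isolation. It is not the Reliscript implementation. The cofactor and sugar lists are copied from the table above, but the amino acid list (the 20 standard residues) is an assumption, as is the use of an "any block matches" rule for the cofactor and sugar checks.

```python
COFACTOR_BLOCKS = set(['ADP', 'AMP', 'ATP', 'B12', 'BTN', 'COA', 'FAD', 'FMN',
                       'FS3', 'FS4', 'HEM', 'NAD', 'NAP', 'PLP', 'TPP'])
SUGAR_BLOCKS = set(['ARA', 'ARB', 'FUC', 'GAL', 'GLU', 'MAN'])
# Assumption: the 20 standard amino acids; the guide does not list the codes
# that count as "natural".
AMINO_ACID_BLOCKS = set(['ALA', 'ARG', 'ASN', 'ASP', 'CYS', 'GLN', 'GLU', 'GLY',
                         'HIS', 'ILE', 'LEU', 'LYS', 'MET', 'PHE', 'PRO', 'SER',
                         'THR', 'TRP', 'TYR', 'VAL'])

def building_blocks(full_name):
    """Split a full_name such as 'NAS-GLY-PAP-PIP' into its building blocks."""
    return full_name.split('-')

def contains_any(full_name, blocks):
    return int(any(b in blocks for b in building_blocks(full_name)))

def contains_only(full_name, blocks):
    return int(all(b in blocks for b in building_blocks(full_name)))

print(contains_any('NAS-GLY-PAP-PIP', AMINO_ACID_BLOCKS))   # 1, like peptide
print(contains_only('NAS-GLY-PAP-PIP', AMINO_ACID_BLOCKS))  # 0, like pure_peptide
print(contains_only('GLY-ARG-PHE', AMINO_ACID_BLOCKS))      # 1
print(contains_any('MQI', COFACTOR_BLOCKS))                 # 0
```

Because GLU appears in the sugar list as well as among the amino acid codes, a name-based check of this kind cannot distinguish the two uses of that code.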
4.4.4 Functions of Ligand Objects
The functions (see page 80) of a Ligand object are:
pdb_atoms()
Returns a list of string objects containing the ATOM records of the ligand.
save_pdb(filename, modified=1)
Saves the ligand as a file in pdb format.
Arguments:
• filename (string): Filename to be used for the output pdb file.
• modified (integer): By default, the output file will include any changes (geometrical
transformations, changed or added attributes) that may have been made to the object by
Reliscript. To write out the original, unmodified object, set modified=0.
save_mol2(filename, modified=1)
Saves the ligand as a file in mol2 format (Tripos Inc., St Louis, USA).
Arguments:
• filename (string): Filename to be used for the output mol2 file.
• modified (integer): By default, the output file will include any changes (geometrical
transformations, changed or added attributes) that may have been made to the object by
Reliscript. To write out the original, unmodified object, set modified=0.
transform(object, on_original=0)
Applies the same (rotation + translation) geometrical transformation to the Ligand object as has
already been applied to the object passed in as an argument.
Arguments:
• object (Reliscript data object): A data object (e.g. a PDB, Chain, NucleicAcid, Ligand,
Solvent, or Residue object) that has been subjected to a geometrical transformation. For
example, this might be a Chain object in the ligand’s binding site that has been rotated and
translated to superimpose it on another, similar chain.
• on_original (integer): By default, the transformation will be applied to the Ligand object in
its current orientation, which may already be the result of a previous transformation. To
transform the original, untransformed atomic positions, set the on_original flag to a nonzero value.
clear_transform()
Clears any geometrical transformation that has been applied to the Ligand object so that all atom
coordinates return to their original values.
4.4.5 Looping around the Contents of a Ligand Object
Like Chain objects, Ligand objects can simulate certain list and dictionary operations (see Section
4.2.5, page 30). Looping around the contents of a Ligand object will produce Residue objects. In
many cases there will only be one residue object in the whole ligand, but some ligands (particularly
peptides) contain several. For an example see the script looping_around_ligands.py.
4.5 Solvent Objects
A Solvent object holds information about the water molecules in a protein structure. It allows access
to the water Atom objects and to the parent PDB object. There is one Solvent object for each PDB
object. Each water molecule within a Solvent object is treated as a separate “residue”. While slightly
artificial, this helps maintain consistency with Chain and Ligand objects.
4.5.1 Creation of Solvent Objects
Solvent objects can be created either by accessing a member of a solvent container object, e.g.
# Create set containing all Solvent objects in database
solvent_set = reliscript.set('solvent')
# Get last member
solvent_obj_last = solvent_set[-1]
or from a PDB object, e.g.
pdb_object = reliscript.create('1qs4')
# Get first (and only!) Solvent object for PDB entry 1qs4
solvent_obj0 = pdb_object.solvent[0]
These two methods may be combined in a single line, e.g.
solvent_obj0 = reliscript.create('1qs4').solvent[0]
4.5.2 Textual Representation of Solvent Objects
Print operations, etc., on Solvent objects (e.g. print solv_object) will produce output such as:
Solvent<pdb:1a01:SOLV>
Colons separate the contents of the angle brackets into components. The first component is the
database identifier (see Section 3.6, page 17), followed by the PDB code; there will only be one
Solvent object per PDB entry, so the final component will always be SOLV. If the Solvent object is
from an in-house database (i.e. not derived from the main PDB) the output will be, e.g.:
Solvent<dbid:1ax1:SOLV>
where dbid is the identifier of the in-house database.
4.5.3 Attributes of Solvent Objects
The attributes (see page 77) of a Solvent object are:
Name (Type): Description
• atoms (list of Atoms): The solvent Atom objects (in effect, the water oxygens in the structure, if
no hydrogen-atom coordinates are present).
• bonds (list of Bonds): The solvent Bond objects (will usually be an empty list, as solvent Atom
objects will normally be disconnected water oxygens).
• n_atom (integer): Number of solvent atoms.
• n_unit (integer): Number of units (i.e. residues) in solvent; equivalent to len(solvent). As each
water is treated as a separate residue, and water hydrogen atoms are usually missing, the value of
this attribute is generally identical to that of n_atom.
• pdb (PDB): The PDB object containing the solvent.
• residues (list of Residues): The solvent Residue objects. The term residue is used for consistency
with Chain objects. In practice, each water molecule is treated as a separate residue.
• type (string): String that identifies this object type. For a Solvent object, this will be the string
solvent
4.5.4 Functions of Solvent Objects
The functions (see page 80) of a Solvent object are:
pdb_atoms()
Returns a list of string objects containing the ATOM records of the Solvent object (i.e. all solvent
atoms in the PDB entry).
save_pdb(filename, modified=1)
Saves the Solvent object as a file in pdb format.
Arguments:
• filename (string): Filename to be used for the output pdb file.
• modified (integer): By default, the output file will include any changes (geometrical
transformations, changed or added attributes) that may have been made to the object by
Reliscript. To write out the original, unmodified object, set modified=0.
transform(object, on_original=0)
Applies the same (rotation + translation) geometrical transformation to the Solvent object as has
already been applied to the object passed in as an argument.
Arguments:
• object (Reliscript data object): A data object (e.g. a PDB, Chain, NucleicAcid, Ligand,
Solvent, or Residue object) that has been subjected to a geometrical transformation.
• on_original (integer): By default, the transformation will be applied to the Solvent object in
its current orientation, which may already be the result of a previous transformation. To
transform the original, untransformed atomic positions, set the on_original flag to a nonzero value.
clear_transform()
Clears any geometrical transformation that has been applied to the Solvent object so that all atom
coordinates return to their original values.
4.5.5 Looping around the Contents of a Solvent Object
To maintain consistency with Chain and Ligand objects, there is an intermediate Residue object that
is produced either when the residues attribute is retrieved or if the object is accessed via the
supported list functions. Thus, looping around the contents of a Solvent object will produce Residue
objects, although each Residue object will usually contain just one water-oxygen atom.
The script looping_around_solvent_objects.py prints the coordinates of the oxygen
atom of all solvent residues.
4.6 Residue Objects
A Residue object holds information about a unit of a Chain, NucleicAcid, Ligand or Solvent object. It
allows access to the Atom objects it contains and the parent object to which it belongs. The term
residue is really applicable only to chains and some ligands (i.e. peptides and related compounds) but
is used throughout so that scripts can be written which will work in the same way, regardless of
whether the parent object is a Chain, NucleicAcid, Ligand or Solvent.
4.6.1 Creation of Residue Objects
Residue objects are created by requesting them from (or looping around the contents of) Chain,
NucleicAcid, Ligand or Solvent objects, or their packed equivalents (see Section 4.10.5, page 54),
e.g.
# Get first chain in PDB entry 1mmb
pdb_obj = reliscript.create('1mmb')
chain_obj = pdb_obj.chains[0]
# Get first residue in chain
res1 = chain_obj[0]
This can be done with one command:
res2 = reliscript.create('1mmb').chains[0][0]
# Now get first residue in a pack binding site chain
ligand_obj = reliscript.create('1qs4').ligands[0]
pack_bs_obj = ligand_obj.pack_binding_site
res3 = pack_bs_obj.chains[0][0]
4.6.2 Textual Representation of Residue Objects
Print operations, etc., on Residue objects (e.g. print res_object) will produce output such as:
Residue<pdb:1a01:A:'16B'>
Residue<pdb:1a01:100_1004:'1004'>
Residue<pdb:1a01:SOLV:'500'>
Colons separate the contents of the angle brackets into components. The first component is the
database identifier (see Section 3.6, page 17), followed by the PDB code; the third component is the
identifier of the object that contains the residue and the final part is the residue identifier (this is the
contents of the number field relating to that residue in the original PDB file). If the Residue object is
from an in-house database (i.e. not derived from the main PDB) the output will be, e.g.:
Residue<dbid:1a01:A:'16B'>
where dbid is the identifier of the in-house database.
4.6.3 Attributes of Residue Objects
The attributes (see page 77) of a Residue object are given in the table below.
Name (Type): Description
• atoms (list of Atoms): The Atom objects in the residue.
• bonds (list of Bonds): The Bond objects in the residue, including any bonds linking this residue to
other residues in the same chain or ligand, but excluding disulphide bridges and bonds between
proteins and covalently-bound ligands.
• chain_id (string): Chain identifier; if no chain identifier is set, this will return the
one-character string “-”.
• index_no (integer): The index number of the residue in the Chain, NucleicAcid, Ligand or Solvent
object to which it belongs. The first residue in a chain would have an index_no of 0, etc.
• n_atom (integer): Number of atoms in the residue. This will be the number whose positions were
determined experimentally, i.e. given on PDB ATOM records.
• n_atom_ideal (integer): For amino acid residues, the number of atoms that the residue should
ideally contain. This may be different from n_atom, e.g. if one or more atoms in the residue were
not located experimentally.
• name (string): For peptidic residues, the amino-acid name, e.g. SER. For solvent residues, returns
HOH.
• one_letter_code (string): One letter code of the residue; for non-peptide residues, this will return
the one-character string “*”.
• pdb (PDB): The PDB object containing the residue.
• sequence_no (string): The residue sequence identifier, e.g. 10 or 123. For solvents, the Relibase+
internal count.
• type (string): String that identifies the type of object that the residue is part of. This will be
either amino acid, nucleic acid, ligand or solvent. If the residue is part of a peptidic ligand, the
type will be returned as amino acid rather than ligand.
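Comparing n_atom with n_atom_ideal gives a simple way to find residues with atoms that were not located experimentally. The sketch below shows the comparison on stand-in data. ToyResidue is a hypothetical class defined only for this illustration, and the atom counts are invented. With real Residue objects, the comparison is meaningful only for amino acid residues, since the guide defines n_atom_ideal only for those.

```python
from collections import namedtuple

# Stand-in for a Residue object, carrying only the attributes used here.
ToyResidue = namedtuple('ToyResidue', ['name', 'sequence_no', 'n_atom', 'n_atom_ideal'])

def incomplete_residues(residues):
    """Return residues with fewer located atoms than the ideal count."""
    return [r for r in residues if r.n_atom < r.n_atom_ideal]

residues = [
    ToyResidue('SER', '10', 6, 6),
    ToyResidue('LYS', '11', 5, 9),    # side chain partly missing (invented values)
    ToyResidue('GLY', '12', 4, 4),
]
for residue in incomplete_residues(residues):
    print('%s %s: %d of %d atoms' % (residue.name, residue.sequence_no,
                                     residue.n_atom, residue.n_atom_ideal))
# LYS 11: 5 of 9 atoms
```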
4.6.4 Functions of Residue Objects
The functions (see page 80) of a Residue object are:
pdb_atoms()
Returns a list of string objects containing the ATOM records of the residue.
save_pdb(filename)
Saves the residue as a file in pdb format. If the residue has been modified in any way (e.g. subjected
to a geometric transformation), the modified data will be written out, not the original data as retrieved
from the Relibase+ database.
Arguments:
• filename (string): Filename to be used for the output pdb file.
transform(object, on_original=0)
Applies the same (rotation + translation) geometrical transformation to the Residue object as has
already been applied to the object passed in as an argument. All other atoms in the object to which the
residue belongs will also have the same transformation applied, e.g. if the residue belongs to a chain,
the whole chain will be transformed.
Arguments:
• object (Reliscript data object): A data object (e.g. a PDB, Chain, NucleicAcid, Ligand,
Solvent, or Residue object) that has been subjected to a geometrical transformation.
• on_original (integer): By default, the transformation will be applied to the Residue object in
its current orientation, which may already be the result of a previous transformation. To
transform the original, untransformed atomic positions, set the on_original flag to a nonzero value.
clear_transform()
Clears any geometrical transformation that has been applied to the Residue object so that all atom
coordinates return to their original values. All other atoms in the object to which the residue belongs
will also be returned to their original positions, e.g. if the residue is part of a chain, the whole chain
will be reset.
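The interplay between cumulative transformations, the on_original flag and clear_transform can be pictured with a small stand-in. This is a sketch only, not Reliscript's implementation: the class TransformableSketch is hypothetical, and its transform function takes a rotation matrix and translation vector directly, whereas the Reliscript function takes an already-transformed data object.

```python
class TransformableSketch(object):
    """Hypothetical stand-in holding original and current coordinates."""

    def __init__(self, coords):
        self._original = [tuple(c) for c in coords]
        self.coords = list(self._original)

    def transform(self, rotation, translation, on_original=0):
        # Start from the untransformed positions only if on_original is nonzero
        source = self._original if on_original else self.coords
        self.coords = [
            tuple(sum(rotation[i][j] * c[j] for j in range(3)) + translation[i]
                  for i in range(3))
            for c in source
        ]

    def clear_transform(self):
        # Return every atom to its original position
        self.coords = list(self._original)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
SHIFT_X = (1.0, 0.0, 0.0)

obj = TransformableSketch([(1.0, 2.0, 3.0)])
obj.transform(IDENTITY, SHIFT_X)                  # applied to current positions
obj.transform(IDENTITY, SHIFT_X)                  # cumulative with the first call
twice = obj.coords[0]
obj.transform(IDENTITY, SHIFT_X, on_original=1)   # restarts from the original
once = obj.coords[0]
obj.clear_transform()
restored = obj.coords[0]
```

In the sketch, two default calls accumulate, a call with on_original=1 discards the earlier shifts, and clear_transform restores the starting coordinates.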
4.6.5 Looping around the Contents of a Residue Object
Internal functions allow the Residue object to be treated as a list, where the list contains the atoms
stored in the residue. Thus, looping around the contents of a Residue object will produce Atom
objects. For an example, see the script looping_around_residue_objects.py.
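The script named above is not reproduced in this section. As a rough illustration of the general Python mechanism that lets an object be treated as a list, the following stand-in may help. The class ResidueSketch is hypothetical and stores plain atom labels; in Reliscript the loop would yield Atom objects, and the internal functions actually used are not documented here.

```python
class ResidueSketch(object):
    """Hypothetical stand-in for a Residue; holds atom labels, not Atom objects."""

    def __init__(self, name, atom_names):
        self.name = name
        self.atoms = list(atom_names)

    def __len__(self):
        return len(self.atoms)

    def __getitem__(self, index):
        # Delegating to the list provides indexing, slicing and iteration
        return self.atoms[index]


res_obj = ResidueSketch('SER', ['N', 'CA', 'C', 'O', 'CB', 'OG'])
names = [atom for atom in res_obj]   # looping yields the stored atoms
first_atom = res_obj[0]
```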
4.6.6 Residue Numbering
Chain objects provide some internal functions which allow the Residue objects they contain to be
referred to by the residue identifier used in the original PDB file, which will normally be the position
of the residue in the protein SEQRES sequence, e.g.
# Retrieve residue labelled 17
res = chain_obj['17']
This is the recommended method if you wish to access particular residues in a protein
sequence. Other methods are available for accessing the residues of a chain, but they do not
necessarily use biologically meaningful numbering schemes (see Section 4.2.5, page 30).
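The difference between lookup by PDB residue identifier and lookup by position can be sketched with a stand-in container. ChainSketch is hypothetical, residues are represented by their names only, and the integer indexing shown is an assumption made for contrast rather than a statement about the Chain object.

```python
class ChainSketch(object):
    """Hypothetical stand-in: residues stored in order, each with a PDB label."""

    def __init__(self, labelled_residues):
        # labelled_residues: list of (pdb_label, residue_name) pairs
        self._residues = list(labelled_residues)
        self._by_label = dict(labelled_residues)

    def __getitem__(self, key):
        if isinstance(key, str):
            return self._by_label[key]      # lookup by PDB residue identifier
        return self._residues[key][1]       # lookup by position in the chain


# A chain whose first stored residue is labelled 16 in the PDB file
chain_obj = ChainSketch([('16', 'ILE'), ('16B', 'VAL'), ('17', 'GLY')])
by_label = chain_obj['17']    # the residue labelled 17
by_index = chain_obj[0]       # the first residue stored, labelled 16
```

Because the labels are strings, insertion codes such as 16B fit naturally, which a purely positional index cannot express.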
4.7 Atom Objects
An Atom object holds information about a particular atom in a protein chain, a nucleic acid, a ligand,
or a solvent molecule (e.g. positional coordinates). It also allows access to the Residue and PDB
objects to which it belongs. Atom and Bond objects are the smallest 3D-structural components of a
PDB entry.
4.7.1 Creation of Atom Objects
Atom objects are created by requesting them from Residue objects or Bond objects, e.g.
atom0 = res_obj[0]
atom1 = bond_obj[0]
In addition, all data objects containing Residue objects (i.e. PDB, Chain, NucleicAcid, Ligand, Solvent, BindingSite, PackBindingSite, etc.) will produce a list of the atoms they contain if requested,
e.g.
# Get second atom in PDB object pdb_obj
atom1 = pdb_obj.atoms[1]
# Get last atom in Chain object chain_obj
atom2 = chain_obj.atoms[-1]
# Get first atom in BindingSite object bindingsite_obj
atom3 = bindingsite_obj.atoms[0]
4.7.2 Textual Representation of Atom Objects
Print operations, etc., on Atom objects (e.g. print atom_object) will produce output such as:
Atom(N)<pdb:1a01:A:’16B’:121>
Atom(Cl)<pdb:1a01:100_1004:’1004’:31>
Atom(O)<pdb:1a01:SOLV:’500’:500>
Colons separate the contents of the angle brackets into components. The first component is the
database identifier (see Section 3.6, page 17), followed by the PDB code; the third part is the
identifier of the object that contains the atom and the final part is the atom number from the original
PDB file. If the Atom object is from an in-house database (i.e. not derived from the main PDB) the
output will be, e.g.:
Atom(N)<dbid:1a01:A:’16B’:121>
where dbid is the identifier of the in-house database.
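If the textual representation needs to be taken apart in a script (e.g. when reading a log of printed atoms), a regular expression is one option. The sketch below is an assumption-laden illustration: the function parse_atom_repr and the field names are hypothetical, the quoted component is taken to be the residue identifier, and straight single quotes are assumed around it.

```python
import re

# Pattern for strings such as Atom(N)<pdb:1a01:A:'16B':121>
ATOM_REPR = re.compile(
    r"^Atom\((?P<symbol>[A-Za-z]+)\)"
    r"<(?P<database>[^:]+):(?P<pdb_code>[^:]+):(?P<parent>[^:]+)"
    r":'(?P<residue>[^']*)':(?P<atom_number>\d+)>$"
)


def parse_atom_repr(text):
    """Split an Atom textual representation into a dict of its components."""
    match = ATOM_REPR.match(text)
    if match is None:
        raise ValueError('not an Atom representation: %r' % (text,))
    fields = match.groupdict()
    fields['atom_number'] = int(fields['atom_number'])
    return fields


info = parse_atom_repr("Atom(Cl)<pdb:1a01:100_1004:'1004':31>")
```

Where the Atom object itself is available, reading its attributes (Section 4.7.3) is preferable to parsing text.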
4.7.3 Attributes of Atom Objects
The attributes (see page 77) of an Atom object are:
Name | Type | Description
b_factor | float | Temperature (B) factor.
bonds | list of Bonds | List of all bonds in which this atom is involved, including bonds to atoms in other residues, if any such bonds exist, but excluding bonds that correspond to covalent protein-ligand linkages.
coords | tuple | Tuple of three floating point numbers containing the orthogonal x, y, z coordinates of the atom, e.g. (59.589, 58.943, 86.473).
element_no | integer | The elemental atomic number of the atom.
index_no | integer | The integer index number of the atom within the Chain, NucleicAcid, Ligand or Solvent object of which it is part, i.e. its position in the list of Atom objects produced by the atoms attribute of the Chain, NucleicAcid, Ligand or Solvent object.
name | string | The PDB label of the atom, e.g. N, CA, CB.
pdb_atom_number | integer | Number of the atom in the PDB entry of which it is part.
occupancy | float | Site occupancy.
pdb | PDB | The PDB object containing the atom.
residue | Residue | The Residue object containing the atom.
symbol | string | The element symbol of the atom, e.g. C, N, Cl. This string is included in the textual representation of the atom.
sybyl_type | string | The Sybyl atom type (see page 83) of the atom, e.g. N.2. Returned as UNK if unknown. These may not be set reliably, especially if the atom has an uncertain protonation state.
x | float | The atomic x Cartesian coordinate.
y | float | The atomic y Cartesian coordinate.
z | float | The atomic z Cartesian coordinate.
4.7.4 Functions of Atom Objects
The functions (see page 80) of an Atom object are:
pdb_line()
Returns (as a string) the PDB ATOM line relating to this atom.
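A common use of the coords attribute (Section 4.7.3) is to compute interatomic distances. The helper below is a minimal sketch operating on plain (x, y, z) tuples; the function name distance is hypothetical, and the second coordinate tuple is invented for illustration.

```python
import math


def distance(coords_a, coords_b):
    """Euclidean distance between two (x, y, z) tuples, in the same units."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(coords_a, coords_b)))


# The first tuple is the example from the attribute table; the second is
# an invented point displaced by 1.5 along z
coords1 = (59.589, 58.943, 86.473)
coords2 = (59.589, 58.943, 87.973)
d = distance(coords1, coords2)
```

With Atom objects, the same helper would presumably be called as distance(atom1.coords, atom2.coords).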
4.8 Bond Objects
A Bond object holds information about the chemical bond between two atoms in a protein chain,
nucleic acid, ligand, or solvent molecule. Bond and Atom objects are the smallest 3D-structural
components of a PDB entry.
4.8.1 Creation of Bond Objects
Bond objects are created by requesting them from data objects such as Atom, Residue, Ligand, etc.,
e.g.
bond1 = atom_obj.bonds[1]
bond2 = residue_obj.bonds[-1]
bond3 = ligand_obj.bonds[0]
4.8.2 Textual Representation of Bond Objects
Print operations, etc., on Bond objects (e.g. print bond_object) will produce output such as:
Bond(SINGLE)<pdb:1qs4:CHN-A:(Atom(C) <'65':75>, Atom(S) <'65':76>)>
Colons separate the contents of the angle brackets into components. The first component is the
database identifier (see Section 3.6, page 17), followed by the PDB code; the third part is the
identifier of the object that contains the bond and the final part is a tuple identifying the atoms
involved in the bond. If the Bond object is from an in-house database (i.e. not derived from the main
PDB) the output will be, e.g.:
Bond(SINGLE)<dbid:1qs4:CHN-A:(Atom(C) <'65':75>, Atom(S) <'65':76>)>
where dbid is the identifier of the in-house database.
4.8.3 Attributes of Bond Objects
The attributes (see page 77) of a Bond object are:
Name | Type | Description
atoms | list containing two Atom objects | List containing the two Atom objects that form the bond.
bond_type | string | Bond type, one of SINGLE, DOUBLE, TRIPLE, AROMATIC, or AMIDE (i.e. amide or peptide).
4.8.4 Functions of Bond Objects
The functions (see page 80) of a Bond object are:
other_atom(atom_object)
Returns (as an Atom object) the other atom involved in the bond, assuming that atom_object
itself is involved in the bond. Returns None if atom_object is not involved in the bond.
Arguments:
• atom_object (Atom): An atom.
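Combined with the bonds attribute of an Atom, other_atom gives a simple way to list the atoms bonded to a given atom. The following is a sketch using hypothetical stand-in classes (AtomSketch, BondSketch) that mimic only the documented behaviour; the helper bonded_neighbours is likewise illustrative.

```python
class AtomSketch(object):
    """Hypothetical stand-in for an Atom with a name and a bonds list."""

    def __init__(self, name):
        self.name = name
        self.bonds = []


class BondSketch(object):
    """Hypothetical stand-in for a Bond between two atoms."""

    def __init__(self, atom_a, atom_b, bond_type='SINGLE'):
        self.atoms = [atom_a, atom_b]
        self.bond_type = bond_type
        atom_a.bonds.append(self)
        atom_b.bonds.append(self)

    def other_atom(self, atom_object):
        if atom_object is self.atoms[0]:
            return self.atoms[1]
        if atom_object is self.atoms[1]:
            return self.atoms[0]
        return None   # atom_object is not part of this bond


def bonded_neighbours(atom):
    """Names of all atoms directly bonded to the given atom."""
    return [bond.other_atom(atom).name for bond in atom.bonds]


n, ca, c = AtomSketch('N'), AtomSketch('CA'), AtomSketch('C')
n_ca = BondSketch(n, ca)
ca_c = BondSketch(ca, c)
neighbours = bonded_neighbours(ca)
```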
4.8.5 Looping around the Contents of a Bond Object
Internal functions allow the Bond object to behave like a Python list containing the two atoms
involved in the bond, i.e.
first_atom = bond_obj[0]
second_atom = bond_obj[1]
Looping around the contents of a Bond therefore produces Atom objects; see the script
looping_around_bond_objects.py.
Also, the contents of a Bond object can be tested, e.g.
if atom_obj in bond_obj: print 'atom is connected by bond'
4.9 BindingSite Objects
A BindingSite object holds information about the atoms surrounding a bound ligand. These atoms
may belong to protein chains, nucleic acids, solvent molecules or other ligands. There is exactly one
BindingSite object for each Ligand object, i.e. each BindingSite is defined with respect to a particular
Ligand object. A BindingSite object is similar to a PDB object in that it has the attributes chains,
nucleic_acids, solvent and ligands. In the PDB object, these refer to the contents of the
complete protein; in a BindingSite object, they refer to:
• chains: All protein chain residues that have at least one atom within 7Å of the ligand defining
the BindingSite object.
• nucleic_acids: All nucleic acid residues that have at least one atom within 7Å of
the ligand defining the BindingSite object.
• solvent: All solvent atoms within 7Å of the ligand defining the BindingSite object.
• ligands: All other ligands that have at least one atom within 7Å of the ligand defining the
BindingSite object.
A BindingSite object does not contain any atoms generated by crystallographic symmetry; these can
be accessed via a PackBindingSite object (see Section 4.10, page 51).
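The 7Å criterion for residues can be sketched as follows. This is an illustration of the stated rule only, not the code Relibase+ uses: the function names and residue labels are hypothetical, coordinates are plain tuples, and treating a distance of exactly 7Å as inside the site is an assumption.

```python
import math

CUTOFF = 7.0   # distance criterion in Angstroms, as stated in the text


def within_cutoff(coords_a, coords_b, cutoff=CUTOFF):
    """True if two (x, y, z) points are no further apart than cutoff."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(coords_a, coords_b)))
    return dist <= cutoff


def residues_near_ligand(residues, ligand_coords, cutoff=CUTOFF):
    """Keep residues with at least one atom within cutoff of any ligand atom.

    residues maps a residue label to a list of (x, y, z) atom coordinates.
    """
    selected = []
    for label, atom_coords in residues.items():
        if any(within_cutoff(atom, lig, cutoff)
               for atom in atom_coords for lig in ligand_coords):
            selected.append(label)
    return sorted(selected)


ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
residues = {
    'SER 195': [(5.0, 0.0, 0.0), (12.0, 0.0, 0.0)],   # one atom in range
    'GLY 216': [(0.0, 9.0, 0.0)],                     # more than 7 A from both
    'ASP 189': [(8.0, 0.0, 0.0)],                     # 6.5 A from second atom
}
site = residues_near_ligand(residues, ligand)
```

Note that a residue is kept whole once any one of its atoms satisfies the criterion, so a binding site can contain atoms further than 7Å from the ligand.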
4.9.1 Creation of BindingSite Objects
A BindingSite object can only be created from the associated Ligand object or the corresponding
PackBindingSite object, e.g.
bindingsite1 = ligand_obj.binding_site
bindingsite2 = pack_bindingsite_obj.binding_site
A list of BindingSite objects is also available from the PDB object, e.g.
# Get third binding site in PDB object pdb_obj
bindingsite3 = pdb_obj.binding_sites[2]
4.9.2 Textual Representation of BindingSite Objects
Print operations, etc., on BindingSite objects (e.g. print bindingsite_object) will produce
output such as:
BindingSite<pdb:1a01:APA_301-A>
Colons separate the contents of the angle brackets into components. The first component is the
database identifier (see Section 3.6, page 17), followed by the PDB code; the final component is the
internal Relibase+ identifier of the bound ligand (see Section 4.4.2, page 34). If the BindingSite
object is from an in-house database (i.e. not derived from the main PDB) the output will be, e.g.:
BindingSite<dbid:1a01:APA_301-A>
where dbid is the identifier of the in-house database.
4.9.3 Attributes of BindingSite Objects
The attributes (see page 77) of a BindingSite object are:
Name | Type | Description
atoms | list of Atoms | The Atom objects in the binding site, not including the atoms of the bound ligand.
bonds | list of Bonds | The Bond objects in the binding site. In the case of chains, this includes only those bonds between atoms in the binding site, i.e. it does not include bonds from residues in the binding site to residues outside the binding site.
bound_ligand | Ligand | The Ligand object of which this is the binding site.
chains | list of BindingSiteChains | The BindingSiteChain objects in the binding site. Each of these objects will contain only those Residue objects that have at least one atom within 7Å of the bound ligand.
nucleic_acids | list of BindingSiteNucleicAcids | The BindingSiteNucleicAcid objects in the binding site.
ligands | list of Ligands | A list of other Ligand objects (excluding the bound ligand) that are contained in the binding site.
pack_binding_site | PackBindingSite | The associated PackBindingSite object (i.e. nearby atoms in the crystallographic environment).
pdb | PDB | The PDB object containing this binding site.
solvent | list containing one BindingSiteSolvent object | List containing one BindingSiteSolvent object. This single object contains the solvent atoms in the binding site.
4.9.4 Functions of BindingSite Objects
The functions (see page 80) of a BindingSite object are:
pdb_atoms()
Returns a list of string objects containing the ATOM records of the binding site.
save_pdb(filename)
Saves the binding site as a file in pdb format. If the binding site has been modified in any way (e.g.
subjected to a geometric transformation), the modified data will be written out, not the original data as
retrieved from the Relibase+ database.
Arguments:
• filename (string): Filename to be used for the output pdb file.
save_mol2(filename, radius=7.0, modified=1, include_pack=1)
Saves the binding site as a file in mol2 format (Tripos Inc., St Louis, USA). Will not work if the
binding site has been subjected to any geometrical transformation, e.g. as a result of chain
superimposition (save_pdb can be used instead).
Arguments:
• filename (string): Filename to be used for the output mol2 file.
• radius (float): Distance criterion which determines how much of the binding site will be
written out. Default is 7Å, i.e. all residues that have at least one atom within 7Å of at least one
atom in the ligand will be included.
• modified (integer): By default, the output file will include any changes (geometrical
transformations, changed or added attributes) that may have been made to the object by
Reliscript. To write out the original, unmodified object, set modified=0.
• include_pack (integer): By default, the output file will include the associated
PackBindingSite object. If and only if modified=1, the PackBindingSite data can be excluded
by setting include_pack=0.
transform(object, on_original=0)
Applies the same (rotation + translation) geometrical transformation to the BindingSite object as has
already been applied to the object passed in as an argument.
Arguments:
• object (Reliscript data object): A data object (e.g. a PDB, Chain, NucleicAcid, Ligand,
Solvent, or Residue object) that has been subjected to a geometrical transformation.
• on_original (integer): By default, the transformation will be applied to the BindingSite
object in its current orientation, which may already be the result of a previous transformation. To
transform the original, untransformed atomic positions, set the on_original flag to a nonzero value.
clear_transform()
Clears any geometrical transformation that has been applied to the BindingSite object so that all atom
coordinates return to their original values.
4.9.5 BindingSiteChain, BindingSiteNucleicAcid and BindingSiteSolvent Objects
These objects (see Section 4.9.3, page 49) have the same representation, attributes and functions as
their non-binding-site equivalents with the following exceptions:
• They will only return information on the atoms, bonds and residues that are in the binding site
(e.g. the atoms attribute of a BindingSiteChain object will not include Atom objects that belong
to residues in the chain that lie outside the binding site).
• BindingSiteChain objects have no sequence or sequence_3d attributes.
• BindingSiteNucleicAcid objects have no sequence_3d attributes.
• The textual representations of BindingSiteChain, BindingSiteNucleicAcid and
BindingSiteSolvent objects are slightly different from those of Chain and Solvent objects, viz.
the name component is BindingSiteChain rather than Chain and BindingSiteSolvent rather than
Solvent.
4.10 PackBindingSite Objects
Like the BindingSite object (see Section 4.9, page 48), a PackBindingSite object contains information
on a ligand’s surroundings. The difference between the two is that a PackBindingSite holds data about
protein chains, nucleic acid, ligand and solvent molecules that are within range of a protein-bound
ligand because of crystallographic packing; in other words, atoms that are generated by
crystallographic symmetry.
There is exactly one PackBindingSite object for each Ligand object (and, therefore, for each
BindingSite object), i.e. each PackBindingSite is defined with respect to a particular Ligand object. A
PackBindingSite object is similar to a PDB object in that it has the attributes chains, nucleic_acids, solvent
and ligands. In the PDB object, these refer to the contents of the complete protein; in a
PackBindingSite object, they refer to:
• chains: All protein chain residues generated by crystallographic symmetry that have at least
one atom within 7Å of the ligand defining the PackBindingSite object.
• nucleic_acids: All nucleic acid residues generated by crystallographic symmetry
that have at least one atom within 7Å of the ligand defining the PackBindingSite object.
• solvent: All solvent atoms generated by crystallographic symmetry that are within 7Å of the
ligand defining the PackBindingSite object.
• ligands: All other ligands generated by crystallographic symmetry that have at least one atom
within 7Å of the ligand defining the PackBindingSite object.
A PackBindingSite object only contains atoms generated by crystallographic symmetry; for the
primary binding site of a ligand, use the BindingSite object (see Section 4.9, page 48).
4.10.1 Creation of PackBindingSite Objects
A PackBindingSite object can only be created from the associated Ligand object or the corresponding
BindingSite object, e.g.
pack_bs_obj1 = ligand_obj.pack_binding_site
pack_bs_obj2 = bindingsite_obj.pack_binding_site
A list of PackBindingSite objects is also available from the PDB object, e.g.
# Get first pack binding site in PDB object pdb_obj
pack_bs_obj3 = pdb_obj.pack_binding_sites[0]
4.10.2 Textual Representation of PackBindingSite Objects
Print operations, etc., on PackBindingSite objects (e.g. print pbsite_object) will produce
output such as:
PackBindingSite<pdb:1a01:APA_301-A>
Colons separate the contents of the angle brackets into components. The first component is the
database identifier (see Section 3.6, page 17), followed by the PDB code; the final component is the
internal Relibase+ identifier of the associated ligand (see Section 4.4.2, page 34). If the
PackBindingSite object is from an in-house database (i.e. not derived from the main PDB) the output
will be, e.g.:
PackBindingSite<dbid:1a01:APA_301-A>
where dbid is the identifier of the in-house database.
4.10.3 Attributes of PackBindingSite Objects
The attributes (see page 77) of a PackBindingSite object are:
Name | Type | Description
atoms | list of Atoms | The Atom objects in the PackBindingSite, not including the atoms of the ligand used to define the PackBindingSite.
binding_site | BindingSite | The associated BindingSite object.
bonds | list of Bonds | The Bond objects in the PackBindingSite. This includes only those bonds between atoms in the PackBindingSite, i.e. does not include bonds from residues in the PackBindingSite to residues outside the PackBindingSite.
bound_ligand | Ligand | The Ligand object used to define the PackBindingSite.
chains | list of Chains | The Chain objects in the PackBindingSite. Each Chain object will contain only those residues that have at least one atom within 7Å of the bound ligand.
nucleic_acids | list of NucleicAcids | The NucleicAcid objects in the PackBindingSite. Each NucleicAcid object will contain only those residues that have at least one atom within 7Å of the bound ligand.
ligands | list of Ligands | The Ligand objects that are contained in the PackBindingSite (these will not include the ligand used to define the PackBindingSite).
pdb | PDB | The PDB object associated with this PackBindingSite.
solvent | list containing one Solvent object | List containing one Solvent object. This single object contains the solvent atoms in the PackBindingSite.
4.10.4 Functions of PackBindingSite Objects
The functions (see page 80) of a PackBindingSite object are:
pdb_atoms()
Returns a list of string objects containing the ATOM records of the PackBindingSite object.
save_pdb(filename)
Saves the PackBindingSite object as a file in pdb format. If the PackBindingSite has been modified in
any way (e.g. subjected to a geometric transformation), the modified data will be written out, not the
original data as retrieved from the Relibase+ database.
Arguments:
• filename (string): Filename to be used for the output pdb file.
transform(object, on_original=0)
Applies the same (rotation + translation) geometrical transformation to the PackBindingSite object as
has already been applied to the object passed in as an argument.
Arguments:
• object (Reliscript data object): A data object (e.g. a PDB, Chain, NucleicAcid, Ligand,
Solvent, or Residue object) that has been subjected to a geometrical transformation.
• on_original (integer): By default, the transformation will be applied to the PackBindingSite
object in its current orientation, which may already be the result of a previous transformation. To
transform the original, untransformed atomic positions, set the on_original flag to a nonzero value.
clear_transform()
Clears any geometrical transformation that has been applied to the PackBindingSite object so that all
atom coordinates return to their original values.
4.10.5 Chain, NucleicAcid, Ligand and Solvent Objects derived from PackBindingSite Objects
Chain, NucleicAcid, Ligand and Solvent objects derived from PackBindingSite objects have some
exceptional features:
• They will only return information on the atoms, bonds and residues that are in the
PackBindingSite (e.g. the atoms attribute of a Chain object will not include Atom objects that
belong to residues in the chain that lie outside the PackBindingSite).
• Chain objects do not have sequence or sequence_3d attributes.
• NucleicAcid objects do not have sequence_3d attributes.
• Ligand objects do not have a save_mol2 function.
5 Storing and Manipulating Collections of Objects: Container Objects
Container objects are used to store and manipulate collections of data objects, allowing access to
individual members of the collection in a simple and consistent manner (see Section 3.3.2, page 14).
Python itself provides several types of container objects, some of which are used in Reliscript. In
addition, Reliscript has one customised container object, the set, which offers features for, e.g.,
interconverting one type of data object to another (e.g. a set of PDB objects to a set of Ligand
objects).
5.1 Using Standard Python Containers to Hold Data Objects
Reliscript data objects can be stored in standard Python container objects (see Section 3.3.2, page 14)
such as lists (see page 80) and dictionaries (see page 78). This is illustrated in the script
container_example2.py.
5.2 Set Objects
Set objects are similar to Relibase+ hitlists except that it is possible to maintain a given order in a set
(so, for example, it is possible to sort the members of a set into a particular order depending on the
value of a particular attribute). No Reliscript data object can appear more than once in a set.
5.2.1 Types of Sets
A set will have a specific type depending on the data objects it contains. The five possible types are
'pdb', 'chain', 'nucleic_acid', 'ligand' and 'solvent', referring to sets which
contain, respectively, PDB, Chain, NucleicAcid, Ligand or Solvent objects (see Section 4, page 23).
Set types can be specified when a set is created (see Section 5.2.2, page 55). Although Chain and
Solvent objects can be contained in Reliscript sets (and these sets can be stored persistently in files),
they cannot currently be stored in Relibase+ hitlists.
5.2.2 Creating Sets
To create a set containing all objects of a given type (see Section 5.2.1, page 55) in the Relibase+
database(s), enter a command such as:
# Create set containing all PDB objects in database
pdb_set = reliscript.set('pdb')
To create an empty set of a specific type, enter a command such as:
empty_ligand_set = reliscript.set('ligand',[])
To construct a set based on a hitlist that has been created in a web-based Relibase+ session and stored
in a user’s Relibase+ workspace (see Section 3.7.3, page 18), enter commands such as:
reliscript.use_workspace('myname')
hitlist_ligand_set = reliscript.set('ligand', 'my_hitlistname')
If the type specified for the set (e.g. 'ligand' above) is not the same as the type of the hitlist, an
automatic conversion will occur. For example, if my_hitlistname in the example above is a PDB
hitlist, the resulting Reliscript set will contain all the Ligand objects in the PDB entries contained in
the hitlist. By default, Reliscript will use the login username of the user for the workspace identifier.
In the above example, if no hitlist called my_hitlistname is found in the relevant workspace,
Reliscript will assume that the name refers to a file (hitlists may be saved to file as well as stored in
Relibase+ workspaces). In this case, if the name does not include a file extension, the extension .rbs
will be added.
The conversion of one type of set to another always leads to the creation of a new set (see Section
5.2.5, page 56).
5.2.3 Copying Sets
Sets have a copy function that produces a full copy of the set, e.g.
set1 = set2.copy()
5.2.4 Saving Sets
PDB and ligand sets (but not chain or solvent sets) can be saved as Relibase+ hitlists (which are then
accessible in the Relibase+ graphical user interface), e.g.
hitlist_ligand_set.save_to_hitlist('hitlistname')
These may then be read into Relibase+ sessions, e.g. for 3D viewing.
Any type of set can be saved as a file, e.g.
ligand_set.save('/home/user/myligset')
The extension .rbs will be added if no file extension is specified, e.g. the above command will save
the set into the file myligset.rbs in the /home/user directory.
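The extension rule can be mimicked in a few lines. This is a sketch only: the function set_filename is hypothetical, and the use of os.path.splitext to decide whether a name already has an extension is an assumption, since the guide does not say how Reliscript makes that decision.

```python
import os.path


def set_filename(name, default_ext='.rbs'):
    """Return name unchanged if it has an extension, else add default_ext."""
    ext = os.path.splitext(name)[1]
    return name if ext else name + default_ext


saved_as = set_filename('/home/user/myligset')
kept = set_filename('/home/user/myligset.dat')
```

A helper of this kind can be useful when a script needs to check afterwards that the saved file exists.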
5.2.5 Converting One Type of Set to Another
Sets of all types (pdb, chain, nucleic_acid, ligand, solvent) can be interconverted. Conversion of one
type of set into another may change the number of data objects, e.g. since a given PDB object may
contain different numbers of chains and ligands.
Set type conversions can take place on request or automatically. For example, if the set pdb_set
contains PDB objects and we want to produce a set called lig_set containing all the ligands for the
entries in the pdb_set, we would use the command:
lig_set = reliscript.set('ligand', pdb_set)
Alternatively, if we have a set of PDB objects, pdb_set, and a set of Ligand objects, lig_set, and we
require a new set containing all PDB objects in pdb_set that do not have ligands in lig_set, we would
use:
pdb_set2 = pdb_set - lig_set
(see Section 5.2.6, page 57). When this command is executed, an automatic conversion of lig_set to a
temporary set containing the corresponding PDB objects will occur, so that the subtraction can then
take place between sets of the same type.
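Both kinds of conversion can be pictured with ordinary Python lists and a lookup table. Everything in this sketch is hypothetical (the PDB codes, ligand identifiers and helper functions); it illustrates why the number of objects can change on conversion and how a mixed-type subtraction reduces to a same-type one.

```python
# Hypothetical contents: PDB code -> ligand identifiers in that entry
LIGANDS_IN_ENTRY = {
    '1abc': ['LIG_1', 'LIG_2'],
    '2xyz': [],
    '3def': ['LIG_3'],
}


def pdb_to_ligands(pdb_codes):
    """Expand PDB codes to their ligands, keeping the original entry order."""
    ligands = []
    for code in pdb_codes:
        for ligand in LIGANDS_IN_ENTRY[code]:
            if ligand not in ligands:
                ligands.append(ligand)
    return ligands


def ligands_to_pdb(ligand_ids):
    """Map ligands back to the PDB codes of the entries containing them."""
    return [code for code in sorted(LIGANDS_IN_ENTRY)
            if any(lig in ligand_ids for lig in LIGANDS_IN_ENTRY[code])]


pdb_set = ['1abc', '2xyz', '3def']
lig_set = pdb_to_ligands(pdb_set)          # explicit conversion on request

# Mixed-type subtraction: convert the ligands to PDB codes first, then
# remove those codes from the PDB list
lig_subset = ['LIG_3']
temporary = ligands_to_pdb(lig_subset)
remaining = [code for code in pdb_set if code not in temporary]
```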
5.2.6 Logical Operations on Sets
It is possible to apply logical operators to sets. The & operator performs a logical AND of two sets,
i.e. the resulting set will contain only those objects that appear in both sets. For example:
pdb_set3 = pdb_set1 & pdb_set2
# pdb_set3 contains only those PDB objects that occur in both
# pdb_set1 and pdb_set2
The | operator performs a logical OR of two sets, i.e. the resulting set will contain all objects that
occur in either (or both) of the sets involved. This is equivalent to the use of the addition operator. For
example:
lig_set2 = lig_set1 | pdb_set1
# lig_set2 contains all the Ligand objects in lig_set1 and all the
# Ligand objects contained in the PDB objects in pdb_set1
The ^ operator performs a logical XOR (exclusive OR) of two sets, i.e. the resulting set will contain
only those objects that appear in one of the sets but not the other. For example:
chain_set3 = chain_set1 ^ chain_set2
# chain_set3 contains only those Chain objects in chain_set1 that
# are not in chain_set2, plus those in chain_set2 that are not in
# chain_set1
The other operators that are allowable between sets are the addition (+) and subtraction (-)
operators. Addition of sets produces identical results to the | (i.e. OR) operator. Subtraction of sets
produces a set containing only those objects in the first set that do not occur in the second set (i.e.
NOT operation).
The precedence of operators in Python (governing the order in which they are executed in an
expression without brackets) is: +, - done before & done before ^ done before |. The safest rule is
to use brackets to ensure that operations will be done in the order you expect, e.g.
# Operator in brackets will be executed before operator
# outside brackets:
set4 = set1 + (set2 ^ set3)
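The effect of precedence can be checked with Python's built-in set type, which supports the -, &, ^ and | operators (but not +). The sketch below uses small integer sets rather than Reliscript sets, so it illustrates only the Python operator rules.

```python
a, b, c = set([1, 2]), set([2, 3]), set([3, 4])

# & binds more tightly than |, so these two expressions are equal
unbracketed = a | b & c
explicit = a | (b & c)

# Brackets are needed when the other grouping is wanted
other_grouping = (a | b) & c
```

Here the unbracketed expression keeps every member of a, whereas the bracketed alternative keeps only the members shared with c.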
In all operations, the first set has priority. This means that:
• The resulting set will always be of the same type as the first set, e.g.
new_set = ligand_set & chain_set
# new_set is a ligand set, not a chain set
• When the operation is such that the resulting set could contain objects from both of the original
sets, the order of objects in the resulting set will be: objects from first set followed by objects
from second set. This means that, if either of the original sets was ordered according to some
attribute, the sort would need to be reapplied to the new set to get the correct overall order.
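The ordering behaviour described above can be imitated with plain lists. The helper functions below are hypothetical and work on identifier strings; they are a sketch of the stated semantics (first set has priority, no duplicates), not of how Reliscript implements them.

```python
def ordered_and(first, second):
    """Members of first that also occur in second, in first's order."""
    return [x for x in first if x in second]


def ordered_or(first, second):
    """Members of first, followed by new members from second."""
    return list(first) + [x for x in second if x not in first]


def ordered_xor(first, second):
    """Members found in exactly one of the two lists."""
    return ([x for x in first if x not in second] +
            [x for x in second if x not in first])


def ordered_sub(first, second):
    """Members of first that do not occur in second."""
    return [x for x in first if x not in second]


set1 = ['1ab2', '2b03', '3c04']
set2 = ['3c04', '0xyz']
both = ordered_and(set1, set2)
either = ordered_or(set1, set2)
only_one = ordered_xor(set1, set2)
minus = ordered_sub(set1, set2)
```

In this example the OR result ends with '0xyz', so it is no longer in alphabetical order even though set1 was; this is the situation in which a sort has to be reapplied.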
5.2.7 Indexing, Accessing and Deleting Members of a Set
Set objects mimic Python lists for the purposes of accessing members of the set. The script
set_example.py illustrates the implemented list commands.
In addition, objects in PDB sets can be indexed by 4 letter PDB codes, e.g.
pdb_obj = pdb_set['1qs4']
5.2.8 Sorting Members of a Set into a Particular Order
The initial order of entries in a set is alphabetical, based on the complete Relibase+ identifier for the
stored object. Examples of these identifiers are:
PDB1A0L (PDB Object)
100_1004_PDB1QS4_1 (Ligand)
PDB1A0L-A_1 (Chain)
SOLV_PDB1A0L_1 (Solvent)
If one type of set is converted to another type, the order of objects in the original set will be preserved.
Because of the different locations of the PDB code within the identifier, this may mean that the
resulting set is not ordered alphabetically even if the original set was.
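A two-member example shows how this can happen. The second identifier below is the Ligand example given above; the first is a hypothetical identifier invented for a ligand in entry 1A0L.

```python
# Ligand identifiers in the order of their parent PDB entries
# (PDB1A0L before PDB1QS4). 'XYZ_1_PDB1A0L_1' is a made-up identifier.
converted = ['XYZ_1_PDB1A0L_1', '100_1004_PDB1QS4_1']

is_alphabetical = (converted == sorted(converted))
resorted = sorted(converted)
```

The parent entries are in alphabetical order, but the ligand identifiers begin with the ligand name, so the converted list is not.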
To reverse the order of objects in a set, use a command such as:
pdb_set.reverse()
To sort a set on the object identifiers, use a command such as:
pdb_set.sort()
To sort on an attribute, pass the attribute name as a string to the sort function (the order is that for a
normal Python sort, i.e. lowest first), e.g.
# Sort the PDB objects by crystallographic resolution
pdb_set.sort('resolution')
For a more complex sort, it is possible to pass a function into the sort function, e.g.
pdb_set.sort(my_function)
my_function must take two arguments, each being an object of the type stored in the set (e.g. PDB
objects for a set of type ’pdb’) and return -1, 0 or 1 depending on whether the first argument is
considered smaller than, equal to, or larger than the second argument.
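A comparison function of this kind might look as follows. The sketch sorts a plain Python list of hypothetical stand-in objects (EntrySketch) and uses functools.cmp_to_key so that it also runs under Python versions whose built-in sort no longer accepts a comparison function; with a Reliscript set, my_function would be passed to sort directly as described above.

```python
from functools import cmp_to_key


class EntrySketch(object):
    """Hypothetical stand-in for a PDB object with two attributes."""

    def __init__(self, code, resolution, year):
        self.code = code
        self.resolution = resolution
        self.year = year


def my_function(first, second):
    """Order by resolution (lowest first), then by year (newest first)."""
    key1 = (first.resolution, -first.year)
    key2 = (second.resolution, -second.year)
    if key1 < key2:
        return -1
    if key1 > key2:
        return 1
    return 0


entries = [
    EntrySketch('aaaa', 2.0, 1998),
    EntrySketch('bbbb', 1.5, 2001),
    EntrySketch('cccc', 2.0, 2003),
]
ordered = [e.code for e in sorted(entries, key=cmp_to_key(my_function))]
```

A two-level ordering like this cannot be expressed by passing a single attribute name, which is the case the function form is intended for.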
Note that sorting an entire set may be very time consuming. It is therefore recommended that sorting
is performed as a last step after any filtering has taken place. For example, if we wanted to sort
urokinase entries by year, it would be inefficient to first sort the whole pdb set by year and then search
for urokinase entries. The preferred style would filter out the urokinase entries first and then sort by
year:
# Create the set (rs is assumed here to be an alias for the reliscript module)
pdb_set = rs.set('pdb')
# Filter
search = rs.text_search(field='title',searchstring='UROKINASE')
search(pdb_set)
# Sort after filtering
pdb_set.sort('year')
5.2.9 Appending Objects to a Set
It is possible to append objects to a set using the append function; the extend function can be
used in exactly the same way. The item to be appended can be defined as an object or an identifier
string; other sets, lists and tuples of objects can also be appended. For example, the following code
uses identifier strings to create a set containing 3 PDB objects:
# Initialise a list of 3 PDB identifiers
my_list = ['1ab2','2b03','3c04']
# Create an empty set
new_pdb_set = reliscript.set('pdb',[])
# Append each PDB entry to set
new_pdb_set.append(my_list)
The last line could also have been written using a for loop, appending the individual PDB identifiers to
new_pdb_set one at a time; see script set_example2.py.
The script set_example3.py shows the use of the append command to divide the objects in one
set into two new sets based on some criterion, in this case the date.
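The idea of dividing one collection into two by date can be sketched as follows. This is not a reproduction of set_example3.py: the Entry class, the year attribute values and the plain Python lists are stand-ins chosen so that the sketch runs without Relibase+. With Reliscript sets, each branch would call append on a set rather than on a list.

```python
class Entry(object):
    """Hypothetical stand-in for a PDB object with a year attribute."""

    def __init__(self, identifier, year):
        self.identifier = identifier
        self.year = year


def split_by_year(entries, cutoff_year):
    """Return (older, newer): entries before cutoff_year and those from it onwards."""
    older, newer = [], []
    for entry in entries:
        # Append each object to exactly one of the two collections.
        if entry.year < cutoff_year:
            older.append(entry)
        else:
            newer.append(entry)
    return older, newer


entries = [Entry('entry_a', 1995), Entry('entry_b', 2003), Entry('entry_c', 2000)]
older, newer = split_by_year(entries, 2000)
```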
Before an object is appended to a set, it is tested to see if it is of the same type as the objects already
in the set. If not (e.g. if we try to append a PDB object to a set containing Ligand objects), the object
to be appended is converted to the correct type (in the example just given, this would mean generating
all the Ligand objects contained in the PDB object and adding them to the set).
5.2.10 Summary of Set Functions
Sets have the following functions:
• copy (see Section 5.2.3, page 56)
• save (see Section 5.2.4, page 56)
• save_to_hitlist (see Section 5.2.4, page 56)
• sort (see Section 5.2.8, page 58)
• reverse (see Section 5.2.8, page 58)
• append or extend (see Section 5.2.9, page 60)
In addition, sets can be:
• Interconverted (one type of set to another) (see Section 5.2.5, page 56)
• Subjected to logical operations (see Section 5.2.6, page 57)
• Indexed (for accessing items within the set) in various ways (see Section 5.2.7, page 58)
and there are options for deleting items from a set and taking “slices” (i.e. subsets) of a set (see
Section 5.2.7, page 58).
6 Doing Searches and Other Calculations: Operation Objects
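Operation objects (see Section 3.3.3, page 15) are available for performing the tasks described in
Sections 6.1 to 6.7: text and keyword searching, numeric searching, sequence searching, consensus
motif searching, SMILES and SMARTS searching, similar ligand searching, and superimposing chains
and similar binding sites.
In addition, customised operation objects may be written for doing searches and other calculations not
in this list (see Section 8, page 73).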
6.1 Text and Keyword Searching
The text search class can be used to filter a set of data objects so that only those objects containing a
specified textual search term will be kept. The class can handle both simple text searches and more
complex searches involving regular expressions. It is also possible to limit the search to particular text
fields, e.g. the author field.
6.1.1 Creating a Text Search Object; Initialization Parameters
An operation object for performing a text search can be created with a command of the form:
text_search_object = reliscript.text_search(parameters)
The first parameter must specify what is to be searched for. This can be either a string (see Section
6.1.2, page 62) or a compiled regular expression object (see Section 6.1.3, page 62).
Other, optional, parameters are:
case
• When a text string is used the search will, by default, ignore the case of the string while
searching. By passing the option case=’match’ the search is forced to match the case of the
search string. This option will be ignored if the search is for a regular expression.
component or components (either spelling can be used)
• This allows the searching of text strings within particular components of the data object. For
example, if we have a ligand set but wish to do a search on text strings in the PDB object
associated with the ligand, we would use the option component=’pdb’. Conversely, if we
have a PDB set, but wish to search the text strings of the chains, nucleic_acids, ligands or
solvent molecules in the PDB entries, we would use component=’chains’,
component=’nucleic_acids’, component=’ligands’ or
component=’solvent’, respectively. An option such as
components=[’chains’,’ligands’] with a PDB set would search the text-string
attributes of both the chain and ligand objects associated with each PDB object.
field or fields or attribute or attributes (any spelling can be used)
• By default, the search will be over all the text-string attributes within the object or its nominated
component(s). The precision and speed of the search can be improved by specifying which
object attribute(s) are to be searched. Examples are: field=’authors’ and
attributes=[’header’,’method’].
type
• If a text string is used, then, by default, the search will simply look for this string. Passing the
argument type=’re’ will treat the passed string as a regular expression definition string and
will create an internal regular expression object. This can be more convenient than setting up a
regular expression object yourself before creating the text search object, but is not as flexible.
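The effect of the case and type options can be illustrated with plain Python string and regular expression matching. The function text_matches below is a hypothetical helper written for this illustration; it is not part of Reliscript and makes no claim about how the text search class is implemented internally. Its type parameter is named after the option above and therefore shadows the Python built-in inside the function.

```python
import re


def text_matches(text, term, case=None, type=None):
    """Illustrative matcher mirroring the options described above."""
    if type == 're':
        # The string is treated as a regular expression definition;
        # the case option is ignored in this mode.
        return re.search(term, text) is not None
    if case == 'match':
        # Case-sensitive substring test.
        return term in text
    # Default behaviour: ignore case.
    return term.lower() in text.lower()


title = 'Crystal structure of human UROKINASE'
default_hit = text_matches(title, 'urokinase')
case_hit = text_matches(title, 'urokinase', case='match')
regex_hit = text_matches(title, r'UROKIN[A-Z]+', type='re')
```

Here the default search finds the term regardless of case, the case-matching search does not (the title holds the word in upper case only), and the regular expression search matches the upper-case word.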
6.1.2 Example Text Search
Please refer to example_text_search.py to view the example script.
6.1.3 Example Regular Expression Text Search
Please refer to example_regular_expression_search.py to view the example script.
6.2 Numeric Searching
The numeric search class can be used to filter a set of data objects on the numerical value of a
particular attribute, e.g. resolution, mol_wt, etc.
6.2.1 Creating a Numeric Search Object; Initialization Parameters
An operation object for performing a numeric search can be created with a command of the form:
numeric_search_object = reliscript.numeric_search(parameters)
The first parameter must specify the attribute whose numeric value is to be tested, e.g. the
resolution of a PDB object. Other, optional, parameters are:
min
• Specifies the minimum acceptable value of the attribute.
max
• Specifies the maximum acceptable value of the attribute.
component
• This allows the searching of attributes within particular components of the data object. For
example, if we have a ligand set but wish to do a numeric search on the resolution of the parent
PDB objects, we would use the option component=’pdb’.
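The min and max limits amount to a range test on the chosen attribute. The sketch below shows that test on a plain dictionary of hypothetical resolution values; within_limits is an illustrative helper, not a Reliscript function, and treating both limits as inclusive is an assumption made here.

```python
def within_limits(value, minimum=None, maximum=None):
    """Return True if value lies within the optional limits (assumed inclusive)."""
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True


# Hypothetical resolution values keyed by placeholder entry names.
resolutions = {'entry_a': 1.8, 'entry_b': 2.6, 'entry_c': 3.1}
kept = sorted(name for name, res in resolutions.items()
              if within_limits(res, maximum=2.6))
```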
6.2.2 Example Numeric Search
Please go to example_numeric_search.py to view the example script.
6.3 Sequence Searching
The sequence search class can be used to filter a set of Chain objects so that only those objects are
kept whose percentage sequence identity to a user-specified Chain object (or a sequence defined by
one-letter amino-acid codes) falls within a given range. By default, the percentage identity is set to
100, i.e. the search will find exact sequence matches only.
The search object reorders the set so that the most similar chains are at the front. The sequence search
class has an additional use beyond simple chain similarity as it is used as the basis for determining
similar binding sites (see Section 6.7.3, page 72).
6.3.1 Creating a Sequence Search Object; Initialization Parameters
An operation object for performing a FASTA (http://fasta.bioch.virginia.edu/) sequence search can be
created with a command of the form:
seq_search_object = reliscript.sequence_search(parameters)
The first parameter must be the sequence to be searched for, either as a string of one-letter codes or as
a Chain object (see Section 6.3.3, page 64).
Other, optional, parameters are:
minidentity and maxidentity
• These should be floating point numbers in the range 0.0 to 100.0, with maxidentity greater
than or equal to minidentity. Their purpose is to define how closely the specified search
sequence must be matched in order for a data object to be considered a hit. The search will
calculate an identity value for each data-object sequence compared with the requested search
sequence. Only those objects for which the identity value is greater than or equal to
minidentity and smaller than or equal to maxidentity will be regarded as hits. The
default values are minidentity = 100.0, maxidentity = 100.0, i.e. only exact
matches.
attribute_name
• By default, an attribute called sequence_similarity is added to each hit (i.e. data object
found in a sequence search) (see Section 6.3.2, page 64). It contains information about how
similar the hit is to the search sequence. A different name can be specified for this attribute, e.g.
by using attribute_name = seqsim as an option when creating the sequence search
object. This would be useful if two or more sequence searches were performed on the same set
of data objects. By using different attribute names for each search, it would be possible to
distinguish the results of the searches later.
align_identity
• This retrieves the sequence similarity of two chains from different proteins. It is an "on the fly"
ALIGN calculation which does a one on one sequence alignment.
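As an illustration of the minidentity and maxidentity window described above, the sketch below scores toy sequences against a query and keeps those inside the window, most similar first. It is a simplification under stated assumptions: the sequences are invented, they are taken to be already aligned and of equal length, and percent_identity is a plain position-by-position count rather than the FASTA calculation used by Reliscript.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of aligned positions (ignoring gap columns) with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError('aligned sequences must have equal length')
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != '-' and y != '-']
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs)


def identity_hits(query, candidates, minidentity=100.0, maxidentity=100.0):
    """Keep candidates inside the identity window, most similar first."""
    scored = [(percent_identity(query, seq), name) for name, seq in candidates.items()]
    hits = [(score, name) for score, name in scored
            if minidentity <= score <= maxidentity]
    hits.sort(key=lambda pair: -pair[0])
    return [name for score, name in hits]


query = 'IVGGKVCP'
candidates = {'chain_a': 'IVGGKVCP', 'chain_b': 'IVGGEVCP', 'chain_c': 'AAGGEVAA'}
exact_only = identity_hits(query, candidates)
relaxed = identity_hits(query, candidates, minidentity=80.0)
```

With the default window only the identical sequence is kept; lowering minidentity to 80.0 also admits the sequence that differs at one position in eight.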
6.3.2 Attributes Created by Sequence Search Objects
The following attribute will be added to each data object passing a sequence search:
sequence_similarity
• Dictionary object containing two values whose keys are homology and score. These relate to
the FASTA-calculated homology value and score, respectively.
• The default name of this attribute, sequence_similarity, can be over-ridden (see
Section 6.3.1, page 63).
• An example showing how the homology value can be accessed is:
# Examine the similarity value for the third-closest hit
print chain_set[2].sequence_similarity['homology']
90.0
6.3.3 Example Sequence Search
Please go to example_sequence_search.py to view the example script.
6.4 Consensus Motif Searching
The consensus motif search class can be used to filter a set of data objects so that only those data
objects matching a specified consensus motif (an amino acid sequence containing one or more
variable residues) will be kept. It is different from the sequence search object (see Section 6.3, page
63) in two ways:
• The sequence specified can include the character X, which will match any residue.
• The search will only find exact matches (apart from the variability implied by the use of the
symbol X), i.e. there is no option to specify a homology range, as there is in the normal sequence
search.
6.4.1 Creating a Consensus Motif Search Object; Initialization Parameters
An operation object for performing a consensus motif search can be created with a command of the
form:
con_motif_object = reliscript.consensus_search(parameters)
The first parameter must be the sequence to be searched for, either as a string of one-letter codes, in
which X can be used to mean any amino acid (see Section 6.4.3, page 66), or as a Python regular
expression string or search object (see Section 6.4.4, page 66).
Other, optional, parameters are:
attribute_name
• By default, an attribute called consensus_search is added to each data object found in a
consensus motif search. It contains additional information about the results of the search (see
Section 6.4.2, page 65). A different name can be specified for this attribute, e.g. by using
attribute_name = consim as an option when creating the consensus motif search object.
This would be useful if two or more consensus motif searches were performed on the same set of
data objects. By using different attribute names for each search, it would be possible to
distinguish the results of the searches later.
6.4.2 Attributes Created by Consensus Motif Search Objects
The following attribute will be added to each data object passing a consensus motif search:
consensus_search
• Dictionary object containing one item (a tuple of tuples) whose key is locations. Each tuple
will contain two values, the first being the starting position in the sequence of the match (note that the
first residue in a chain is numbered 0, not 1), the second being the length of the match, i.e. the
number of residues in the matched sequence. The latter is relevant if a regular expression has
been used which may produce sequence matches involving variable numbers of residues (see
Section 6.4.4, page 66).
• The default name of this attribute, consensus_search, can be over-ridden (see Section
6.4.1, page 65).
• An example showing how the location data can be accessed is:
# Examine the matching location(s) for the third closest hit
print chain_set[2].consensus_search['locations']
((34,6),)
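The (start, length) form of the locations data can be illustrated with Python's re module. In the sketch below, motif_locations is a hypothetical helper (not a Reliscript function) that turns each X of a motif into the regular expression wildcard and reports zero-based start positions. The sequence is invented for the illustration, and only non-overlapping matches are found; whether Reliscript reports overlapping matches is not stated in this guide.

```python
import re


def motif_locations(sequence, motif):
    """Return ((start, length), ...) for each match of a motif in which X is any residue."""
    pattern = ''.join('.' if residue == 'X' else re.escape(residue)
                      for residue in motif)
    return tuple((m.start(), len(m.group()))
                 for m in re.finditer(pattern, sequence))


# Toy sequence containing the motif G-X-X-G-K twice.
locations = motif_locations('MKGAAGKTLLGSSGKS', 'GXXGK')
```

The first match starts at the third residue, which is reported as position 2 because counting starts at 0.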
6.4.3 Example Consensus Motif Search
Please go to example_consensus_motif_search.py to view the example code.
6.4.4 Example Consensus Motif Search Using a Regular Expression
A search for a protein sequence beginning with I, then having 3, 4 or 5 G or V residues, followed by
one Q, followed by 2 or 3 residues of any type, and ending with the sequence PRS can be achieved
using Python’s regular expression module.
Please go to example_consensus_motif_search_using_regular_expression.py
to view the example code.
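One way of writing the motif just described as a Python regular expression is sketched below. The pattern follows the verbal description directly; the test sequence is invented, and this is not a reproduction of the example script. According to Section 6.4.1, either the pattern string or the compiled object could serve as the first parameter of a consensus motif search.

```python
import re

# I, then 3 to 5 residues that are G or V, then Q,
# then 2 or 3 residues of any type, then P, R, S.
motif = re.compile(r'I[GV]{3,5}Q.{2,3}PRS')

# Toy sequence used only to exercise the pattern.
match = motif.search('AAIGVGQLTPRSKK')
start, length = match.start(), len(match.group())
```

Because the repeat counts are ranges, the length of a match can vary from one hit to another, which is why the locations data records a length as well as a start position.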
6.5 SMILES and SMARTS Searching
The SMILES search class can be used to filter a set of data objects so that only those objects
containing a particular substructure, as defined by a SMILES string (http://www.daylight.com/smiles/index.html), will be kept.
The following information is helpful if you use SMILES in Reliscript:
• Information about charges, isotopes and stereochemistry is ignored.
• Hydrogens are only allowed in brackets together with a heavy atom, e.g. [NH3] or [OH].
• Hydrogens can be used to fill up valencies, e.g. C(=O)[NH2] will find only carbamoyl groups,
and not, e.g., peptide linkages.
• Reliscript supports the bond-type any (use the one-character symbol ’~’).
• Reliscript supports three types of atom “wildcards”, viz:
• *: any atom
• A: any aliphatic atom
• a: any aromatic atom
• Aromatic bonds are only supported for 6-membered aromatic rings; use single and double bonds
for other unsaturated rings
• Reliscript does not support tautomeric states; use bonds of type any (SMILES code ~)
• Queries using ’.’ are not supported
SMARTS Searching
The SMARTS search class is analogous to SMILES except SMARTS are used to represent
substructures rather than entire molecules (http://www.daylight.com/dayhtml_tutorials/languages/smarts/index.html).
The implementation of SMARTS in Relibase+ is not comprehensive; limitations are primarily due to
the way in which ligands are stored in Relibase+. The following should be taken into consideration
when using SMARTS:
• Relibase+ assumes bond types given in the SMARTS query match Relibase+ conventions. In
particular:
• Six-membered aromatic rings have aromatic bond types
• Five-membered rings are non-aromatic unless pi bonded to a metal (e.g. ferrocenes).
• Due to the nature of the data source, hydrogen counts on atoms other than carbon are not
reliable. Use of the Dn atom constraint (number of non-hydrogen connections) is therefore
recommended rather than Xn (total number of connections) for heteroatoms.
Unsupported features (general):
• Dot disconnected fragments, e.g. (C).(C)
• Recursive SMARTS, e.g. [$(CC);$(CCC)]
• Reaction SMARTS, e.g. [CC>>CC].
Unsupported features (atom properties):
• Some atom constraints (where n is an integer):
• v<n>: valency constraint.
• x<n>: number of ring connections constraint.
• h<n>: implicit hydrogen constraint (no distinction is made between implicit and explicit H in
Relibase+).
• Charge constraints (no charges are stored in Relibase+).
• R<n> where n>=1 (no smallest set of smallest rings implementation).
• #<n>: atomic number (the element symbol should be used).
• <n>: atomic mass.
• Stereochemical descriptors.
• Constraints of different types combined with OR operator, e.g. [X1, D2].
• High precedence AND in OR subexpression, e.g. [C, N&H1] (constraints can only be applied
to all element types in an atom).
Unsupported features (bond properties):
• Stereochemical descriptors for double bonds: these are treated as single bonds with unspecified
stereochemistry.
• High-precedence AND in OR subexpression, e.g. =&@,- (cyclic double or single and
unspecified cyclicity).
• The following constructs are not supported:
• NOT any bond, e.g. !~.
• different bond types combined with AND operator, e.g. -&= (single and double).
• different NOT bond types combined with OR operator, e.g. !-,!= (not single or not double,
equivalent to any bond).
6.5.1 Creating a SMILES or SMARTS Search Object; Initialization Parameters
An operation object for performing a SMILES search can be created with a command of the form:
smiles_object = reliscript.smiles_search(parameters)
A similar operation object can be created for SMARTS:
smarts_object = reliscript.smarts_search(parameters)
The first parameter must be the SMILES or SMARTS definition of the substructure to be searched for
(see Section 6.5.3, page 69).
Other, optional, parameters are:
attribute_name
• By default, an attribute called smiles_hit_data/smarts_hit_data (for SMILES and
SMARTS searches respectively) is added to each data object found in a substructure search. It
contains information about which atoms in the hit object matched the atoms of the SMILES
string (see Section 6.5.2, page 69). A different name can be specified for this attribute, e.g. by
using attribute_name = my_matched_atoms as an option when creating the SMILES
search object. This would be useful if two or more SMILES searches were performed on the
same set of data objects. By using different attribute names for each search, it would be possible
to distinguish the results of the searches later.
store_match
• The default value for this parameter is 1, which means that the smiles_hit_data attribute
will be created for each hit. Set store_match = 0 if you do not want to create this attribute
(i.e. you do not need the atom-matching information).
exact_match
• This is an additional optional parameter for SMILES search objects only. By default, this
parameter is set to 1, meaning that the SMILES string ’c1ccccc1’ would match only
benzene, not ligands containing a benzene substructure. Note that by setting this parameter to 0,
the SMILES search object will perform exactly the same search as the equivalent SMARTS
search object.
all_models
• If used, this parameter should be given after the above arguments. It controls
whether all ligand models are included for NMR structures which have multiple structural
models. The default, all_models=0, includes ligands only from the first NMR model. Setting
all_models=1 causes all ligand models to be stored.
See smiles_smarts_search_example.py for a sample script illustrating ligand searches
using SMILES and SMARTS.
6.5.2 Attributes Created by SMILES/SMARTS Search Objects
The following attribute will be added to each data object passing a SMILES/SMARTS search:
smiles_hit_data/smarts_hit_data
• List of lists containing the atom objects that match the SMILES/SMARTS string. The outer list
will contain more than one item if the substructure specified by the SMILES/SMARTS string
occurs more than once in the hit object. The inner list contains the actual atoms in a matching
fragment.
• For example, suppose a search for the SMILES string 'C(=O)N' (i.e. a peptide group) has been
performed on the ligand set lig_set. After the search,
lig_set[0].smiles_hit_data[0][0] contains the carbon atom of the first peptide
group in lig_set[0]. Similarly, lig_set[0].smiles_hit_data[0][1] and
lig_set[0].smiles_hit_data[0][2] contain the oxygen and nitrogen atoms,
respectively. If lig_set[0] contains more than one peptide group, then
lig_set[0].smiles_hit_data[1][0] to
lig_set[0].smiles_hit_data[1][2] will contain the C, O and N atoms of the second
peptide group; and so on.
• The default name of this attribute, smiles_hit_data/smarts_hit_data, can be overridden (see Section 6.5.1, page 68).
• If desired, you can request that this attribute is not created (e.g. if you do not need the matching
information and want to save memory).
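The list-of-lists layout described above can be explored with ordinary Python indexing. In the sketch below, hit_data is a stand-in for the smiles_hit_data attribute of one hit after a search for 'C(=O)N'; the atom labels are invented strings, whereas Reliscript stores Atom objects.

```python
# Stand-in for lig_set[0].smiles_hit_data: one inner list per matching
# fragment, with atoms in the order of the SMILES string (C, O, N).
hit_data = [
    ['C7', 'O8', 'N9'],     # first matching fragment
    ['C15', 'O16', 'N17'],  # second matching fragment
]

number_of_matches = len(hit_data)   # how often the substructure occurs
first_carbon = hit_data[0][0]       # C of the first fragment
second_nitrogen = hit_data[1][2]    # N of the second fragment

# Collect the carbonyl oxygen (second atom of the SMILES) of every fragment.
carbonyl_oxygens = [fragment[1] for fragment in hit_data]
```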
6.5.3 Example SMILES Search
Please refer to example_smiles_search.py to view the example script.
6.6 Similar Ligand Searching
The similar ligand search class can be used to filter a set of Ligand objects so that only those objects
are kept whose structural similarity to a user-specified ligand falls within a given similarity-coefficient range. Similarity is judged using a Tanimoto similarity coefficient calculated from 2D
fingerprints. The search object also reorders the set so that the most similar ligands are at the front.
Only the 1000 most similar results to the query ligand are returned.
6.6.1 Creating a Similar Ligand Search Object; Initialization Parameters
An operation object for performing a similar ligand search can be created with a command of the
form:
sim_lig_search = reliscript.similar_ligand_search(parameters)
The first parameter must specify the Ligand object which is to be used as the basis for the similarity
calculations.
Other, optional, parameters are:
mintani and maxtani
• These should be floating point numbers in the range 0.0 to 1.0, with mintani less than or equal
to maxtani. The search will calculate a Tanimoto similarity coefficient for each ligand
compared with the search ligand. Only those ligands whose Tanimoto coefficient is greater than
or equal to mintani and less than or equal to maxtani will be accepted as hits. The default
values are mintani = 0.4 and maxtani = 1.0.
attribute_name
• By default, an attribute called ligand_similarity, containing similarity-coefficient
information, will be added to each Ligand object found in a similar ligand search (see Section
6.6.2, page 70). A different name can be specified for this attribute, e.g. by using
attribute_name = ligsim as an option when creating the similar ligand search object.
This would be useful if two or more similar ligand searches were performed on the same set of
data objects. By using different attribute names for each search, it would be possible to
distinguish the results of the searches later.
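For orientation, the Tanimoto coefficient of two fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below applies that definition, together with the mintani and maxtani window and the ordering described above, to invented fingerprints held as Python sets of bit positions. The fingerprint contents and the helper names are assumptions; the 2D fingerprints actually used by Relibase+ are not described in this guide.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0
    return len(bits_a & bits_b) / float(union)


def similar_ligands(query_bits, library, mintani=0.4, maxtani=1.0, limit=1000):
    """Return (coefficient, name) pairs inside the window, most similar first."""
    scored = [(tanimoto(query_bits, bits), name) for name, bits in library.items()]
    hits = [(score, name) for score, name in scored if mintani <= score <= maxtani]
    hits.sort(key=lambda pair: -pair[0])
    return hits[:limit]


# Invented fingerprints: each set lists the bit positions that are switched on.
query = {1, 4, 7, 9}
library = {
    'lig_a': {1, 4, 7, 9},
    'lig_b': {1, 4, 7, 12},
    'lig_c': {2, 3, 9, 20},
}
hits = similar_ligands(query, library)
```

In this toy data, lig_b shares three of five distinct bits with the query (coefficient 0.6) and is kept, while lig_c shares one of seven and falls below the default mintani of 0.4.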
6.6.2 Attributes Created by Similar Ligand Search Objects
The following attribute will be added to each Ligand object passing a similar ligand search:
ligand_similarity
• Dictionary object containing one item whose key is value. This is the calculated similarity
coefficient of the Ligand object with the search ligand.
• The default name of this attribute, ligand_similarity, can be over-ridden (see Section
6.6.1, page 70).
• An example showing how the similarity value can be accessed is:
# Print similarity value for second hit
print lig_set[1].ligand_similarity[’value’]
0.95
6.6.3 Example Similar Ligand Search
Please go to example_similar_ligand_search.py to view the example code.
6.7 Superimposing Chains and Similar Binding Sites
The chain superimposition class can be used to superimpose each member of a set of Chain objects
onto a reference chain. The normal mode of use will involve performing a sequence similarity search
first (see Section 6.3, page 63), to sequence-align each member of the chain set with the reference
chain. Once this is done, the chain superimposition object can be used to perform the
superimpositions by least-squares overlaying the alpha-carbon atoms of some of the matched
residues. A particular use of chain superimposition is to overlay similar binding sites.
6.7.1 Creating a Chain Superimposition Object; Initialization Parameters
An operation object for performing a chain superimposition can be created with a command of the
form:
chain_superpose = reliscript.superimpose_chain(parameters)
The first parameter must specify the Chain object on which the other chains will be superimposed.
Other, optional, parameters are:
ligand
• The atoms used for superimposition will always be restricted to alpha-carbons in matched
residues (i.e. residues that have been successfully matched in a sequence alignment of the
reference chain and the chain to be superimposed). The least-squares superposition can be
further restricted to alpha-carbon atoms in matched residues in the binding site of a ligand bound
to the reference chain. To do this, include a parameter such as ligand = ref_chain_lig
(see Section 6.7.2, page 72).
6.7.2 Example Chain Superimposition
This example assumes that a sequence similarity search has already been done (see Section 6.3, page
63). Please refer to example_chain_superimposition.py for the chain superimposition
example script.
6.7.3 Example Similar Binding Site Search
A similar binding site search involves:
• Specifying the ligand of interest and getting the chain to which it is bound (the reference chain).
• Using a sequence similarity search (see Section 6.3, page 63) to find all chains similar in
sequence to the reference chain.
• Using a chain superimposition object (see Section 6.7, page 71) to overlay the similar chains
onto the reference chain.
• Applying a distance test to find all ligands bound to the various superimposed chains that are
close to the original ligand of interest.
• Writing this information out.
Please refer to example_similar_binding_site_search.py to view the example script.
7 Global Utility Functions
In addition to the reliscript.create command, used for creating data objects (see Section 4,
page 23), the reliscript.set command for creating sets (see Section 5.2.2, page 55), and
commands such as reliscript.text_search, reliscript.sequence_search, etc.,
for constructing operation objects (see Section 6, page 61), Reliscript provides a selection of global
utility functions, as follows:
7.1 nice level
The nice level of Reliscript scripts is set to 5 by default, so that they do not run with a higher priority
than the Relibase server. The nice level of your scripts can easily be modified using the
built-in os module. For scripts that are expected to run for a long time, a nice level of 10 is
recommended. This is achieved by inserting the code below at the beginning of the script (note that
os.nice adds its argument to the current nice level of the process rather than setting an absolute value):
import os
os.nice(10)
7.2 distance and max_distance
Each of these functions takes a pair of objects as arguments, e.g.
reliscript.distance(object1, object2)
Each object can be any of the following:
• An Atom object or any other object that has a coords attribute.
• Any data object that will provide a list of Atom objects (i.e. all Reliscript data objects).
distance calculates and returns the distance (in Å) between the two closest atoms in the two
objects (one from object1, one from object2). Conversely, max_distance calculates and
returns the maximum distance between two atoms, one from each of the two objects.
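The two quantities can be sketched in plain Python as the smallest and largest of all pairwise atom-to-atom distances. The functions below work on lists of invented (x, y, z) coordinate tuples rather than on Reliscript objects, and are an illustration of the definitions above, not the library's implementation.

```python
import math


def atom_distance(p, q):
    """Euclidean distance between two (x, y, z) coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def distance(coords1, coords2):
    """Distance between the two closest atoms, one taken from each object."""
    return min(atom_distance(p, q) for p in coords1 for q in coords2)


def max_distance(coords1, coords2):
    """Largest distance between two atoms, one taken from each object."""
    return max(atom_distance(p, q) for p in coords1 for q in coords2)


# Invented coordinates standing in for the atoms of two objects.
object1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
object2 = [(4.0, 0.0, 0.0), (1.0, 4.0, 0.0)]
closest = distance(object1, object2)
farthest = max_distance(object1, object2)
```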
7.3 hitlists
A command such as:
reliscript.hitlists('a_user_name')
will return a list of dictionaries containing information on all the Relibase+ hitlists that have been
saved for the given username. If the username is omitted, the current username will be used. Each
hitlist dictionary will look like, e.g.
{'name': 'test',
 'user': 'a_user_name',
 'time': u'2008-12-11 11:39:11.58',
 'type': 'pdb',
 'size': 10}
7.4 use_workspace, set_workspace and set_username
Sets the workspace to be used for saving and reading of hitlists (see Section 3.7.3, page 18), e.g.
reliscript.use_workspace('my_name')
By default, Reliscript will use the login username of the user for the workspace identifier.
The global utility functions set_workspace and set_username do exactly the same thing as
use_workspace.
8 Extending the Functionality of Reliscript
It is obviously possible to extend the functionality of Reliscript by building a library of your own
Python and Reliscript functions, and by using the many Python modules available on the Internet (see
http://www.python.org). In addition, you can write customised operation classes for performing
searches and other calculations not provided by default. To do this, it is necessary for the customised
operation class to inherit from a base operation class. For more details, see
example_customised_operation_class.py.
8.1 Base Operation Class
The base operation class, reliscript.base_operation_class, provides much of the code
required for user-defined operation classes. The user’s operation class must inherit from
base_operation_class and provide working versions of a small number of class functions (see
Section 8.2, page 74).
8.2 Functions Required in a Customised Operation Class
Some or all of the following functions of the base operation class (see Section 8.1, page 74) will need
to be over-ridden to produce a useful, customised operation class. Default implementations for all
functions are provided in the base class, and a particular function need not be over-ridden if the
default action is acceptable. The function declarations below are given as they must appear in the
class definition and, as such, self must be the first parameter:
filter(self, object)
• Performs a test on object (e.g. for the presence of a text string). Returns 1 if the test was
successful, otherwise returns zero; default return value is 1. This function will normally be overridden to create a customised operation class that applies a useful filter.
filter_object_type(self)
• Returns the type of object that the filter function will require. Must return one of ’pdb’,
’chain’, ’ligand’ or ’solvent’.
use_filter(self)
• Returns 1 if the filter function is to be used, otherwise returns zero. By default, this function
returns 1.
manipulate(self, object)
• Performs some sort of manipulation on the object passed in; this could be, for example, the
addition of a new attribute to the object. By default, this function leaves the object unchanged.
Unlike the filter function, the type of the set being processed, e.g. ’pdb’, ’ligand’,
etc., must match the object type that the manipulate function expects.
manipulate_object_type(self)
• Returns the type of object that the manipulate function will require. Must return one of
’pdb’, ’chain’, ’ligand’ or ’solvent’.
use_manipulate(self)
• Returns 1 if the manipulate function is to be called, otherwise returns zero. By default, this
function returns 1.
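The overall shape of a customised operation class can be sketched as follows. So that the sketch runs without Relibase+, it inherits from BaseOperationStandIn, a local stand-in that only mimics the defaults documented above; real code would inherit from reliscript.base_operation_class, whose constructor requirements and object-type defaults are not described here and are assumptions in this sketch. The Entry class and the resolution_class attribute are likewise invented for the illustration.

```python
class BaseOperationStandIn(object):
    """Local stand-in mimicking the documented defaults of the base operation class."""

    def filter(self, object):
        return 1                     # documented default: every object passes

    def filter_object_type(self):
        return 'pdb'                 # assumed default, not documented

    def use_filter(self):
        return 1

    def manipulate(self, object):
        pass                         # documented default: object left unchanged

    def manipulate_object_type(self):
        return 'pdb'                 # assumed default, not documented

    def use_manipulate(self):
        return 1


class HighResolutionFilter(BaseOperationStandIn):
    """Pass objects at or below a resolution cutoff and tag objects it manipulates."""

    def __init__(self, cutoff):
        self.cutoff = cutoff

    def filter(self, object):
        # Return 1 if the test succeeds, otherwise 0.
        return 1 if object.resolution <= self.cutoff else 0

    def manipulate(self, object):
        # Add a new attribute to the object.
        object.resolution_class = 'high'


class Entry(object):
    """Hypothetical stand-in for a PDB object."""

    def __init__(self, resolution):
        self.resolution = resolution


operation = HighResolutionFilter(2.0)
good, poor = Entry(1.6), Entry(2.8)
results = [operation.filter(good), operation.filter(poor)]
if operation.use_manipulate():
    operation.manipulate(good)
```

Only filter and manipulate are over-ridden here; the remaining functions keep their default behaviour, as the text above allows.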
8.3 Example of a Customised Operation Class
The following is an example script showing how the base operation class could be extended (see
Section 8.2, page 74) to create a customised operation object that will:
• Filter a set by performing a search on all ligands in a PDB object for a given text string in the
ligand compound name.
• For each PDB object that passes this test, store the total number of residues in the protein chains
in the PDB object as an object attribute.
Please refer to example_customised_operation_class.py for the example script.
9 Example Scripts
Some simple examples are included in previous sections of this manual:
6.1.2 Example Text Search (see page 62)
6.1.3 Example Regular Expression Text Search (see page 62)
6.2.2 Example Numeric Search (see page 63)
6.3.3 Example Sequence Search (see page 64)
6.4.3 Example Consensus Motif Search (see page 66)
6.4.4 Example Consensus Motif Search Using a Regular Expression (see page 66)
6.5.3 Example SMILES Search (see page 69)
6.6.3 Example Similar Ligand Search (see page 71)
6.7.2 Example Chain Superimposition (see page 72)
6.7.3 Example Similar Binding Site Search (see page 72)
8.3 Example of a Customised Operation Class (see page 75)
In addition, more extensive and scientifically interesting examples are:
9.1 Finding and Classifying Contacts to Ligand Carboxylates (see page 76)
9.2 Analysing Ligand Contacts to Atoms in the Crystal-Field Environment (see page 76)
These scripts are available as separate .py files so that you can try them out.
9.1 Finding and Classifying Contacts to Ligand Carboxylates
Please refer to example_contacts_to_carboxylates.py to view the example script.
9.2 Analysing Ligand Contacts to Atoms in the Crystal-Field Environment
Please refer to example_packing_environment_contacts.py to view the example script.
10 Acknowledgements
Reliscript was conceived by Ingo Dramburg (Institute of Pharmaceutical Chemistry, Philipps
University of Marburg, Germany) who also contributed significantly to its design and coding.
The Java Programming Language is provided by Sun Microsystems, Inc. under the Binary Code
License Agreement.
The Python Programming Language is provided by the Python Software Foundation under the
Python Licence, Version 2.5.
The Java Python integration is provided by JPype under the Apache Licence V2.0.
11 Appendix A: Glossary
This is mainly (but not exclusively) a guide to Python terms.
attributes (see page 77)
comment (see page 77)
dictionaries (see page 78)
exception (see page 79)
flow control (see page 79)
for (see page 79)
functions (see page 80)
global functions (see page 80)
if (see page 80)
indentation (see page 80)
lists (see page 80)
modules (see page 82)
representation (see page 82)
Sybyl atom type (see page 83)
tuples (see page 83)
types (see page 84)
while (see page 84)
attributes
The attributes of an object are those items that can be retrieved by simply placing the attribute name
after the name of the object. For example, a command such as:
res = pdb_object.resolution
would result in res being a floating point number containing the resolution in Å of the PDB structure
contained in pdb_object.
comment
In Python, anything on a line after a hash mark (#) is a comment. Additionally, comments can be
attached to functions by using a triple-quoted string block immediately below the function definition.
These are useful as they can be picked up by some automatic documentation systems (such as Pydoc,
which comes with Python) and some Python shell applications (such as PyCrust:
http://sourceforge.net/projects/pycrust/), where the comment will appear as a help pop-up.
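As a minimal sketch of both comment styles, consider the following. The function angstrom_to_nm is purely illustrative and is not part of Reliscript; the code is written so that it should run under the Python 2.5 described in this guide as well as under later versions.

```python
# A hash mark starts a comment that runs to the end of the line.

def angstrom_to_nm(length):
    """Convert a length in Angstroms to nanometres.

    This triple-quoted string is the documentation string of the
    function; tools such as Pydoc read it from __doc__.
    """
    return length / 10.0  # 1 nm = 10 Angstroms

# The documentation string is stored on the function object itself.
doc_first_line = angstrom_to_nm.__doc__.splitlines()[0]
print(doc_first_line)
```

Typing help(angstrom_to_nm) at the Python prompt displays the same documentation string.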
dictionaries
Python dictionaries (or associative arrays) are container (i.e. storage) objects, i.e. a dictionary object
contains within it a collection of other objects. The objects stored in a dictionary are referenced by a
key. Python dictionaries can be initialised by specifying the key, value pairs enclosed in {} brackets.
For example:
# Dictionary: provides lookup
fruit_colours = {'apple':'red','banana':'yellow'}
# The next line returns 'yellow'
fruit_colours['banana']
A large number of standard operations can be done on dictionaries:
DICT[KEY]                        Returns item associated with KEY or raises an exception.
DICT[KEY] = ITEM                 Stores ITEM in dictionary with key KEY.
del DICT[KEY]                    Remove item referenced by KEY from dictionary.
len(DICT)                        Number of items stored with a KEY.
DICT.has_key(KEY)                Returns true if DICT has an item indexed with KEY.
DICT.keys()                      Returns a list of all dictionary keys.
DICT.values()                    Returns a list of all items stored in dictionary.
DICT.items()                     Returns a list of tuples, each a key, item pair.
DICT.clear()                     Remove all items from DICT.
DICT.copy()                      Returns a copy of DICT. This is a shallow copy: see main
                                 Python documentation for more details.
DICT.update(DICT2)               Merge DICT2 into DICT. If identical keys, DICT2 takes
                                 precedence.
DICT.get(KEY[, default])         Like DICT[KEY] but will return default if KEY not present.
DICT.setdefault(KEY[, default])  Like .get, but sets default for later requests.
DICT.popitem()                   Removes and returns an arbitrary (key, item) pair.
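The short sketch below exercises a few of the operations in the table. The dictionary contents are made up for illustration, and only operations that behave the same way in Python 2.5 and in later versions are used.

```python
# Build a dictionary and try out some of the operations listed above.
fruit_colours = {'apple': 'red', 'banana': 'yellow'}

fruit_colours['cherry'] = 'red'          # DICT[KEY] = ITEM
del fruit_colours['apple']               # del DICT[KEY]
n_items = len(fruit_colours)             # len(DICT)

# DICT.get returns the supplied default instead of raising an
# exception when the key is absent.
kiwi_colour = fruit_colours.get('kiwi', 'unknown')

# DICT.update merges another dictionary; its values take precedence.
fruit_colours.update({'banana': 'green', 'plum': 'purple'})

# Sort the keys so the output order does not depend on the Python version.
for fruit in sorted(fruit_colours.keys()):
    print("%s is %s" % (fruit, fruit_colours[fruit]))
```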
exception
Python can handle unexpected events, i.e. exceptions, which occur during the runtime of a Python
program.
Example:
a = 1
b = 0
print a/b
would raise a ZeroDivisionError and stop the program.
The script zero_division_example.py illustrates how this exception could be handled, thus
preventing the script from terminating.
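A minimal sketch of this kind of handling, using the Python try and except statements, is given below. It is an illustration only and is not necessarily identical to zero_division_example.py; the function name safe_divide is hypothetical.

```python
def safe_divide(a, b):
    """Return a / b, or None if b is zero."""
    try:
        return a / b
    except ZeroDivisionError:
        # The exception is caught here, so the script carries on.
        print("Cannot divide %s by zero" % a)
        return None

result = safe_divide(1, 0)
print("The script is still running")
```

Naming the exception type after except means that only division by zero is caught; any other error would still stop the script, which is usually what is wanted while debugging.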
flow control
Python uses the fairly standard flow-control options of:
• for (see page 79);
• while (see page 84);
• if (see page 80).
for
This command loops through the contents of a list, tuple or any other object that supports index
functionality and applies the code below it. For an example see the script
for_loop_example.py.
Note that the beginning and end of a for loop are determined by the indentation.
There are two special commands used in for loops, viz. continue, which finishes the current
iteration and starts the next, and break, which exits the for loop completely.
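The following sketch shows a for loop using both special commands. The values are invented for illustration and the code is not taken from for_loop_example.py.

```python
values = [1.8, 2.5, -1.0, 1.2, None, 2.0]
kept = []

for value in values:
    if value is None:
        # Leave the loop altogether; the final 2.0 is never examined.
        break
    if value < 0:
        # Skip this item and go straight on to the next one.
        continue
    kept.append(value)

print(kept)
```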
functions
Calling a function of an object:
object.myfunc(arguments)
executes some code associated with that object. For example, a command such as:
atom_list = pdb_object.pdb_atoms(include_pack=0)
would execute code that sets up a list of string objects, atom_list, containing the ATOM records
of the PDB entry stored in pdb_object. The argument in this example instructs the function to
exclude pack atoms, i.e. atoms generated by crystallographic symmetry.
global functions
A global function is one that is not associated with a particular object, e.g.
d = reliscript.distance(atom1, atom2)
if
Like the similar command while (see page 84), the if command evaluates a test, but the
associated block of commands is run at most once. In addition there is the elif command, which
stands for else if, and the else command. An example is the script if_example.py.
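A short sketch of if, elif and else used together is given below. The function name and the resolution thresholds are arbitrary choices made for illustration; the code is not taken from if_example.py.

```python
def classify_resolution(res):
    """Return a rough, illustrative label for a resolution in Angstroms."""
    if res < 2.0:
        label = 'high'
    elif res < 3.0:
        # Only tested when the first condition was false.
        label = 'medium'
    else:
        # Reached when none of the tests above succeeded.
        label = 'low'
    return label

print(classify_resolution(2.4))
```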
indentation
Indentation of commands is (and must be) used to indicate code that lies within loops and conditional
statements. It is good practice never to use tabs in Python scripts. Furthermore, it is recommended
that each level of indentation is four spaces. For some sample code see the script
indentation_example.py.
lists
Python lists are container (i.e. storage) objects, i.e. a list object contains within it a collection of other
objects. Lists can be initialised by specifying the required items enclosed in [] brackets, e.g.
[0,1,2,3] initialises a list of four integers. The contents of a list can be of any type, including
other lists, e.g. [1,'two',[3.0,'four']] (this being a list containing an integer, a string, and
another list). Accessing lists is done by treating them as arrays; for example, if the previous list was
called mylist, then mylist[1] would return two and mylist[2][0] would return 3.0. The
first item in a list has index number 0, not 1.
Unlike tuples (see page 83), lists can be changed, e.g.
# List: can be changed
colours = ['red','yellow']
# The next line adds 'blue' to the colours list
colours.append('blue')
A large number of standard operations can be done on lists:
ITEM in LIST              Logical operation that returns true if ITEM is in LIST
ITEM not in LIST          Logical operation that returns true if ITEM is not in LIST
for ITEM in LIST:         Loops round LIST using each ITEM in turn
LIST1 + LIST2             Lists can be added, e.g. [1,2] + [3,4] = [1,2,3,4]
NUMBER * LIST             Repetition, e.g. [1,2] * 2 = [1,2,1,2]. LIST * NUMBER has the
                          same effect.
LIST[INDEX]               Returns the INDEX item in list. Raises an exception if out of
                          range. Also, INDEX can be negative, in which case it starts
                          from the end, e.g. if L = [1,2,3], then L[-1] = 3
LIST[START:END]           Returns a slice of list, e.g. if L = [1,2,3,4,5], then
                          L[1:-1] = [2,3,4] (note: goes up to, but does not include,
                          L[-1])
len(LIST)                 Length of list (i.e. number of objects it contains), e.g.
                          len([0,1,2,3]) = 4
min(LIST)                 Returns minimum value in list; mainly useful when all numeric
                          or text items.
max(LIST)                 Returns maximum value in list.
.........................................................................................
LIST[INDEX] = ITEM        Removes the current contents of LIST[INDEX] and replaces it
                          with the object ITEM
LIST[START:END] = LIST2   Removes list items between START and END (including START but
                          not END) and replaces them with LIST2
del LIST[INDEX]           Removes item at index INDEX; resulting list will be shorter by
                          one
del LIST[START:END]       Deletes section of list (including START but not END), e.g. if
                          L = [1,2,3,4,5] and del L[1:-1] is executed, then L = [1,5]
LIST.append(ITEM)         Adds ITEM to LIST
LIST.sort([FUNCTION])     Sorts list; FUNCTION is optional and can be used to specify a
                          sort function other than, e.g., normal numerical order
LIST.reverse()            Reverses order of list, e.g. if L = [1,2,3] and L.reverse() is
                          executed, then L = [3,2,1]
LIST.index(ITEM)          Returns index of first instance of ITEM in LIST; exception if
                          not found.
LIST.count(ITEM)          Returns number of times ITEM appears in the list
LIST.insert(INDEX, ITEM)  Inserts ITEM at point INDEX, e.g. if L = [1,2,3] and we execute
                          L.insert(2,4), then L = [1,2,4,3]
LIST.remove(ITEM)         Removes first instance of ITEM in LIST; raises exception if not
                          present
LIST.pop()                Returns and removes last item in LIST
LIST.extend(LIST2)        Adds LIST2 to end of LIST
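The sketch below works through several of these list operations in sequence, with the state of the list noted in the comments. The list contents are made up for illustration.

```python
colours = ['red', 'yellow']

colours.append('blue')              # ['red', 'yellow', 'blue']
colours.insert(1, 'green')          # ['red', 'green', 'yellow', 'blue']
colours.extend(['red', 'black'])    # two more items added at the end

n_red = colours.count('red')            # 'red' now appears twice
first_yellow = colours.index('yellow')  # index of the first 'yellow'

middle = colours[1:-1]              # a slice is a new list; colours is unchanged
last = colours.pop()                # removes and returns 'black'
colours.remove('red')               # removes only the first 'red'
colours.sort()                      # sorts in place, alphabetically here

print(colours)
```

Note that append, insert, extend, remove and sort all change the list in place, whereas slicing returns a new list.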
modules
Modules are the larger-scale building blocks of Python. Reliscript is a Python module which you load
into a Python session using the command import reliscript. Python itself provides a wealth
of modules and a large number are available elsewhere for specific tasks, e.g. mathematics, statistics,
plotting, and many more (see http://www.vex.net/parnassus/).
In all cases, the name used to import a module defines the "namespace" for the components of that
module. For example, when Reliscript is loaded using import reliscript, then expressions
such as reliscript.set('pdb') or reliscript.distance(at1, at2) are used to
access Reliscript functions.
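The namespace behaviour can be tried out without Relibase+ by using a standard Python module. The sketch below uses the math module as a stand-in; the alias m is an arbitrary choice, in the same way that the tutorial aliases reliscript to rs.

```python
import math              # components are reached as math.<name>
import math as m         # the same module under a shorter alias

root = math.sqrt(16.0)
same_root = m.sqrt(16.0)

print(root)
```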
representation
The representation of an object refers to how it is represented when a command such as:
print my_object
is executed.
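In Python, a class controls its representation by defining a __repr__ method (and, optionally, __str__). The sketch below uses a hypothetical stand-in class, DemoLigand, to show the mechanism; it is not a Reliscript object, and how Reliscript objects build their own representations internally is an assumption not covered here.

```python
class DemoLigand(object):
    """Hypothetical stand-in class; not a Reliscript object."""

    def __init__(self, pdb_code, name):
        self.pdb_code = pdb_code
        self.name = name

    def __repr__(self):
        # Python calls this method to obtain the representation.
        return "DemoLigand<%s:%s>" % (self.pdb_code, self.name)

my_object = DemoLigand('1abc', 'XYZ_1')
text = repr(my_object)
print(text)
```

Because no separate __str__ method is defined, printing my_object directly gives the same text as repr(my_object).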
Sybyl atom type
Reliscript will return the Sybyl atom type of an atom, as defined in the Sybyl program of Tripos Inc.,
St Louis, USA (http://www.tripos.com/). The most important of these types are:
C.3     sp3 carbon                          N.pl3   trigonal planar nitrogen
C.2     sp2 carbon                          N.4     sp3 cationic nitrogen
C.1     sp carbon                           O.3     sp3 oxygen
C.ar    aromatic carbon                     O.2     sp2 oxygen
C.cat   carbocation (e.g. in guanidinium)   O.co2   carboxylate/phosphate oxygen
N.3     sp3 nitrogen                        S.3     sp3 sulphur
N.2     sp2 nitrogen                        S.2     sp2 sulphur
N.1     sp nitrogen                         S.o     sulphoxide sulphur
N.ar    aromatic nitrogen                   S.o2    sulphone sulphur
N.am    amide/peptide nitrogen              P.3     phosphorus, e.g. in phosphate
Sybyl atom types in Reliscript are not always set reliably, especially for atoms whose protonation
state is uncertain.
tuples
Python tuples are container (i.e. storage) objects, i.e. a tuple object contains within it a collection of
other objects. Tuples can be created by specifying the required objects enclosed in () brackets, e.g.
(1,2,3,4) creates a tuple containing four integers. Tuples are like lists (see page 80) as far as data
access goes, e.g.
# Create tuple
days_of_week = ('mon','tue','wed','thu','fri','sat','sun')
# The next line returns 'mon'
days_of_week[0]
Tuples differ from lists in that they cannot be modified. Therefore, tuples have all the same functions
as lists for data access - all those above the dotted line in the section on lists (see page 80) - but no
functions that modify the tuple contents.
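The difference from lists can be seen in the following sketch, which reads from a tuple successfully and then attempts to change it. The variable names weekend, n_days and modified are illustrative.

```python
days_of_week = ('mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun')

weekend = days_of_week[-2:]        # slicing works as for lists
n_days = len(days_of_week)

try:
    days_of_week[0] = 'monday'     # attempt to modify the tuple
    modified = True
except TypeError:
    # Tuples do not support item assignment.
    modified = False

print(weekend)
```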
types
Python supports basic types including integers, floats and strings. Strings can be delimited by
single ('), double (") or triple (""") quotes. Triple quotes are useful in that they can cover more than
one line. For example:
Nursery_rhyme = """Mary had a little lamb
Its fleece was white as snow
And everywhere that Mary went
Her lamb was sure to go"""
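A brief sketch of the basic types follows. All variable names and values are invented for illustration; the built-in type() function reports the type of any object.

```python
count = 3                 # integer
resolution = 2.1          # float
name = 'lysozyme'         # string in single quotes
title = "hen egg-white"   # string in double quotes

# A triple-quoted string may span several lines.
rhyme = """Mary had a little lamb
Its fleece was white as snow"""

n_lines = len(rhyme.split('\n'))
print(type(resolution))
```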
while
The while command repeatedly executes a block of statements until a condition is no longer
satisfied. For an example, see the script while_example.py.
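A minimal sketch of a while loop is shown below. The numbers are arbitrary and the code is not taken from while_example.py.

```python
distance = 10.0
steps = 0

# Halve the distance until it drops below the cut-off of 1.0.
while distance >= 1.0:
    distance = distance / 2.0
    steps = steps + 1

print("%d steps, final distance %.3f" % (steps, distance))
```

As with for loops, the body of the loop is defined by its indentation, and the test is re-evaluated before every pass.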
12 Appendix B: List of Commands, Attributes, Functions, Parameters and Operators
+
operator applicable to sets (see
Section 5.2.6, page 57)
-
operator applicable to sets (see
Section 5.2.6, page 57)
&
operator applicable to sets (see
Section 5.2.6, page 57)
|
operator applicable to sets (see
Section 5.2.6, page 57)
^
operator applicable to sets (see
Section 5.2.6, page 57)
a
PDB attribute (see Section 4.1.3,
page 24)
adjacent_chains
Ligand attribute (see Section 4.4.3,
page 34)
adjacent_ligands
Chain attribute (see Section 4.2.3,
page 28)
adjacent_ligands
NucleicAcid attribute (see Section
4.3.3, page 31)
adjacent_nucleic_acids
Ligand attribute (see Section 4.4.3,
page 34)
align_identity
sequence_search parameter (see
Section 6.3.1, page 63)
all_models
smiles_search or smarts_search
parameter (see Section 6.5.2, page
69)
alpha
PDB attribute (see Section 4.1.3,
page 24)
append
Set function (see Section 5.2.9,
page 60)
atoms
BindingSite attribute (see Section
4.9.3, page 49)
atoms
Bond attribute (see Section 4.8.3,
page 47)
atoms
Chain attribute (see Section 4.2.3,
page 28)
atoms
Ligand attribute (see Section 4.4.3,
page 34)
atoms
NucleicAcid attribute (see Section
4.3.3, page 31)
atoms
PackBindingSite attribute (see
Section 4.10.3, page 53)
atoms
PDB attribute (see Section 4.1.3,
page 24)
atoms
Residue attribute (see Section 4.6.3,
page 41)
atoms
Solvent attribute (see Section 4.5.3,
page 38)
attribute
text_search parameter (see Section
6.1.1, page 61)
attribute_name
consensus_search parameter (see
Section 6.4.1, page 65)
attribute_name
sequence_search parameter (see
Section 6.3.1, page 63)
attribute_name
smiles_ or smarts_search parameter
(see Section 6.5.2, page 69)
attribute_name
similar_ligand_search parameter
(see Section 6.6.1, page 70)
attributes
text_search parameter (see Section
6.1.1, page 61)
author
PDB attribute (see Section 4.1.3,
page 24)
authors
PDB attribute (see Section 4.1.3,
page 24)
b
PDB attribute (see Section 4.1.3,
page 24)
b_factor
Atom attribute (see Section 4.7.3,
page 45)
base_operation_class
Base class in Reliscript (see Section
8.1, page 74)
beta
PDB attribute (see Section 4.1.3,
page 24)
binding_site
Ligand attribute (see Section 4.4.3,
page 34)
binding_site
PackBindingSite attribute (see
Section 4.10.3, page 53)
binding_sites
PDB attribute (see Section 4.1.3,
page 24)
bonds
Atom attribute (see Section 4.7.3,
page 45)
bonds
BindingSite attribute (see Section
4.9.3, page 49)
bonds
Chain attribute (see Section 4.2.3,
page 28)
bonds
Ligand attribute (see Section 4.4.3,
page 34)
bonds
NucleicAcid attribute (see Section
4.3.3, page 31)
bonds
PackBindingSite attribute (see
Section 4.10.3, page 53)
bonds
PDB attribute (see Section 4.1.3,
page 24)
bonds
Residue attribute (see Section
4.6.3, page 41)
bonds
Solvent attribute (see Section 4.5.3,
page 38)
bond_type
Bond attribute (see Section 4.8.3,
page 47)
bound_ligand
BindingSite attribute (see Section
4.9.3, page 49)
bound_ligand
PackBindingSite attribute (see
Section 4.10.3, page 53)
c
PDB attribute (see Section 4.1.3,
page 24)
case
text_search parameter (see Section
6.1.1, page 61)
chain
Set type (see Section 5.2.1, page 55)
chain_id
Chain attribute (see Section 4.2.3,
page 28)
chain_id
NucleicAcid attribute (see Section
4.3.3, page 31)
chain_id
Residue attribute (see Section 4.6.3,
page 41)
chains
BindingSite attribute (see Section
4.9.3, page 49)
chains
PackBindingSite attribute (see
Section 4.10.3, page 53)
chains
PDB attribute (see Section 4.1.3,
page 24)
clear_transform
BindingSite function (see Section
4.9.4, page 50)
clear_transform
Chain function (see Section 4.2.4,
page 29)
clear_transform
Ligand function (see Section 4.4.4,
page 36)
clear_transform
NucleicAcid function (see Section
4.3.4, page 32)
clear_transform
PackBindingSite function (see
Section 4.10.4, page 53)
clear_transform
PDB function (see Section 4.1.4,
page 26)
clear_transform
Residue function (see Section 4.6.4,
page 42)
clear_transform
Solvent function (see Section 4.5.4,
page 39)
cofactor
Ligand attribute (see Section 4.4.3,
page 34)
component
numeric_search parameter (see
Section 6.2.1, page 62)
component
text_search parameter (see Section
6.1.1, page 61)
components
text_search parameter (see Section
6.1.1, page 61)
compound
PDB attribute (see Section 4.1.3,
page 24)
compound_name
Ligand attribute (see Section 4.4.3,
page 34)
consensus_search
Attribute created by
consensus_search (see Section
6.4.2, page 65)
consensus_search
Reliscript operation object (see
Section 6.4.1, page 65)
coords
Atom attribute (see Section 4.7.3,
page 45)
copy
Set function (see Section 5.2.3,
page 56)
covalently_bound
Ligand attribute (see Section 4.4.3,
page 34)
create
Reliscript command (see Section
4.1.1, page 23)
crystal
PDB attribute (see Section 4.1.3,
page 24)
date
PDB attribute (see Section 4.1.3,
page 24)
del
Internal function of set (see Section
5.2.7, page 58)
distance
Reliscript command (see Section
7.2, page 72)
element_no
Atom attribute (see Section 4.7.3,
page 45)
elif
Python command (see if, page 80)
else
Python command (see if, page 80)
exptl_method
PDB attribute (see Section 4.1.3,
page 24)
extend
Set function (see Section 5.2.9,
page 60)
field
text_search parameter (see Section
6.1.1, page 61)
fields
text_search parameter (see Section
6.1.1, page 61)
filter
Customised operation class function
(see Section 8.2, page 74)
filter_object_type
Customised operation class function
(see Section 8.2, page 74)
for
Python command (see for, page 79)
full_name
Ligand attribute (see Section 4.4.3,
page 34)
gamma
PDB attribute (see Section 4.1.3,
page 24)
header
PDB attribute (see Section 4.1.3,
page 24)
hitlists
Reliscript command (see Section
7.3, page 73)
homology
sequence_similarity attribute key
(see Section 6.3.2, page 64)
if
Python command (see if, page 80)
import
Python command (see Section
3.1.4, page 8)
index_no
Atom attribute (see Section 4.7.3,
page 45)
index_no
Residue attribute (see Section 4.6.3,
page 41)
len
Internal function of Chain (see
Section 4.2.5, page 30)
len
Internal function of Set (see
Section 5.2.7, page 58)
ligand
superimpose_chain parameter (see
Section 6.7.1, page 71)
ligand
Set type (see Section 5.2.1, page 55)
ligand_similarity
Attribute created by
similar_ligand_search (see Section
6.6.2, page 70)
ligands
BindingSite attribute (see Section
4.9.3, page 49)
ligands
PackBindingSite attribute (see
Section 4.10.3, page 53)
ligands
PDB attribute (see Section 4.1.3,
page 24)
locations
consensus_search attribute key (see
Section 6.4.2, page 65)
manipulate
Customised operation class function
(see Section 8.2, page 74)
manipulate_object_type
Customised operation class function
(see Section 8.2, page 74)
max
numeric_search parameter (see
Section 6.2.1, page 62)
max_distance
Reliscript command (see Section
7.2, page 72)
maxidentity
sequence_search parameter (see
Section 6.3.1, page 63)
maxtani
similar_ligand_search parameter
(see Section 6.6.1, page 70)
min
numeric_search parameter (see
Section 6.2.1, page 62)
minidentity
sequence_search parameter (see
Section 6.3.1, page 63)
mintani
similar_ligand_search parameter
(see Section 6.6.1, page 70)
mol_wt
Ligand attribute (see Section 4.4.3,
page 34)
n_atom
Chain attribute (see Section 4.2.3,
page 28)
n_atom
Ligand attribute (see Section 4.4.3,
page 34)
n_atom
NucleicAcid attribute (see Section
4.3.3, page 31)
n_atom
Residue attribute (see Section 4.6.3,
page 41)
n_atom
Solvent attribute (see Section 4.5.3,
page 38)
n_atom_ideal
Residue attribute (see Section 4.6.3,
page 41)
n_unit
Chain attribute (see Section 4.2.3,
page 28)
n_unit
Ligand attribute (see Section 4.4.3,
page 34)
n_unit
NucleicAcid attribute (see Section
4.3.3, page 31)
n_unit
Solvent attribute (see Section 4.5.3,
page 38)
name
Atom attribute (see Section 4.7.3,
page 45)
name
Residue attribute (see Section 4.6.3,
page 41)
nucleic_acid
Set type (see Section 5.2.1, page 55)
nucleic_acids
BindingSite attribute (see Section
4.9.3, page 49)
nucleic_acids
PackBindingSite attribute (see
Section 4.10.3, page 53)
nucleic_acids
PDB attribute (see Section 4.1.3,
page 24)
number
Atom attribute (see Section 4.7.3,
page 45)
numeric_search
Reliscript operation object (see
Section 6.2.1, page 62)
occupancy
Atom attribute (see Section 4.7.3,
page 45)
one_letter_code
Residue attribute (see Section 4.6.3,
page 41)
other_atom
Bond function (see Section 4.8.4,
page 47)
pack_binding_site
BindingSite attribute (see Section
4.9.3, page 49)
pack_binding_site
Ligand attribute (see Section 4.4.3,
page 34)
pack_binding_sites
PDB attribute (see Section 4.1.3,
page 24)
pdb
Set type (see Section 5.2.1, page 55)
pdb
Atom attribute (see Section 4.7.3,
page 45)
pdb
BindingSite attribute (see Section
4.9.3, page 49)
pdb
Chain attribute (see Section 4.2.3,
page 28)
pdb
Ligand attribute (see Section 4.4.3,
page 34)
pdb
NucleicAcid attribute (see Section
4.3.3, page 31)
pdb
PackBindingSite attribute (see
Section 4.10.3, page 53)
pdb
Residue attribute (see Section 4.6.3,
page 41)
pdb
Solvent attribute (see Section 4.5.3,
page 38)
pdb_atoms
BindingSite function (see Section
4.9.4, page 50)
pdb_atoms
Chain function (see Section 4.2.4,
page 29)
pdb_atoms
Ligand function (see Section 4.4.4,
page 36)
pdb_atoms
NucleicAcid function (see Section
4.3.4, page 32)
pdb_atoms
PackBindingSite function (see
Section 4.10.4, page 53)
pdb_atoms
PDB function (see Section 4.1.4,
page 26)
pdb_atoms
Residue function (see Section 4.6.4,
page 42)
pdb_atoms
Solvent function (see Section 4.5.4,
page 39)
pdb_line
Atom function (see Section 4.7.4,
page 46)
peptide
Ligand attribute (see Section 4.4.3,
page 34)
ph
PDB attribute (see Section 4.1.3,
page 24)
pure_peptide
Ligand attribute (see Section 4.4.3,
page 34)
r_value
PDB attribute (see Section 4.1.3,
page 24)
re
Python module (see Section 3.1.4,
page 8)
residue
Atom attribute (see Section 4.7.3,
page 45)
residues
Chain attribute (see Section 4.2.3,
page 28)
residues
Ligand attribute (see Section 4.4.3,
page 34)
residues
NucleicAcid attribute (see Section
4.3.3, page 31)
residues
Solvent attribute (see Section 4.5.3,
page 38)
resolution
PDB attribute (see Section 4.1.3,
page 24)
reverse
Set function (see Section 5.2.8,
page 58)
save
Set function (see Section 5.2.4,
page 56)
save_to_hitlist
Set function (see Section 5.2.4,
page 56)
save_mol2
BindingSite function (see Section
4.9.4, page 50)
save_mol2
Ligand function (see Section 4.4.4,
page 36)
save_pdb
BindingSite function (see Section
4.9.4, page 50)
save_pdb
Chain function (see Section 4.2.4,
page 29)
save_pdb
Ligand function (see Section 4.4.4,
page 36)
save_pdb
NucleicAcid function (see Section
4.3.4, page 32)
save_pdb
PackBindingSite function (see
Section 4.10.4, page 53)
save_pdb
PDB function (see Section 4.1.4,
page 26)
save_pdb
Residue function (see Section 4.6.4,
page 42)
save_pdb
Solvent function (see Section 4.5.4,
page 39)
score
sequence_similarity attribute key
(see Section 6.3.2, page 64)
sequence
Chain attribute (see Section 4.2.3,
page 28)
sequence_3d
Chain attribute (see Section 4.2.3,
page 28)
sequence_3d
NucleicAcid attribute (see Section
4.3.3, page 31)
sequence_no
Residue attribute (see Section 4.6.3,
page 41)
sequence_search
Reliscript operation object (see
Section 6.3.1, page 63)
sequence_similarity
Attribute created by
sequence_search (see Section 6.3.2,
page 64)
set
Reliscript command (see Section
5.2.2, page 55)
similar_ligand_search
Reliscript operation object (see
Section 6.6.1, page 70)
smarts_hit_data
Attribute created by smarts_search
(see Section 6.5.1, page 68)
smarts_search
Reliscript operation object (see
Section 6.5.1, page 68)
smiles_hit_data
Attribute created by smiles_search
(see Section 6.5.1, page 68)
smiles_search
Reliscript operation object (see
Section 6.5.1, page 68)
solvent
Set type (see Section 5.2.1, page 55)
solvent
BindingSite attribute (see Section
4.9.3, page 49)
solvent
PackBindingSite attribute (see
Section 4.10.3, page 53)
solvent
PDB attribute (see Section 4.1.3,
page 24)
sort
Set function (see Section 5.2.8,
page 58)
source
PDB attribute (see Section 4.1.3,
page 24)
space_group
PDB attribute (see Section 4.1.3,
page 24)
store_match
smiles_search or smarts_search
parameter (see Section 6.5.2, page
69)
sugar
Ligand attribute (see Section 4.4.3,
page 34)
superimpose_chain
Reliscript operation object (see
Section 6.7.1, page 71)
sybyl_type
Atom attribute (see Section 4.7.3,
page 45)
symbol
Atom attribute (see Section 4.7.3,
page 45)
temp
PDB attribute (see Section 4.1.3,
page 24)
text_search
Reliscript operation object (see
Section 6.1.1, page 61)
title
PDB attribute (see Section 4.1.3,
page 24)
transform
BindingSite function (see Section
4.9.4, page 50)
transform
Chain function (see Section 4.2.4,
page 29)
transform
Ligand function (see Section 4.4.4,
page 36)
transform
NucleicAcid function (see Section
4.3.4, page 32)
transform
PackBindingSite function (see
Section 4.10.4, page 53)
transform
PDB function (see Section 4.1.4,
page 26)
transform
Residue function (see Section 4.6.4,
page 42)
transform
Solvent function (see Section 4.5.4,
page 39)
type
Chain attribute (see Section 4.2.3,
page 28)
type
Ligand attribute (see Section 4.4.3,
page 34)
type
NucleicAcid attribute (see Section
4.3.3, page 31)
type
Residue attribute (see Section 4.6.3,
page 41)
type
Solvent attribute (see Section 4.5.3,
page 38)
type
text_search parameter (see Section
6.1.1, page 61)
use_filter
Customised operation class function
(see Section 8.2, page 74)
use_manipulate
Customised operation class function
(see Section 8.2, page 74)
use_workspace
Reliscript command (see Section
7.4, page 73)
value
ligand_similarity attribute key (see
Section 6.6.2, page 70)
while
Python command (see while, page
84)
x
Atom attribute (see Section 4.7.3,
page 45)
y
Atom attribute (see Section 4.7.3,
page 45)
year
PDB attribute (see Section 4.1.3,
page 24)
z
Atom attribute (see Section 4.7.3,
page 45)
z_value
PDB attribute (see Section 4.1.3,
page 24)
13 Appendix C: Reliscript Tutorials
13.1 Tutorial 1: Finding and Classifying Contacts to Ligand Carboxylate Groups
13.1.1 Objectives
• To illustrate basic use of the Python interpreter.
• To show how a script can be written to identify protein-bound ligands containing carboxylate
groups, and then extended to identify carboxylate groups that form unusual patterns of
nonbonded contacts.
13.1.2 The Example Problem
CCDC distributes and develops the protein-ligand docking program GOLD. Like most good docking
programs, GOLD usually makes reliable predictions but sometimes produces a questionable result. In
testing the program, we noticed a case where it had docked a carboxylate-containing ligand in such a
way that the oxygen atoms were in a hydrophobic environment and formed close contacts to a
backbone carbonyl oxygen (Figs. 1-3). GOLD is deliberately parameterised to allow interatomic
contacts that are slightly too short (this compensates for the fact that the protein is not allowed to
flex). We were therefore not concerned to see contact distances in the region of 2.6Å. However, the
nature of these contacts - viz. to hydrophobic carbons and the electronegative carbonyl oxygen - was a
concern. We would obviously expect a carboxylate group to form contacts to H-bond donors and/or
be solvent exposed. GOLD did, in fact, produce an alternative solution in which the carboxylate
group was solvent exposed. We wondered whether the solution shown in the figures is sufficiently
unlikely that it should be rejected automatically.
Is there a precedent in the PDB for such a carboxylate-group environment? Gohlke et al. (J. Mol.
Biol., 295, 337-356, 2000) mention that PDB entry 1ICN has a buried ligand carboxylate, but
comment that the ligand is disordered and the electron density is somewhat ambiguous. We are
unaware of any systematic survey of the environments of ligand carboxylates.
In this tutorial, we analyse the contacts made by ligand carboxylate oxygen atoms in order to assess
whether it is so unlikely for a carboxylate group to bind in a largely non-polar environment that
docking solutions containing such a feature should be filtered out.
Fig. 1. Docked ligand (carbon atoms in green), showing carboxylate group (top right of ligand)
forming apparently unfavourable contacts.
Fig. 2. As above, in space filling style.
Fig. 3. Close-up of docked carboxylate showing close contacts.
13.1.3 Is Relibase+ or Reliscript the Most Suitable Tool?
By exploiting the 3D search capabilities of Relibase+, we can easily find carboxylates forming a
particular pattern of contacts to hydrophobic atoms. For example, we could find all ligand
carboxylates forming two or more contacts to protein carbon atoms less than, say, 3.2Å. The
disadvantage is that we have to specify in advance what pattern of contacts we are looking for. It
would be better if we could analyse all carboxylate-group environments in order to determine the
percentage of environments that are non-polar and/or involve close contacts to H-bond acceptor
atoms. Reliscript is well suited to this task and offers us great flexibility in how we analyse the
results.
13.1.4 Assumed Starting Point
It is assumed that:
• Relibase+, Python and Reliscript are installed
• The Reliscript environment has been set up (see Section 3.1.1, page 6)
13.1.5 Creating a Hitlist for Debugging Purposes
Because searches can take some time when performed on the whole database, the first step will be to
set up a hitlist representing a subset of the protein-ligand complexes in Relibase+.
1. Read tutorial1_hitlist.py.
• Read through the tutorial script, tutorial1_hitlist.py
• This script selects ligands with a molecular weight in the range 300 to 500 and saves
them to a hitlist named tutorial1
2. Run the script tutorial1_hitlist.py.
• Type in the following on the command line:
% python tutorial1_hitlist.py
• Note that if you try to run this script twice, it will produce an error because the hitlist already exists
13.1.6 Part a: SMILES Searching and Basic Use of the Python Interpreter
1. Open the Python interpreter.
• Type python in the terminal. This should result in something like:
bash-3.1.17$ python
Python 2.5.2 (r252:60911, Jul 23 2008, 17:11:49)
[GCC 3.2.3 20030502 (Red Hat Linux 3.2.3-59)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
2. Import Reliscript.
• Type the following command at the Python >>> prompt to import the reliscript module
and alias it to rs:
>>> import reliscript as rs
• This should produce output looking something like:
Starting the JVM
-Xms128m
-Xmx512m
-Xmn64m
Imported psyco for python speed optimization
>>>
3. Perform a SMILES search for carboxylate groups.
• Create a set containing all ligands in the tutorial1 hitlist; the relevant command is:
>>> ligset = rs.set('ligand', 'tutorial1')
• Create a smarts_search object that will find carboxylate groups:
>>> co2_search = rs.smarts_search('C(=O)[OH]')
• The use of [OH] in the SMILES string ensures that the search will find carboxylates but not
esters. Hydrogen atoms, of course, are not usually present in PDB structures. In Relibase+ and
Reliscript, an H-count in a SMILES string is used to specify valencies that must remain unfilled.
In the present case, C(=O)[OH] will find C(=O)O- but will not find an ester group such as
C(=O)OCH3.
• Apply the search object to the ligand set by typing ligset(co2_search) at
the Python prompt. This will cause the search to be performed.
>>> ligset(co2_search)
reading data.....ok.
>>>
• ligset now contains only those ligands that have carboxylate groups. Type len(ligset)
to find out how many of these ligands there are:
>>> len(ligset)
1432
>>>
• Print the name of the first ligand containing a carboxylate (this will be ligset[0] because
Python indexing begins at zero, not one):
>>> print ligset[0]
Ligand<pdb:1sln:INH_256>
>>>
4. Exit Python and then re-run the SMILES search using the prepared script tutorial1a.py.
• Just to illustrate another Python feature, exit the current Python session by typing Ctrl-D. This
will return you to the Unix prompt.
• Now look at the contents of the first tutorial script tutorial1a.py. This file contains the
Python commands that you have just run interactively.
• Python can be opened, and this file of commands run automatically, by using the -i command-line
option, i.e. by typing python -i tutorial1a.py at the terminal:
% python -i tutorial1a.py
Starting the JVM
-Xms128m
-Xmx512m
-Xmn64m
Imported psyco for python speed optimization
Loading catalog .||.|.|.|.. complete
Elapsed time for 'com_sub2d_impl' = 1.56 seconds.
IPC_OUT::/tmp/reli14334
NUMBER_OF_HITS::26434
:IPC_END:
>>>
• You are left in the Python interpreter, so can continue as if you had just typed the commands in
manually. For example, if you type len(ligset) you will get:
>>> len(ligset)
1432
>>>
• Read through tutorial1a.py. Note that for the sake of speed we are only using the ligand
entries in the tutorial1 hitlist.
5. Use Python in background mode.
• Using Python interactively is excellent when you are writing and debugging scripts. Once a debugged script is produced, it is usually easier to run jobs in the background and re-direct output
to a file. This is what we will do in the remainder of the tutorial.
• For example, we can run tutorial1a.py as a background job, redirecting the output to a
new file called example.out, by typing the following at the Unix prompt:
python tutorial1a.py >example.out &
13.1.7 Part b: Find and Print Out Binding-Site Atoms in Contact with Carboxylate Oxygens
1. Read tutorial1b.py.
• Read through the next version of the tutorial script, tutorial1b.py, with the help of the
notes that follow.
2. Understand how the script accesses carboxylate-group atoms.
• The first few lines of tutorial1b.py simply repeat what we did in Part a, i.e. set up and
run a search for ligands containing carboxylate groups. The resulting ligand set, ligset,
contains the ligands found by the SMARTS search. Each ligand in ligset must contain at
least one carboxylate group but may contain more than one.
• We need to know which atoms in the ligand correspond to the carboxylate-group atoms.
When the SMARTS search ran, it created a new attribute called smarts_hit_data for
every ligand that satisfied the search. Suppose that lig is a ligand object in ligset.
lig.smarts_hit_data[0][0] contains the carbon atom of the first carboxylate group
found in lig. The oxygen atoms will be in lig.smarts_hit_data[0][1] and
lig.smarts_hit_data[0][2]. The order of atoms in smarts_hit_data is the
same as the order of the atoms in the SMARTS string we specified (viz. ‘C(=O)[OH]’). If
lig contains >1 carboxylate group, the atoms of the second carboxylate will be in
lig.smarts_hit_data[1][j], where j = 0, 1 and 2; and so on.
• The following code in tutorial1b.py therefore loops through all the carboxylate atoms
and prints out their names and index numbers:
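As an illustration of the indexing just described, the sketch below loops over every carboxylate group of one ligand and prints the name and index number of each matched atom. It is not the code from tutorial1b.py: the Atom and Ligand stand-ins, and the field names name and index, are assumptions made so that the example runs without Relibase+. Only the layout of smarts_hit_data (group number first, then position within the query string) follows the text.

```python
from collections import namedtuple

# Stand-in types (assumptions for illustration, not Reliscript classes).
Atom = namedtuple("Atom", ["name", "index"])
Ligand = namedtuple("Ligand", ["label", "smarts_hit_data"])

# One ligand with two carboxylate groups; each hit lists its atoms in the
# order of the query string C(=O)[OH]: carbon first, then the two oxygens.
lig = Ligand(
    label="example ligand",
    smarts_hit_data=[
        [Atom("C1", 2), Atom("O1", 3), Atom("O2", 4)],
        [Atom("C9", 10), Atom("O3", 11), Atom("O4", 12)],
    ],
)


def carboxylate_atoms(ligand):
    """Return (group number, position in hit, atom) for every matched atom."""
    found = []
    for i, hit in enumerate(ligand.smarts_hit_data):
        for j, atom in enumerate(hit):
            found.append((i, j, atom))
    return found


for i, j, atom in carboxylate_atoms(lig):
    print("group %d, position %d: %s index no. = %d" % (i, j, atom.name, atom.index))
```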
3. Understand how the script finds contacts to atoms in the binding site.
• In the above section of code, we retrieved the binding site of each ligand as bs. The final few
lines of code in tutorial1b.py loop through all the atoms in bs. Each is tested to see
whether it is within 3.2Å of the carboxylate oxygen and is not a hydrogen. If so, its details are
printed out:
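The contact test described above can be sketched as follows. This is a stand-alone illustration rather than the code from tutorial1b.py: the Atom stand-in and its fields (name, element, coords) are assumptions, and the coordinates are invented. The 3.2 Å cutoff and the exclusion of hydrogen atoms are taken from the text.

```python
import math
from collections import namedtuple

# Stand-in atom type (an assumption; Reliscript atom objects will differ).
Atom = namedtuple("Atom", ["name", "element", "coords"])

CONTACT_CUTOFF = 3.2  # Angstroms, the distance criterion quoted in the text


def distance(a, b):
    """Euclidean distance between two atoms."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a.coords, b.coords)))


def contacts(oxygen, binding_site_atoms, cutoff=CONTACT_CUTOFF):
    """Return the non-hydrogen binding-site atoms within cutoff of oxygen."""
    return [
        atom
        for atom in binding_site_atoms
        if atom.element != "H" and distance(oxygen, atom) <= cutoff
    ]


# Invented coordinates, for illustration only.
o1 = Atom("O1", "O", (0.0, 0.0, 0.0))
site = [
    Atom("NZ", "N", (2.8, 0.0, 0.0)),   # close enough: a contact
    Atom("HZ1", "H", (1.9, 0.0, 0.0)),  # close, but a hydrogen: skipped
    Atom("CB", "C", (4.5, 0.0, 0.0)),   # too far away
]

for atom in contacts(o1, site):
    print("%s %.2f" % (atom.name, distance(o1, atom)))
```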
4. Run tutorial1b.py.
• At the Unix prompt, run tutorial1b.py as a background job, redirecting the output to
part1b.out. This should look something like:
% python tutorial1b.py >part1b.out &
[1] 251339
• The job should take a few minutes to run. When it is done, type more part1b.out to see
the first few lines that the script has produced. It should look something like:
Starting the JVM
-Xms128m
-Xmx512m
-Xmn64m
Importing psyco for python speed optimization
Fast interchange file = /local/shields/relibase/python/reliscript/
reliscript_fast_lookup_54998.py
.... complete
Catalog successfully loaded from fast lookup (54998 entries)
IPC_OUT::/tmp/reli37035
ligand: Ligand<pdb:1sln:INH_256>
contacts to atom: O5 index no. = 4
CD2 HIS
NE2 HIS
NE2 HIS
NE2 HIS
ZN ZN
contacts to atom: O4 index no. = 3
CE1 HIS
ZN ZN
O HOH
ligand: Ligand<pdb:2vmy:FFO_505-A>
contacts to atom: O1 index no. = 30
None
contacts to atom: O2 index no. = 31
NZ LYS
ligand: Ligand<pdb:2vmy:FFO_505-A>
contacts to atom: OE2 index no. = 28
None
contacts to atom: OE1 index no. = 27
OG SER
ligand: Ligand<pdb:2vmy:FFO_505-B>
contacts to atom: O1 index no. = 30
None
contacts to atom: O2 index no. = 31
OH TYR
• The output lists all the binding-site atoms in contact with the oxygen atoms of every ligand
carboxylate group in the test database. In principle, this is what we need to answer the
scientific problem at hand. Browsing through the output shows that most of the contact atoms
are H-bond donors such as water O, lysine NZ, arginine NE, NH1 and NH2, etc., as we would
expect. However, it is clear that manual analysis of the output would be very tedious, so we
need to enhance the script.
13.1.8 Part c: Classify Atoms that Form Contacts to Carboxylate Oxygens
1. Read tutorial1c.py.
• Read through the next version of the tutorial script, tutorial1c.py, with the help of the
notes that follow.
2. Understand the strategy of tutorial1c.py.
• The idea behind the enhanced script is to determine the nature of every binding-site atom in
contact with a carboxylate oxygen. Each contact atom is classified as a hydrogen-bond donor,
a metal ion, a hydrophobic atom, or a hydrogen-bond acceptor that is not also a hydrogen-bond
donor. If an atom cannot be assigned to one of these categories, it is classified as
unknown. It is then possible to count and print out the number of good and bad contacts each
pair of carboxylate oxygen atoms makes. Contacts to H-bond donors and metal ions are
likely to be energetically favourable, so are classified as good. Contacts to hydrophobes or
H-bond acceptors which are not also H-bond donors are classified as bad.
3. Understand how the script classifies contact atoms.
• The script begins with four functions for classifying contact atoms. All four functions take as
their input argument the contact-atom atom object. They return true or false, depending
on whether or not the atom is of a particular type.
• The first function in the code, is_hydrophobe, determines whether the atom is
hydrophobic by testing if its element symbol is C or S:
• The second function, is_metal, works exactly the same way, i.e. tests to see if the element
symbol of the input atom corresponds to a metal.
• The third function, is_donor, tests the atom name and the name of the parent residue to see
if the atom is a recognised protein H-bond donor, e.g. NZ of lysine. Water donors are also
detected but donors in other non-peptidic entities (e.g. cofactors) will not be recognised.
• The final function, is_acceptor, again uses atom and residue name information to
determine whether the atom is an H-bond acceptor that is not also an H-bond donor.
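The four classification functions might look roughly like the sketch below. It is not the code from tutorial1c.py: the Atom stand-in, the classify helper and, in particular, the short METALS, DONORS and ACCEPTORS tables are assumptions chosen for illustration, and the real script will use fuller lists. Only the function names and the C-or-S test for hydrophobes come from the text.

```python
from collections import namedtuple

# Stand-in atom type (an assumption; not the Reliscript atom class).
Atom = namedtuple("Atom", ["name", "element", "residue"])

# Deliberately short, illustrative tables (assumptions, not the tutorial's).
METALS = {"ZN", "MG", "CA", "FE", "MN"}
DONORS = {("NZ", "LYS"), ("NE", "ARG"), ("NH1", "ARG"), ("NH2", "ARG"),
          ("OG", "SER"), ("OH", "TYR"), ("O", "HOH")}
ACCEPTORS = {("OD1", "ASP"), ("OD2", "ASP"), ("OE1", "GLU"), ("OE2", "GLU")}


def is_hydrophobe(atom):
    """True if the element symbol is C or S, as described in the text."""
    return atom.element.upper() in ("C", "S")


def is_metal(atom):
    """True if the element symbol is in the (illustrative) metal table."""
    return atom.element.upper() in METALS


def is_donor(atom):
    """True if the atom name and residue name identify a known donor."""
    return (atom.name, atom.residue) in DONORS


def is_acceptor(atom):
    """True for acceptors that are not also donors."""
    return (atom.name, atom.residue) in ACCEPTORS and not is_donor(atom)


def classify(atom):
    """Return 'donor', 'metal', 'hydrophobe', 'acceptor' or 'unknown'."""
    if is_donor(atom):
        return "donor"
    if is_metal(atom):
        return "metal"
    if is_hydrophobe(atom):
        return "hydrophobe"
    if is_acceptor(atom):
        return "acceptor"
    return "unknown"


print(classify(Atom("NZ", "N", "LYS")))
```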
4. Understand how the script counts good and bad contacts.
• In the main program, the classification functions are called for each atom found to be in
contact with a carboxylate oxygen. If a contact atom is found to be an H-bond donor or metal
ion, the count n_good is incremented. If it is found to be a hydrophobe or an H-bond
acceptor, n_bad is incremented. The counts of good and bad contacts (and unrecognised
contacts) are printed out for each carboxylate group, see the code extract from
tutorial1c.py.
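The counting step can be illustrated in isolation. In the sketch below the contact atoms are assumed to have been classified already, and each is represented simply by a category string; that representation, and the function name count_contacts, are assumptions for illustration rather than the approach taken in tutorial1c.py.

```python
def count_contacts(categories):
    """Tally contact categories into (n_good, n_bad, n_unknown)."""
    n_good = n_bad = n_unknown = 0
    for category in categories:
        if category in ("donor", "metal"):
            n_good += 1      # likely to be energetically favourable
        elif category in ("hydrophobe", "acceptor"):
            n_bad += 1       # likely to be unfavourable
        else:
            n_unknown += 1   # could not be classified
    return n_good, n_bad, n_unknown


# Categories of the atoms in contact with one carboxylate group (invented).
example = ["donor", "donor", "metal", "hydrophobe", "unknown"]
n_good, n_bad, n_unknown = count_contacts(example)
print("good = %d bad = %d unknown = %d" % (n_good, n_bad, n_unknown))
```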
5. Run tutorial1c.py.
• At the Unix prompt, run tutorial1c.py as a background job, redirecting the output to
part1c.out. This should look something like:
% python tutorial1c.py >part1c.out &
[1] 251305
• The job should take a few minutes to run. When it is done, more the first few lines of
part1c.out to see what the script has produced. It should look something like:
good = 2 bad = 2 unknown = 4
good = 1 bad = 0 unknown = 0
good = 3 bad = 0 unknown = 0
good = 1 bad = 1 unknown = 0
good = 1 bad = 0 unknown = 0
good = 1 bad = 0 unknown = 1
good = 1 bad = 0 unknown = 0
• The output is an improvement on the previous version of the script since it shows at a glance
how many good and bad contacts are formed by each carboxylate group. However, it would
still be tedious to analyse it manually so further code is developed in the next part of the
tutorial.
13.1.9 Part d: Find Frequencies of Occurrence of Carboxylate-Group Environments
1. Read tutorial1d.py.
• Read through the next version of the tutorial script, tutorial1d.py, with the help of the
notes that follow.
2. Understand the strategy of tutorial1d.py.
• This version of the script is the same as the previous version except that it keeps track of how
many different combinations of n_good, n_bad and n_unknown occur. For example,
suppose that there were only five carboxylate groups in the set and they had the following
contact counts:
good = 3 bad = 0 unknown = 0
good = 5 bad = 1 unknown = 0
good = 2 bad = 2 unknown = 0
good = 2 bad = 2 unknown = 0
good = 5 bad = 1 unknown = 0
• In this simple example, there are two occurrences of the combination n_good = 5,
n_bad = 1, n_unknown = 0; two occurrences of n_good = 2, n_bad = 2,
n_unknown = 0; and one of n_good = 3, n_bad = 0, n_unknown = 0.
• The various combinations are sorted so that those containing the most good contacts occur
first and are then printed out.
3. Understand how the script uses a dictionary to store the unique combinations of contact-atom
counts.
• The script uses a Python dictionary to do the book-keeping. This is initialised with the
statement:
# Initialise dictionary that will be used to store
# the results
count_combo = {}
• What is called a dictionary in Python is called an associative array in some other languages. It
is an array of items, each of which is associated with a key, and which can be accessed via that
key.
• Every time the values of n_good, n_bad and n_unknown are evaluated for a carboxylate
group in tutorial1d.py, they are used to generate a key. The line of code is:
key = 10000*n_good + 100*n_bad + n_unknown
• The dictionary count_combo is checked to see whether it already contains that key. If so,
this particular combination of n_good, n_bad, n_unknown has already been found in a
previous carboxylate group and all we need do is increment its occurrence-count by one. If
not, this is the first time this particular combination has been seen, so a new item is added to
the dictionary, initialised with an occurrence-count of 1, see the code extract from
tutorial1d.py.
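The book-keeping can be sketched as below, using the key expression quoted above and the five-group example from earlier in this part. The function names make_key and tally are assumptions for illustration; tutorial1d.py performs the same check inline as each carboxylate group is processed. Note that packing three counts into one integer in this way is unambiguous only while n_bad and n_unknown stay below 100.

```python
def make_key(n_good, n_bad, n_unknown):
    """Pack three counts into one integer, as in the key expression above."""
    return 10000 * n_good + 100 * n_bad + n_unknown


def tally(groups):
    """Count how often each (n_good, n_bad, n_unknown) combination occurs."""
    count_combo = {}
    for n_good, n_bad, n_unknown in groups:
        key = make_key(n_good, n_bad, n_unknown)
        if key in count_combo:
            count_combo[key] += 1   # combination seen before
        else:
            count_combo[key] = 1    # first occurrence of this combination
    return count_combo


# The five-group example: (n_good, n_bad, n_unknown) for each group.
groups = [(3, 0, 0), (5, 1, 0), (2, 2, 0), (2, 2, 0), (5, 1, 0)]
count_combo = tally(groups)
print(count_combo)
```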
4. Understand how the script sorts and prints out the results.
• All that remains to be done at the end is to convert the dictionary to a list, sort the list and then
reverse the order. This will place all the unique combinations of contact-atom counts in
descending order of their key values, which will effectively mean that they are sorted first on
n_good, then on n_bad and then on n_unknown.
results = count_combo.items()
results.sort()
results.reverse()
• Then the results are printed out, see the code extract from tutorial1d.py:
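A sketch of the sorting and printing step is given below. It is written so that it also runs on current Python versions, where items() no longer returns a list, by using sorted(..., reverse=True) in place of the separate sort and reverse calls. The helper names unpack_key and report are assumptions, as is the choice to express each percentage relative to the total number of carboxylate groups counted.

```python
def unpack_key(key):
    """Recover (n_good, n_bad, n_unknown) from a packed key."""
    n_good, rest = divmod(key, 10000)
    n_bad, n_unknown = divmod(rest, 100)
    return n_good, n_bad, n_unknown


def report(count_combo):
    """Return report lines, highest key (most good contacts) first."""
    total = sum(count_combo.values())
    lines = []
    for key, occurrences in sorted(count_combo.items(), reverse=True):
        n_good, n_bad, n_unknown = unpack_key(key)
        percentage = 100.0 * occurrences / total
        lines.append("good = %d bad = %d unknown = %d" % (n_good, n_bad, n_unknown))
        lines.append("occurrences = %d percentage = %s" % (occurrences, percentage))
    return lines


# Dictionary for the five-group example used earlier in this part.
count_combo = {30000: 1, 50100: 2, 20200: 2}
for line in report(count_combo):
    print(line)
```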
5. Run tutorial1d.py.
• At the Unix prompt, run tutorial1d.py as a background job, redirecting the output to
part1d.out. This should look something like:
% python tutorial1d.py >part1d.out &
[1] 251435
• The job should take a few minutes to run. Once it is done, cat part1d.out to see what
the script has produced. The first few lines of the file should look something like:
good = 9 bad = 2 unknown = 0
occurrences = 1 percentage = 0.0443852640923
good = 8 bad = 4 unknown = 0
occurrences = 1 percentage = 0.0443852640923
good = 8 bad = 1 unknown = 0
occurrences = 1 percentage = 0.0443852640923
good = 7 bad = 1 unknown = 0
occurrences = 2 percentage = 0.0887705281846
good = 7 bad = 0 unknown = 0
occurrences = 7 percentage = 0.310696848646
good = 6 bad = 3 unknown = 0
occurrences = 4 percentage = 0.177541056369
• This is the first version of the script that provides a direct answer to the scientific problem
under investigation. In particular, we see a few contact counts that look distinctly
unfavourable (e.g. 1 good, 3 bad), although they occur with low frequency. An obvious need
now is to get details of those carboxylate groups that are in unfavourable environments, so
that we can inspect them manually in Relibase+. It would also be nice to get an overall
percentage of carboxylate groups that are in unfavourable environments. The next and final
version of the script addresses these requirements.
13.1.10 Part e: Find and Print Details of Carboxylate Groups in Unusual Environments
1. Read tutorial1e.py.
• Read through the final version of the tutorial script, tutorial1e.py, with the help of the
notes that follow.
2. Understand the strategy of tutorial1e.py.
• This final version of the script does everything that the previous version did, but in addition
keeps track of how many carboxylate groups occur in unfavourable environments.
• We have to define what unfavourable means. One complication is that any carboxylate group
forming fewer than 4 contacts in total is probably at least partly exposed to bulk solvent. If the
total number of contacts is n, we crudely allow for this by assuming that 4-n contacts (=
n_assumed) are to bulk water (and therefore inherently favourable). We then define as
unfavourable any carboxylate group environment for which the following two conditions are
true: n_good < n_bad (i.e. the group is observed to form more contacts to hydrophobes
and H-bond acceptors than to H-bond donors and metal ions); and n_good + n_assumed
< 4 (i.e. there are fewer than four favourable contacts, either explicitly observed or assumed
contacts to bulk water).
• In addition, the script prints details of the fifty carboxylate groups occurring in the worst
environments. This is done by calculating and sorting on the quantity n_bad - n_good -
n_assumed (roughly, number of unfavourable contacts minus number of favourable
contacts).
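The definitions above translate into a few lines of Python. The sketch below is an illustration, not the code from tutorial1e.py: the function names are hypothetical, the total n is assumed to include unclassified (unknown) contacts, and n_assumed is assumed never to go below zero when a group makes more than four contacts. The badness index is taken to be n_bad - n_good - n_assumed, following the description in brackets above.

```python
def assumed_water_contacts(n_good, n_bad, n_unknown):
    """Contacts assumed to be to bulk water: 4 - n, but never negative."""
    n = n_good + n_bad + n_unknown   # assumption: unknown contacts count too
    return max(0, 4 - n)


def is_unfavourable(n_good, n_bad, n_unknown):
    """Apply the two conditions given in the text."""
    n_assumed = assumed_water_contacts(n_good, n_bad, n_unknown)
    return n_good < n_bad and n_good + n_assumed < 4


def how_bad(n_good, n_bad, n_unknown):
    """Crude index: unfavourable contacts minus favourable contacts."""
    n_assumed = assumed_water_contacts(n_good, n_bad, n_unknown)
    return n_bad - n_good - n_assumed


for counts in [(1, 3, 0), (0, 3, 0), (7, 1, 0)]:
    print("%s unfavourable = %s how_bad = %d"
          % (counts, is_unfavourable(*counts), how_bad(*counts)))
```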
3. Understand how the script identifies and counts carboxylate groups in unexpectedly
unfavourable environments.
• The count of groups in unfavourable environments is initialised:
n_unexpected = 0
• When each carboxylate group in an unfavourable environment is identified, the count is
incremented, see the code extract from tutorial1e.py.
4. Understand how the script identifies the fifty carboxylate groups in the worst environments:
• A list is initialised which will end up storing the details of the worst fifty carboxylates:
# Initialise list that will contain details of the
# carboxylates in the most unfavourable environments
worst = []
• As each group is processed, a parameter called how_bad is calculated. This is a crude index
of how unfavourable the group’s environment is. If the index is one of the largest fifty values
so far encountered, details of the group are put into worst, see the code extract from
tutorial1e.py.
• Finally, details of the groups are printed out at the end, see the code extract from tutorial1e.py.
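One way of keeping only the worst fifty groups while the data are being processed is sketched below. It uses the standard heapq module, which is a choice made for this illustration; tutorial1e.py may well maintain its worst list differently. The function names and the invented scores are likewise assumptions, and the limit is reduced to 3 in the example so that the effect is easy to see.

```python
import heapq
import itertools

_counter = itertools.count()  # tie-breaker, so that details are never compared


def update_worst(worst, how_bad, details, limit=50):
    """Keep only the `limit` entries with the largest how_bad values.

    `worst` is a min-heap of (how_bad, sequence number, details) tuples, so
    the least bad entry retained so far is always worst[0].
    """
    entry = (how_bad, next(_counter), details)
    if len(worst) < limit:
        heapq.heappush(worst, entry)
    elif entry > worst[0]:
        heapq.heapreplace(worst, entry)


def sorted_worst(worst):
    """Return (how_bad, details) pairs, worst environment first."""
    return [(score, details) for score, _, details in sorted(worst, reverse=True)]


worst = []
scores = [2, -1, 3, 0, 2, 5, -4]   # invented how_bad values
for number, score in enumerate(scores):
    update_worst(worst, score, "group %d" % number, limit=3)

for score, details in sorted_worst(worst):
    print("%s how_bad = %d" % (details, score))
```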
5. Run tutorial1e.py.
• At the Unix prompt, run tutorial1e.py as a background job, redirecting the output to
part1e.out. The job should take a few minutes to run. Once it is done, cat
part1e.out to see what the script has produced. The file should contain a line
something like:
Number of groups in unexpected environments = 44 percentage = 6.11961057024
• It should also contain details of the groups in the most unfavourable environments, e.g.
ligand = Ligand<pdb:3eo7:ACT_708-A>
carboxylate oxygens = OXT 2 and O 1
good = 1 bad = 3 unknown = 0
ligand = Ligand<pdb:1tvw:CB3_318>
carboxylate oxygens = OE1 27 and OE2 28
good = 0 bad = 3 unknown = 0
ligand = Ligand<pdb:3dl6:DHF_613-C>
carboxylate oxygens = O1 30 and O2 31
good = 1 bad = 3 unknown = 2
ligand = Ligand<pdb:2nph:AETF_1-S>
carboxylate oxygens = O 24 and OXT 32
good = 2 bad = 4 unknown = 0
• Only a small proportion of carboxylate groups (about 6% in the sample output above) occur in
unfavourable environments. We can look at some of these groups in Relibase+. One example, 1GHB,
shows a remarkable interaction in which a ligand carboxylate appears to point directly at the
face of a tyrosine ring:
Fig. 4. Unusual carboxylate-group environment in 1GHB.
Fig. 5. As above, interaction shown in space-filling style.
6. Generate results from the full database.
• So far we have used the small test database. This is ideal when developing and testing
scripts since it is large enough to be a meaningful test set but small enough that jobs typically
run in a few minutes. Now, however, we may wish to generate final results from the full
database, reli. This will take much longer - typically, several hours.
• The tutorial script can be run on the full database by removing the hitlist from the script, in
other words change the line:
ligset = rs.set('ligand', 'tutorial1')
to:
ligset = rs.set('ligand')
13.1.11 Scientific Conclusions
The questionable GOLD docking that prompted this study clearly falls well within the definition used
here of an unfavourable carboxylate-group environment. Only a small percentage of carboxylate
groups in the PDB are observed to occur in such environments (and some of these are probably due to
experimental errors in measuring or fitting electron density). We therefore conclude that the GOLD
solution is sufficiently unlikely that it, and others like it, could reasonably be rejected automatically
as false predictions.
13.1.12 Ways of Improving the Tutorial Script
The tutorial script could be improved in many ways. For example, we could:
• Restrict the study to structures with a resolution better than 2.5Å.
• Eliminate disordered ligands by testing atom site occupation factors.
• Impose a minimum size (i.e. number of atoms) on the ligands, in case very small ligands are
atypical.
• Eliminate common cofactors in case they bias the results.
• Replace the fixed distance criterion used to define a short contact (3.2Å in the script) by a
criterion that varies according to the van der Waals radii of the contact atoms.
• Check for contacts to atoms from neighbouring chains in the crystal packing, using
PackBindingSite objects.
• Extend the functions is_donor and is_acceptor so that they reliably identify H-bond
donor and acceptor atoms in cofactors, etc.
• Create a Relibase+ hitlist of all the ligands containing carboxylate groups in unfavourable
environments, so that they may be inspected more easily in Relibase+.
• Perform tests on the directions of the short contacts to carboxylate oxygens, e.g. to identify
contacts to H-bond donor atoms that are not, in fact, hydrogen bonds because of poor
directionality.
All of these enhancements could be made using existing Reliscript functionality.
13.2 Tutorial 2: Creating a Binding Site Quality Checker
13.2.1 Objectives
To create a script that can be run from the command line that takes a PDB code as an argument and
checks the binding site for:
• Clashes between protein side-chains
• Clashes between the protein and the ligand
• Missing atoms
• Influences of symmetry related protein residues
• Atoms with high B-factors
• Atoms with low occupancy
13.2.2 Steps Required
• Create a module, my_rs_tools, containing functions for testing the quality of a binding site
• Create main script using functions defined in the module my_rs_tools
• Use python’s inbuilt optparse module for reading arguments and options from the command
line
• Use python’s raw_input function to allow the user to interact with the script
13.2.3 The Example
When performing docking experiments it is important that the binding site is of high quality.
If, for example, there are atoms missing in the binding site, docking programs will not know about
them and the results obtained will be flawed. Because this is a well-recognized problem in docking,
several studies have been aimed at creating high-quality data sets for validating docking programs and
scoring functions (Nissink et al., Proteins, 49, 457-471, 2002; Hartshorn et al., J. Med. Chem., 50,
726-741, 2007; Verdonk et al., J. Chem. Inf. Model., in press 2008). These test sets have all been
tested for involvement of symmetry related protein side-chains in ligand binding, bad clashes
between the protein side-chains and the ligand, unlikely ligand conformations and inconsistencies of
the placement of the ligand in the electron densities.
Tests for bad clashes and involvement of symmetry related protein atoms in the ligand binding are
easily implemented in reliscript. Unlikely ligand conformations could easily be tested using the CSD
software Mogul, but will not be further treated here. Testing for inconsistencies of the placement of
the ligand in the electron densities is not currently possible with reliscript. However, using reliscript it
is easy to implement some other simple tests which give indications of the quality of the binding site,
such as highlighting atoms with unusually high B-factors and/or unusually low occupancies. We can
also further extend the notion of bad clashes from clashes between the protein and the ligand to
clashes between different protein side-chains.
13.2.4 Creating a Python Module
This task is a lot less daunting than it sounds. As a simple illustration copy the files my_module.py
and my_script.py onto your computer, making sure that they are in the same directory. Upon
inspection of the my_script.py file you will notice the line:
import my_module
This means that any classes and functions defined in my_module.py can be used in my_script.py
using the prefix my_module:
my_module.test()
The code in my_rstools1.py outlines the functions required for this tutorial (b_factor,
occupancy, missing_atoms, symmetry, clash) and illustrates several points about writing
code in python:
• Documentation of the module and the functions is done via docstrings. The module docstring is
the text within the triple quotes at the top of the module and the function docstrings are the text
within the triple quotations at the beginning of each function. These help provide documentation
for your program. Try running the command:
pydoc my_rstools
in the directory where you downloaded the my_rstools.py module.
• At the end of the module there is a script testing the functionality of the module. If the module is
run as a stand alone program the conditional statement:
if __name__ == '__main__':
evaluates as true and the code below is executed. Try running the command:
python my_rstools1.py
in the directory where you downloaded the my_rstools1.py module. It should give the following
output:
None
None
None
None
None
None
None
• Note that although none of the functions do anything useful, this is still a functioning script. The
next step will be to add the programming logic to these functions.
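A skeleton module of the kind described might look like the sketch below. It is not a copy of my_rstools1.py: the function names are those listed in the text, but the argument lists and the docstring wording are assumptions. It shows a module docstring, function docstrings, stub functions that return None, and a self-test guarded by the __name__ check.

```python
"""Stub functions for checking binding-site quality (illustrative skeleton)."""


def b_factor(atom, cutoff):
    """Report whether the atom's B-factor exceeds cutoff (not yet implemented)."""
    return None


def occupancy(atom, cutoff):
    """Report whether the atom's occupancy is below cutoff (not yet implemented)."""
    return None


def missing_atoms(residue):
    """Report atoms missing from the residue (not yet implemented)."""
    return None


def symmetry(binding_site):
    """Report symmetry-related atoms in the binding site (not yet implemented)."""
    return None


def clash(atom1, atom2):
    """Report whether two atoms are too close together (not yet implemented)."""
    return None


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    print(b_factor(None, 40.0))
    print(occupancy(None, 1.0))
    print(missing_atoms(None))
    print(symmetry(None))
    print(clash(None, None))
```

Saved to a file, a module written in this style can be documented with pydoc and imported by another script without the self-test being executed.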
The code in my_rstools2.py contains the programming logic for the functions: b_factor,
occupancy, missing_atoms, symmetry. These represent the tests that are easily implemented
using the functionality inherent in reliscript. Note that strings can be easily formatted using a
convention similar to that of C's printf function. For example, the statement:
s = '%s is equal to %.2f' % ('x', 1.23456)
will result in the variable s holding the string 'x is equal to 1.23'. Try running the
command:
python my_rstools2.py
in the directory where you downloaded the my_rstools2.py module. It should give the following
output:
False
Atom(N)<pdb:1mup:CHN-A:'5':1> has large b_factor 60.00 (> 40.00)
False
Atom(N)<pdb:1mup:CHN-A:'5':1> has occupancy 1.00 (< 2.00)
Residue<pdb:1mup:CHN-A:'5'> missing 4 atoms
There are 9 symmetry packed atoms in binding site
None
Finally, the code in my_rstools3.py implements the clash function. In order to check for bad
clashes we use the data from table II in Nissink et al., which reports the minimum distances for
selected atom-atom contacts. The data in the table is stored in a dictionary containing dictionaries
(_MINIMUM_DISTANCES), so that minimum distances can be queried using the following syntax:
cutoff = _MINIMUM_DISTANCES[atom1_atom_type][atom2_atom_type]
The problem is that the atom types reported in the paper are in an E(n) format, where E is the element
and n is the number of atoms bonded to E, whereas Relibase+ uses the Sybyl atom types. A dictionary
converting mol2 Sybyl atom types to E(n) notation (_mo2_to_En_notation) is therefore implemented.
Finally, a new function called bad_clash is implemented to automate the conversion of Sybyl atom
types to E(n) notation, look up the minimum distance and check whether the two atoms clash or
not. Have a look at the final version of the module my_rstools.py and try running it using the
command:
python my_rstools3.py
in the directory where you downloaded the my_rstools3.py module. It should give the following
output:
False
Atom(N)<pdb:1mup:CHN-A:'5':1> has large b_factor 60.00 (> 40.00)
False
Atom(N)<pdb:1mup:CHN-A:'5':1> has occupancy 1.00 (< 2.00)
Residue<pdb:1mup:CHN-A:'5'> missing 4 atoms
There are 9 symmetry packed atoms in binding site
False
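The nested-dictionary lookup and the atom-type conversion can be sketched as follows. Everything numeric here is a placeholder: the distances are NOT the values from table II of Nissink et al., and the small Sybyl-to-E(n) table (called _SYBYL_TO_EN here, a hypothetical name) covers only three atom types chosen for illustration. The signature of bad_clash is also an assumption; the version in my_rstools3.py works on atom objects rather than on type strings and a distance.

```python
# Placeholder values only: NOT the distances from table II of Nissink et al.
_MINIMUM_DISTANCES = {
    "C(4)": {"C(4)": 3.4, "N(3)": 3.4, "O(1)": 3.2},
    "N(3)": {"C(4)": 3.4, "N(3)": 3.2, "O(1)": 2.6},
    "O(1)": {"C(4)": 3.2, "N(3)": 2.6, "O(1)": 2.8},
}

# Illustrative conversion from Sybyl atom types to E(n) notation.
_SYBYL_TO_EN = {"C.3": "C(4)", "N.am": "N(3)", "O.2": "O(1)"}


def bad_clash(sybyl_type1, sybyl_type2, distance):
    """Return True if the atoms are closer than the tabulated minimum.

    Returns None when either atom type, or the pair, is not in the tables,
    so that the caller can tell "no clash" apart from "could not check".
    """
    try:
        en1 = _SYBYL_TO_EN[sybyl_type1]
        en2 = _SYBYL_TO_EN[sybyl_type2]
        cutoff = _MINIMUM_DISTANCES[en1][en2]
    except KeyError:
        return None
    return distance < cutoff


print(bad_clash("C.3", "N.am", 3.07))
print(bad_clash("C.3", "N.am", 3.60))
print(bad_clash("C.3", "S.3", 3.00))
```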
13.2.5 Creating the Main Script
We can now create the main script binding_site_quality1.py. The script has the following basic
outline:
1. The module my_rstools3 is imported
2. Some variables (PDB code, binding site index, various cutoffs) are set
3. The quality checks from my_rstools3 are called
When reading through the script notice the use of the try/except idiom for catching invalid PDB codes
and binding site indexes.
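The try/except idiom can be illustrated without Relibase+ as below. The dictionary _FAKE_DATABASE stands in for the real database lookup and is purely an assumption, as are the function name and the error messages; the actual exceptions raised by Reliscript for an invalid PDB code may differ from the KeyError used here.

```python
# Stand-in for a database lookup (an assumption; not a Reliscript call).
_FAKE_DATABASE = {"1mup": ["CD_201", "CD_202", "CD_203", "CD_204", "TZL_167"]}


def get_binding_site(pdb_code, index):
    """Return the name of a binding site, or None after reporting the problem."""
    try:
        sites = _FAKE_DATABASE[pdb_code.lower()]
    except KeyError:
        print("Invalid PDB code: %s" % pdb_code)
        return None
    try:
        return sites[index]
    except IndexError:
        print("Invalid binding site index: %d" % index)
        return None


print(get_binding_site("1mup", 4))
print(get_binding_site("xxxx", 0))
print(get_binding_site("1mup", 9))
```

Catching the specific exception for each failure, rather than using a bare except, lets the script give the user a message that says which input was wrong.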
Try running the script using the command:
python binding_site_quality1.py
in the directory where you downloaded both binding_site_quality1.py and my_rstools3.py. It should
give you the following output:
---------------------------------------------------------------------------
PDB code    : 1mup
Binding site: BindingSite<1mup:CD_201>
Ligand      : Ligand<pdb:1mup:CD_201>
Resolution  : 2.40
R-value     : 0.191
Checking b-factors...
Atom(C)<pdb:1mup:CHN--:'119':912> has large b_factor 41.60 (> 40.00)
Atom(O)<pdb:1mup:CHN--:'119':913> has large b_factor 47.07 (> 40.00)
Atom(N)<pdb:1mup:CHN--:'119':914> has large b_factor 60.00 (> 40.00)
Atom(C)<pdb:1mup:CHN--:'141':1091> has large b_factor 48.16 (> 40.00)
Atom(O)<pdb:1mup:CHN--:'144':1110> has large b_factor 58.05 (> 40.00)
Atom(C)<pdb:1mup:CHN--:'144':1113> has large b_factor 59.23 (> 40.00)
Atom(O)<pdb:1mup:CHN--:'144':1114> has large b_factor 60.00 (> 40.00)
Atom(O)<pdb:1mup:CHN--:'144':1115> has large b_factor 60.00 (> 40.00)
Atom(O)<pdb:1mup:CHN--:'145':1119> has large b_factor 40.14 (> 40.00)
Atom(O)<pdb:1mup:CHN--:'146':1129> has large b_factor 50.15 (> 40.00)
Checking occupancy... ok
Checking for missing atoms...
Residue<pdb:1mup:CHN--:'146'> missing 1 atoms
Checking symmetry...
There are 9 symmetry packed atoms in binding site
Checking protein side-chain bad clashes...
Atom(C)<pdb:1mup:CHN--:'108':822> bad contact with Atom(N)<pdb:1mup:CHN--:'119':914>: 3.07 < 3.40
Atom(C)<pdb:1mup:CHN--:'108':823> bad contact with Atom(N)<pdb:1mup:CHN--:'119':914>: 3.23 < 3.40
Atom(N)<pdb:1mup:CHN--:'108':824> bad contact with Atom(N)<pdb:1mup:CHN--:'119':914>: 2.32 < 3.20
Atom(S)<pdb:1mup:CHN--:'121':929> bad contact with Atom(C)<pdb:1mup:CHN--:'145':1121>: 3.67 < 3.70
Checking protein ligand bad clashes... ok
---------------------------------------------------------------------------
13.2.6 Adding optparse and raw_input Functionality
Obviously, one could edit the main script every time one wanted to determine the quality of a
different PDB structure. However, it would be handy to be able to read the PDB code of interest from
the command line. Furthermore it would be useful if one was provided with a list of binding sites, so
that the user could interactively select the binding site of interest. In terms of reading in arguments
and options from the command line, we will be making use of the built in module optparse. For
the selection of ligands we will be making use of the built in functionality raw_input. Have a look
at the code in binding_site_quality2.py.
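The two ingredients can be sketched as follows. This is not the code from binding_site_quality2.py: the -b option, the function names and the prompt wording are assumptions. The sketch is written for a current Python interpreter, where the function for reading a line from the user is called input; in the Python 2 scripts of this guide the equivalent is raw_input. The reader function is passed in as an argument so that the selection logic can be exercised without typing anything.

```python
import optparse


def parse_command_line(argv):
    """Parse a PDB code and options from a list of command-line words."""
    parser = optparse.OptionParser(usage="usage: %prog [options] PDB_CODE")
    parser.add_option("-b", "--b-factor", dest="b_factor_cutoff", type="float",
                      default=40.0, help="B-factor cutoff [default: %default]")
    options, args = parser.parse_args(argv)
    if len(args) != 1:
        parser.error("exactly one PDB code is required")
    return options, args[0]


def choose(items, read_line=input):
    """List the items and return the one the user selects by number."""
    print("Make binding site selection...")
    for number, item in enumerate(items):
        print("[%d] %s" % (number, item))
    while True:
        try:
            number = int(read_line())
        except ValueError:
            number = -1   # treat non-numeric input as an invalid choice
        if 0 <= number < len(items):
            return items[number]
        print("Please enter a number between 0 and %d" % (len(items) - 1))


options, pdb_code = parse_command_line(["--b-factor", "50", "1mup"])
print("%s %.2f" % (pdb_code, options.b_factor_cutoff))
```

In a real script parse_command_line would be given sys.argv[1:], and choose would be called with its default reader so that the user is prompted at the terminal.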
The optparse module automatically sorts out command line help. To illustrate this, try running the
command:
python binding_site_quality2.py -h
Now try running the script using the command:
python binding_site_quality2.py 1mup
You will be prompted to select a binding site:
Make binding site selection...
[0] BindingSite<1mup:CD_201>
[1] BindingSite<1mup:CD_202>
[2] BindingSite<1mup:CD_203>
[3] BindingSite<1mup:CD_204>
[4] BindingSite<1mup:TZL_167>
Type 4 and enter. You should get the following output:
---------------------------------------------------------------------------
PDB code    : 1mup
Binding site: BindingSite<1mup:TZL_167>
Ligand      : Ligand<pdb:1mup:TZL_167>
Resolution  : 2.40
R-value     : 0.191
Checking b-factors...
Atom(O)<pdb:1mup:CHN--:'46':333> has large b_factor 56.67 (> 40.00)
Atom(C)<pdb:1mup:CHN--:'60':447> has large b_factor 46.58 (> 40.00)
Checking occupancy... ok
Checking for missing atoms... ok
Checking symmetry... ok
Checking protein side-chain bad clashes...
Atom(C)<pdb:1mup:CHN--:'60':447> bad contact with Atom(C)<pdb:1mup:CHN--:'73':545>: 3.26 < 3.40
Atom(C)<pdb:1mup:CHN--:'60':447> bad contact with Atom(S)<pdb:1mup:CHN--:'73':546>: 3.42 < 3.70
Atom(C)<pdb:1mup:CHN--:'73':547> bad contact with Atom(C)<pdb:1mup:CHN--:'88':653>: 3.34 < 3.40
Atom(C)<pdb:1mup:CHN--:'73':547> bad contact with Atom(C)<pdb:1mup:CHN--:'88':654>: 2.85 < 3.40
Atom(C)<pdb:1mup:CHN--:'73':547> bad contact with Atom(C)<pdb:1mup:CHN--:'88':656>: 3.16 < 3.40
Atom(C)<pdb:1mup:CHN--:'88':652> bad contact with Atom(N)<pdb:1mup:CHN--:'92':690>: 3.17 < 3.40
Checking protein ligand bad clashes... ok
---------------------------------------------------------------------------
13.2.7 Altering the Binding Site Selection Process
Reliscript offers a simple and flexible way of investigating protein-ligand interactions. The current
script binding_site_quality2.py requires the user to provide the PDB code of interest on
the command line and for the user to interactively select the binding site of interest. However,
because of the modular design of the code in this tutorial one could now easily create a separate script
for designing a large validation set such as in the studies referenced in the introduction (Nissink et al.,
Proteins, 49, 457-471, 2002; Hartshorn et al., J. Med. Chem., 50, 726-741, 2007; Verdonk et al., J.
Chem. Inf. Model., in press 2008). The first steps in such a process would probably be to exclude
protein-ligand interactions where the ligand was a metal, an ion or a cofactor, which could
easily be achieved by creating a ligand set and filtering it.
>>> import reliscript as rs
>>> ligand_set = rs.set('ligand')
>>> filter_by_mr = rs.numeric_search('mol_wt', min=300, component='ligands')
>>> filter_by_mr(ligand_set)
Once happy with the ligand set one could investigate the quality of the binding sites associated with
each of the ligands in the set.