Application Guide
Cheminformatics and Pharmacophore Modeling, Together at Last
SciTegic Pipeline Pilot: Bridging Accord Database Explorer and Discovery Studio
Carl Colburn
Shikha Varma-O’Brien
Introduction
The integration of Accelrys’ Accord™ Database Explorer 3.0 (ADE 3.0) and Discovery Studio® 1.7 with the SciTegic
Pipeline Pilot™ data analysis and mining platform now makes it easy for you to leverage screening data within your
molecular modeling analyses. Using a published HTS data set [1] as an example, this document reviews how you can:
• Consolidate compound data within a centralized location using SciTegic Pipeline Pilot
• Create and configure a project- or enterprise-level database from that consolidated data using Accord Database Explorer 3.0
• Query the resulting ADE 3.0 database to identify modeling candidates based on defined criteria using Pipeline Pilot
• Analyze the queried data within the powerful Discovery Studio modeling environment
SciTegic Pipeline Pilot: Bridging Cheminformatics and Molecular Modeling
The SciTegic Pipeline Pilot server platform streamlines the integration and analysis of vast quantities
of data, and can retrieve or join data from independent databases, files, and client applications. It can
directly read chemistry, sequence, text, and numeric data from all popular formats and analyze data from
multiple sources. As shown in Figure 1, both Accord cheminformatics tools and the Discovery Studio
molecular modeling environment can share a common underlying SciTegic Pipeline Pilot server.
Additionally, within the graphical client interface to Pipeline Pilot, you can compose data processing
networks (known as protocols) using hundreds of different configurable components for operations
such as data retrieval, manipulation, computational filtering, and display. There are several Pipeline Pilot
Component Collections available that include diverse components for creating such data processing
protocols. A few of these components are included in the example described in this document.
Figure 1. A common Pipeline Pilot server shared by Accord informatics and Discovery
Studio modeling tools bridges cheminformatics with molecular modeling.
Step 1: Consolidating and Organizing Data Using Pipeline Pilot
Typically, compound data are not readily available from a centralized location but are instead contained in
various files scattered throughout the organization, sometimes even in different formats. With Pipeline Pilot,
you have an efficient way to consolidate these data, store them in a secure, searchable, indexed database,
and retrieve specific subsets of compounds for further study based on the goals of the project.
To illustrate these capabilities, we took both an SD file and text files from our example HTS screening
data and used Pipeline Pilot to create a protocol (shown in Figure 2) that merged the datasets based
on identical molecules, standardized the chemical structures, and created a new, merged dataset.
Optionally, components from the Pipeline Pilot Chemistry Collection can also be included in the protocol
to filter based on properties such as Lipinski’s filters, reactive substructures, descriptors, etc.
Figure 2. An example Pipeline Pilot protocol to merge disparate datasets
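Outside the graphical client, the logic of such a merge-and-filter protocol can be illustrated in a few lines of Python. This is only a minimal sketch, not the Pipeline Pilot protocol itself: the record layout, the `structure_key` field (assumed to be identical for identical, already standardized molecules), the function names, and all descriptor and IC50 values are hypothetical placeholders. The filter uses the commonly cited Lipinski rule-of-five thresholds and tolerates at most one violation.

```python
# Hypothetical records standing in for data parsed from an SD file and a text
# file. "structure_key" is assumed to match for identical molecules.
sd_records = [
    {"structure_key": "KEY-A", "name": "cmpd-1", "mol_weight": 320.4,
     "logp": 2.1, "h_donors": 2, "h_acceptors": 5},
    {"structure_key": "KEY-B", "name": "cmpd-2", "mol_weight": 612.7,
     "logp": 6.3, "h_donors": 1, "h_acceptors": 8},
]
text_records = [
    {"structure_key": "KEY-A", "ic50_um": 12.5},
    {"structure_key": "KEY-C", "ic50_um": 3.1},
]


def merge_by_structure(primary, secondary, key="structure_key"):
    """Left-join secondary onto primary using a shared structure key."""
    lookup = {rec[key]: rec for rec in secondary}
    merged = []
    for rec in primary:
        combined = dict(rec)
        # Add fields from the matching secondary record, if there is one.
        combined.update(lookup.get(rec[key], {}))
        merged.append(combined)
    return merged


def lipinski_violations(props):
    """Count rule-of-five violations from precomputed descriptors."""
    checks = [
        props["mol_weight"] > 500,
        props["logp"] > 5,
        props["h_donors"] > 5,
        props["h_acceptors"] > 10,
    ]
    return sum(checks)


merged = merge_by_structure(sd_records, text_records)
# Keep compounds with at most one rule-of-five violation.
drug_like = [rec for rec in merged if lipinski_violations(rec) <= 1]
print([rec["name"] for rec in drug_like])
```

In a real workflow the structure key and descriptors would come from a chemistry toolkit or from Pipeline Pilot components rather than being typed in by hand.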
Step 2: Creating a Local Database using Accord Database Explorer 3.0
Accord Database Explorer 3.0 is a forms-based database client that provides powerful querying and browsing tools
for extracting maximum value from local and server-hosted data sources. ADE 3.0 features a Database Set-up Wizard,
which lets you easily create an Access format database from an SD file. This creates a stable, secure environment to
store your data and allows the chemical compounds to be indexed for faster searching. You can create various
project-level databases in this way, and you can customize the column names and set-up of the tables during this process.
Figure 3 shows one step of the ADE 3.0 Set-up Wizard, which presents an editable view of the table structure. This allows you
to create local, project-level databases with common column names so that they can be more easily
queried together in SQL statements within Pipeline Pilot, as described in the following section.
Figure 3. Editable table structure in ADE 3.0 Set-up Wizard
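The benefit of common column names can be seen in a small SQL example. The sketch below uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for the Access databases created by ADE 3.0; the table names, column names, and values are hypothetical, and the two project tables are placed in one database for simplicity.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for ADE-created databases.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project_a (compound_id TEXT, ic50_um REAL);
    CREATE TABLE project_b (compound_id TEXT, ic50_um REAL);
    INSERT INTO project_a VALUES ('A-001', 0.8), ('A-002', 15.0);
    INSERT INTO project_b VALUES ('B-001', 2.4);
""")

# Because both tables share column names, one statement can query them together.
combined_sql = """
    SELECT 'project_a' AS source, compound_id, ic50_um FROM project_a
    UNION ALL
    SELECT 'project_b' AS source, compound_id, ic50_um FROM project_b
    ORDER BY compound_id
"""
rows = conn.execute(combined_sql).fetchall()
for row in rows:
    print(row)
```

If the tables used different names for the same quantity, each SELECT would need its own column aliases before the results could be combined.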
Step 3: Querying the Database to Identify Modeling Candidates Using Pipeline Pilot
Once the database has been created with ADE 3.0, it is available as a data source from within Pipeline Pilot, where
you can search the data to select qualified compounds for further modeling and analysis with Discovery Studio.
As described in the following paragraphs, this requires three steps: connect to the new database; construct the SQL
Select statement to perform a specific query; and create a protocol to retrieve desired compounds for modeling.
Connecting to a Database Using ODBC
To connect to a database using ODBC (Open Database Connectivity), you must create a data source name (DSN) on your
machine. This is done in the Control Panel|Administrative Tools|Data Sources menu. Since we are using an Access
database in this example, select the Microsoft Access Driver. You may use a user name and password if desired.
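For readers who want to check a DSN from a script before using it in Pipeline Pilot, the following Python sketch shows how the same three settings (DSN, user name, password) map onto a standard ODBC connection string. The DSN `hts_screening` and the credentials are hypothetical, and the `connect` helper assumes the third-party pyodbc package is installed; it is not part of Pipeline Pilot or ADE 3.0.

```python
def build_odbc_connection_string(dsn, user=None, password=None):
    """Assemble an ODBC connection string from a DSN and optional credentials."""
    parts = [f"DSN={dsn}"]
    if user:
        parts.append(f"UID={user}")
    if password:
        parts.append(f"PWD={password}")
    return ";".join(parts)


def connect(dsn, user=None, password=None):
    """Open a connection through pyodbc (assumed to be installed)."""
    import pyodbc  # imported lazily so the helper above works without it
    return pyodbc.connect(build_odbc_connection_string(dsn, user, password))


# Hypothetical DSN and credentials, for illustration only.
conn_str = build_odbc_connection_string(
    "hts_screening", user="modeler", password="secret"
)
print(conn_str)
```

Calling `connect("hts_screening")` would only succeed on a machine where a DSN with that name has been defined as described above.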
Creating a Query Protocol
Next, as shown in Figure 4, create a protocol in Pipeline Pilot using the ODBC Select component. Because the
database in this example is in Accord format, we use the ODBC component to access the data, but we must also
specify the format by including “…as accord_mol…” in the SQL select statement. We then include the “Molecule
from Accord” and “Minimize Molecule” components in our protocol to convert the data format and optimize
the chemistry for Discovery Studio. Enter the DSN, user name, and password in the properties area. Then,
specify the columns (parameters) from the compiled ADE 3.0 database that you want included in your dataset.
Figure 4. Pipeline Pilot protocol with ODBC parameters
Modifying the SQL
You may now modify the SQL as needed, as illustrated in Figure 5. In this example, records without an actual IC50 data point were
eliminated because we later use the IC50 data to study the structure-activity relationships of the compounds in Discovery Studio.
Figure 5. PilotScript SQL statement builder
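The kind of modification described here, dropping records that lack an IC50 value, corresponds to a simple WHERE clause. The sketch below is an illustration only: it uses Python's built-in sqlite3 module as a stand-in for the ODBC data source, and the table name, column names, and values are hypothetical rather than taken from the actual ADE 3.0 database.

```python
import sqlite3

# In-memory SQLite table standing in for the compound table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compounds (compound_id TEXT, ic50_um REAL)")
conn.executemany(
    "INSERT INTO compounds VALUES (?, ?)",
    [("C-001", 0.45), ("C-002", None), ("C-003", 7.2)],
)

# Keep only records that have an IC50 data point, most potent first.
query = """
    SELECT compound_id, ic50_um
    FROM compounds
    WHERE ic50_um IS NOT NULL
    ORDER BY ic50_um
"""
with_ic50 = conn.execute(query).fetchall()
print(with_ic50)
```

The record with a missing IC50 (`C-002`) is excluded, so every compound passed on for structure-activity analysis carries a measured value.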
Step 4: Modeling the Selected Compounds in Discovery Studio: Conformational Analysis, Pharmacophore Analysis, Etc.
Depending on the objective of the project, various modeling approaches can be pursued once some experimental
data is available. Discovery Studio provides both structure- and ligand-based modeling approaches.
As shown in Figure 6, the customized protocols that you’ve developed in the Pipeline Pilot client for querying an
ADE 3.0 database can be published within a user protocols folder in the Discovery Studio GUI. This means that you
can select and execute your desired Pipeline Pilot protocol from directly within the Discovery Studio interface. In
this example, we’ve created a protocol in which compounds’ IC50 data points are pulled from the ADE 3.0 database
and are used to understand structure-activity relationships (SAR), which can then help guide future synthesis.
Figure 6. Customized workflow: Pull data from ADE 3.0 into Discovery Studio
Additionally, we can generate pharmacophore models based on the common features of the compounds
in the ADE 3.0 database, which can then be used in Discovery Studio for scaffold hopping, lead finding
by 3-D database mining, understanding binding modes, etc. Figure 7 shows a simple common-feature
pharmacophore model of four active CDK2 inhibitors. The model represents the framework of features
(acceptors, donors, aromatic rings, etc.) shared among these compounds and also provides the feature-based
alignments of these molecules. Such models can be used to prioritize compounds for screening.
Figure 7. Pharmacophore model built in Discovery Studio based on the common features
(acceptors, donors, aromatic rings, etc.) of four active CDK2 inhibitors included in an ADE 3.0 database
Choose (or Invent!) Your Workflow
Once in the Discovery Studio environment, modeling can be performed in any desired workflow
combining structure-based and ligand-based approaches, as shown in Figure 8. The Pipeline
Pilot server allows you to customize any protocol and add external algorithms as well.
Figure 8. Examples of Discovery Studio modeling capabilities
References:
1. Bradley, E.K., Miller, J.L., Saiah, E. and Grootenhuis, P.D.J. J. Med. Chem., 2003, 46, 4360-4364.
2. SciTegic website: http://www.scitegic.com/community/downloads/downloads.html