Automation of virtual screening process using the Pipeline Pilot
Fabián Villalta Romero
October 2014
Summary of the course
• Genomic to structure
• Homology Models
• Targets
• Docking
Drug Discovery
Nature Reviews Drug Discovery 7, 807-817
Possible Targets!!
The Human Genome Project
Number of identified
genes ≈ 32,000
Another:
Corynebacterium pseudotuberculosis I19!!!
Xylella fastidiosa
genome
http://www.ebi.ac.uk/genomes/
http://www.ncbi.nlm.nih.gov/genome
Possible Targets!!
HUPO: The Human Proteome Organization
http://www.vox.com/2014/9/23/6832023/ebola-virus-global-health-panic
Possible Targets!!
[Figure: Two-dimensional electrophoresis of the proteins from pooled B. asper venoms from adults of the Caribbean (A) and the Pacific (B) versants of Costa Rica.]
Alape-Girón et al. Toxicon, 2009, 54, 938–948.
Possible Targets!!
http://www.rcsb.org/
http://www.uniprot.org/proteomes/
Ligand structures
Data from ZINC
Exercise, 2 min!!!
http://zinc.docking.org/
Data from ChEMBL
https://www.ebi.ac.uk/chembl/
Data from PubChem
https://pubchem.ncbi.nlm.nih.gov/search/index.html
Integrate Omics for drug discovery!!
>2,000,000 proteins
Ou-Yang, et al. Acta Pharmacologica Sinica 33, 1131–1140.
Some useful tools!!
Scientific workflow system for drug
discovery
Why?
• Automate tedious jobs that scientists
traditionally performed by hand for
each dataset
• Process large volumes of data faster
History
Workflow, as a concept, was defined in
the business domain in 1996 by the
Workflow Management Coalition as:
“The automation of a business process, in whole
or part, during which documents, information or
tasks are passed from one participant to another
for action, according to a set of procedural
rules.”
http://www.taverna.org.uk/introduction/why-use-workflows/
Scientific workflow system
Designed specifically to compose and
execute a series of computational or
data manipulation steps, or workflow, in
a scientific application
Scripts that call in
data, programs,
and other inputs
Produce outputs
that might include
visualizations and
analytical results
Examples of scientific workflow
systems
• Anduril
• ASKALON
• Apache
• BioBIKE
• Bioclipse
• Discovery Net
• Ergatis
• Kepler
• Mobyle
• OnlineHPC
• OpenMOLE
• Orange
• Pegasus
• Tavaxy
• Galaxy
• KNIME
• Taverna
• Pipeline Pilot
Galaxy
No longer supports the Windows platform!!
KNIME
https://www.knime.org
Taverna
• Taverna is:
– A workflow language based on a dataflow model.
– A graphical editing environment for that language.
– An invocation system to run instances of that language
on data supplied by a user of the system.
• When you download it you get all this rolled into a single
piece of desktop software
• The enactor can be run independently of the GUI
• Java based, runs on Windows, Mac OS, Linux, Solaris ….
• It doesn't necessarily run "on a grid".
• Can be used to access resources, either on a grid, or
anywhere else.
Pipeline Pilot
Pipeline Pilot is a graphic authoring application that:
• optimizes the research innovation cycle,
• increases operational efficiency and
• reduces costs for both research and information
technology (IT).
Pipeline Pilot automates the scientific analysis of
data, enabling users across the enterprise to rapidly
explore, visualize and report research results.
https://www.youtube.com/watch?v=nZ2YMNv-uTk
https://community.accelrys.com
Pipeline Pilot
What is a Pipeline?
• A series of components connected
through pipes through which data flows.
• Each component acts on the data and
passes it on to subsequent components.
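The pipeline idea described above can be sketched in plain Python. This is not Pipeline Pilot code: the names `read_records`, `add_heavy_flag` and `run_pipeline`, and the sample records, are hypothetical and only illustrate data records flowing from one component to the next.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
Component = Callable[[Iterable[Record]], Iterator[Record]]


def read_records(rows: List[Record]) -> Iterator[Record]:
    """Reader component (hypothetical): emits one data record at a time."""
    for row in rows:
        yield dict(row)


def add_heavy_flag(records: Iterable[Record]) -> Iterator[Record]:
    """Manipulator component (hypothetical): adds a property to each record."""
    for record in records:
        record["is_heavy"] = record["mol_weight"] > 500
        yield record


def run_pipeline(source: Iterable[Record], components: List[Component]) -> List[Record]:
    """Connect components in order; each output stream feeds the next component."""
    stream = source
    for component in components:
        stream = component(stream)
    return list(stream)


# Illustrative records, not real measurements.
rows = [
    {"name": "ligand_a", "mol_weight": 320.4},
    {"name": "ligand_b", "mol_weight": 612.7},
]
result = run_pipeline(read_records(rows), [add_heavy_flag])
print(result)
```

Generators are used here so that each record moves through every component before the next one is read, which mirrors the record-by-record data flow described on the slide.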
Pipeline Pilot
What is a Protocol?
• A protocol consists of one or multiple pipelines
that are run sequentially.
What is a Component?
• A component is the building block used to create
workflows
• Each component performs a task like reading,
writing or manipulating data
• Components can have one input and up to two
output ports
• Highlighting a component displays its parameter
panel, which controls its behavior
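A component with one input and two outputs can be pictured as a function that splits its input into two streams. The sketch below assumes the two output ports act as "pass" and "fail" exits, which the slide does not state; `filter_component` and the sample records are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, float]


def filter_component(
    records: Iterable[Record], condition: Callable[[Record], bool]
) -> Tuple[List[Record], List[Record]]:
    """One input stream, two outputs: records that meet the condition and the rest."""
    passed: List[Record] = []
    failed: List[Record] = []
    for record in records:
        (passed if condition(record) else failed).append(record)
    return passed, failed


# Illustrative records, not real measurements.
records = [{"mol_weight": 180.2}, {"mol_weight": 750.9}, {"mol_weight": 410.5}]

# The condition plays the role of a component parameter (assumed for illustration).
passed, failed = filter_component(records, lambda r: r["mol_weight"] <= 500)
print(len(passed), len(failed))
```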
Data Flow in a Pipeline
Data Reading
• Generic file readers
• File Reader
• Database readers
• File readers support zip files
Reader Components
• The Keep Properties parameter of most
readers allows you to read in only a list
of specified properties
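The effect of a "keep properties" setting can be sketched with the Python standard library. This is an analogy, not the Pipeline Pilot reader: `read_keep_properties`, the column names and the sample row are assumptions made for illustration.

```python
import csv
import io
from typing import Dict, Iterator, List


def read_keep_properties(text: str, keep: List[str]) -> Iterator[Dict[str, str]]:
    """Read delimited text and keep only the listed properties of each record."""
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        # Properties not named in `keep` are dropped as the record is read.
        yield {name: row[name] for name in keep if name in row}


# Illustrative input with four properties per record.
data = (
    "name,smiles,mol_weight,vendor\n"
    "aspirin,CC(=O)OC1=CC=CC=C1C(=O)O,180.16,acme\n"
)
records = list(read_keep_properties(data, ["name", "mol_weight"]))
print(records)
```

Dropping unneeded properties at read time keeps each record small before it reaches later components.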
Data Writing
• Generic file writer
• File writers are available
for molecular and
sequence formats:
– SD,
– MOL2,
– SMILES,
– PDB,
– FASTA.
Data Viewing
• Viewers run on the client, and third-party applications need to be installed on the client machine: Excel, Internet Explorer
• Charting viewers are available using Excel and/or the Reporting collection
Data Manipulation
• Property value function components are available for manipulating property values
• Property list function components are available for manipulating the property list
• Molecular manipulators allow the user to change the molecular object
• The Custom Manipulator (PilotScript) component uses scripting to create powerful manipulations
Property Calculator
• A number of molecular property
calculators are included, such as:
AlogP, pKa, surface area, etc.
Property Manipulation
Components
• Copy Property,
• Rename Property,
• Keep Property,
• Remove Property
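The four property manipulation operations listed above can be mimicked on a plain Python dictionary. The function names and the sample record are hypothetical; they only show what each operation does to a record's property list.

```python
from typing import Dict, Iterable

Record = Dict[str, object]


def copy_property(record: Record, source: str, target: str) -> Record:
    """Return a new record with `source` duplicated under the name `target`."""
    new = dict(record)
    new[target] = new[source]
    return new


def rename_property(record: Record, old: str, new_name: str) -> Record:
    """Return a new record with the property `old` renamed to `new_name`."""
    new = dict(record)
    new[new_name] = new.pop(old)
    return new


def keep_properties(record: Record, names: Iterable[str]) -> Record:
    """Return a new record containing only the named properties."""
    wanted = set(names)
    return {key: value for key, value in record.items() if key in wanted}


def remove_properties(record: Record, names: Iterable[str]) -> Record:
    """Return a new record without the named properties."""
    unwanted = set(names)
    return {key: value for key, value in record.items() if key not in unwanted}


# Illustrative record.
record = {"id": 1, "MW": 180.16, "vendor": "acme"}
renamed = rename_property(record, "MW", "mol_weight")
copied = copy_property(renamed, "id", "source_id")
trimmed = remove_properties(copied, ["vendor"])
final = keep_properties(trimmed, ["id", "mol_weight"])
print(final)
```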
Exercise!!
Data Handling
• Connections of components
• Drag and Drop components
• Insertion of component
• Reusing Component
• Sequential Execution
• Component Disabling
Exercise!!
Ligand Preparation
• ADMET
The Absorption, Distribution, Metabolism, Excretion, Toxicity (ADMET) collection provides components to calculate a number of properties important for drug discovery and development.
• The ADMET components are organized into the following main groups:
■ Calculators based on published models
■ Learned extensible models
■ Metabolism components
■ TOPKAT models
• The calculators based on published models include:
■ Human intestinal absorption (HIA)
■ Aqueous solubility
■ Blood brain barrier penetration (BBB)
Example!! Calculate ADMET Properties
Prepare Ligand
Filters ligands using Lipinski and Veber
rules
Selects drug-like ligands according to
rules formulated by Lipinski (Rule of Five)
and Veber. The ligands passing the filters
have a higher probability of good oral
bioavailability.
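A minimal sketch of such a filter, assuming the descriptors have already been calculated for each ligand. The thresholds are the commonly cited ones (Lipinski: molecular weight ≤ 500, logP ≤ 5, at most 5 hydrogen-bond donors and 10 acceptors; Veber: at most 10 rotatable bonds and polar surface area ≤ 140 Å²). The class and function names, the `max_lipinski_violations` parameter and the example values are hypothetical, and the actual protocol may apply the rules differently.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed ligand descriptors (assumed to come from an earlier step)."""
    mol_weight: float
    logp: float
    h_bond_donors: int
    h_bond_acceptors: int
    rotatable_bonds: int
    polar_surface_area: float  # in square angstroms


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four Rule of Five criteria are violated."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def passes_veber(d: Descriptors) -> bool:
    """Veber criteria: limited flexibility and limited polar surface area."""
    return d.rotatable_bonds <= 10 and d.polar_surface_area <= 140


def is_drug_like(d: Descriptors, max_lipinski_violations: int = 0) -> bool:
    """Combine both rule sets; the allowed violation count is an assumption."""
    return lipinski_violations(d) <= max_lipinski_violations and passes_veber(d)


# Illustrative values, not data for real compounds.
small = Descriptors(350.0, 2.1, 2, 5, 4, 75.0)
large = Descriptors(720.0, 6.3, 6, 12, 14, 190.0)
print(is_drug_like(small), is_drug_like(large))
```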
Look at the Filter by Lipinski and Veber Rules protocol!!
Ligand Preparation
Creates tautomers, isomers, protomers and 3D coordinates
Prepares ligands for use in other applications, in particular those that require 3D
coordinates and biologically relevant ionization and tautomerization states.
When studying receptor-ligand interactions and other areas, it is important to correctly
prepare the ligands. Different protonation states, isomers and tautomers typically have
different 3D geometries and binding characteristics. If the binding configuration is
not known, one approach is to enumerate a number of likely configurations before running
docking protocols. Ligand Preparation will help you with that task.
The following options are available:
• Standardize charges for common groups
• Retain only the largest fragment
• Add Hydrogens
• Represent in Kekule form
• Enumerate ionization states
• Ionize functional groups
• Generate tautomers and isomers
• Remove duplicates
• Fix bad valencies
• Calculate 3D coordinates using Catalyst
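The enumeration step can be illustrated abstractly: if each ionizable site has a small set of possible states, the candidate configurations are all combinations of those states. The sketch below works on labels only, not on real molecular structures, and is not how the Ligand Preparation protocol is implemented; `enumerate_states` and the site names are hypothetical.

```python
from itertools import product
from typing import Dict, List


def enumerate_states(site_options: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Return every combination of per-site states as one candidate configuration."""
    sites = sorted(site_options)
    combos = product(*(site_options[site] for site in sites))
    return [dict(zip(sites, combo)) for combo in combos]


# Hypothetical ligand with two ionizable groups, two states each.
sites = {
    "carboxylic_acid": ["COOH", "COO-"],
    "amine": ["NH2", "NH3+"],
}
states = enumerate_states(sites)
for state in states:
    print(state)
```

The number of candidates grows multiplicatively with the number of sites, which is one reason to remove duplicates and unlikely states before docking.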
Look at the Prepare Ligands protocol!!
Prepare protein
Prepares a protein structure
Cleans up common problems in the input protein structure in
preparation for further processing by other protocols.
The following steps are performed:
• Standardize atom names, insert missing atoms in residues
and remove alternate conformations. May also remove
water and ligand molecules depending on the setting of
Advanced|Keep Waters and Advanced|Keep Ligands.
• Insert missing loop regions based on either SEQRES data or
user specified loop definitions (optional).
• Optimize short and medium size loop regions with the
LOOPER algorithm (optional).
• Minimize the remaining loop regions (optional).
• Calculate the pK and protonate the structure (optional).
Look at the Prepare Protein protocol!!
Virtual screening!!
Design Workflow: genome to
possible targets!
http://www.ncbi.nlm.nih.gov/genome/?term=ebolavirus
Ebola_vírus
Wikipedia
Protein coding regions
Homology Model
Prepare Protein, Ligand
preparation, Docking
Compare Protocols
• https://www.youtube.com/watch?v=W0ILTnVTZvI
THANKS!! OBRIGADO!!
GRACIAS!!