Automation of virtual screening process using the Pipeline Pilot
Fabián Villalta Romero
October 2014

Summary of the course
• Genomic to structure
• Homology Models
• Targets
• Docking

Drug Discovery
Nature Reviews Drug Discovery 7, 807-817

Possible Targets!!
The Human Genome Project
Number of identified genes ≈ 32,000
Another: Corynebacterium pseudotuberculosis I19!!!
Xylella fastidiosa genome
http://www.ebi.ac.uk/genomes/
http://www.ncbi.nlm.nih.gov/genome

Possible Targets!!
HUPO: The Human Proteome Organization
http://www.vox.com/2014/9/23/6832023/ebola-virus-global-health-panic

Possible Targets!!
Figure: Two-dimensional electrophoresis of the proteins from pooled B. asper venoms from adults of the Caribbean (A) and the Pacific (B) versants of Costa Rica. Alape-Girón et al. Toxicon, 2009 54, 938–948.

Possible Targets!!
http://www.rcsb.org/
http://www.uniprot.org/proteomes/

Ligand structures
Data from ZINC
Exercise, 2 min!!!
http://zinc.docking.org/
Data from ChEMBL
https://www.ebi.ac.uk/chembl/
Data from PubChem
https://pubchem.ncbi.nlm.nih.gov/search/index.html

Integrate Omics for drug discovery!!
>2,000,000 proteins
Ou-Yang, et al. Acta Pharmacologica Sinica 33, 1131–1140.

Some useful tools!!

Scientific workflow system for drug discovery
Why?
• Automate tedious jobs that scientists traditionally performed by hand for each dataset
• Process large volumes of data faster

History
Workflow, as a concept, was defined in the business domain in 1996 by the Workflow Management Coalition as: “The automation of a business process, in whole or part, during which documents, information or tasks are passed from one participant to another for action, according to a set of procedural rules.”
http://www.taverna.org.uk/introduction/why-use-workflows/

Scientific workflow system
• Designed specifically to compose and execute a series of computational or data manipulation steps, or workflow, in a scientific application
• Scripts that call in data, programs, and other inputs
• Produce outputs that might include visualizations and analytical results

Examples of scientific workflow systems
• Anduril
• ASKALON
• Apache
• BioBIKE
• Bioclipse
• Discovery Net
• Ergatis
• Kepler
• Mobyle
• OnlineHPC
• OpenMOLE
• Orange
• Pegasus
• Tavaxy
• Galaxy
• KNIME
• Taverna
• Pipeline Pilot

Galaxy
No longer supports the Windows platform!!

KNIME
https://www.knime.org

Taverna
• Taverna is:
– A workflow language based on a dataflow model.
– A graphical editing environment for that language.
– An invocation system to run instances of that language on data supplied by a user of the system.
• When you download it, you get all of this rolled into a single piece of desktop software
• The enactor can be run independently of the GUI
• Java based, runs on Windows, Mac OS, Linux, Solaris, ...
• It doesn't necessarily run "on a grid".
• Can be used to access resources, either on a grid or anywhere else.

Pipeline Pilot
Pipeline Pilot is a graphic authoring application that:
• optimizes the research innovation cycle,
• increases operational efficiency and
• reduces costs for both research and information technology (IT).
Pipeline Pilot automates the scientific analysis of data, enabling users across the enterprise to rapidly explore, visualize and report research results.
https://www.youtube.com/watch?v=nZ2YMNv-uTk
https://community.accelrys.com

Pipeline Pilot
What is a Pipeline?
• A series of components connected through pipes through which data flows.
• Each component acts on the data and passes it on to subsequent components.

Pipeline Pilot
What is a Protocol?
• A protocol consists of one or multiple pipelines that are run sequentially.
What is a Component?
• A component is the building block used to create workflows
• Each component performs a task like reading, writing or manipulating data
• Components can have one input and up to two output ports
• Highlighting a component displays its parameter panel, which controls its behavior

Data Flow in a Pipeline

Data Reading
• Generic file readers
• File Reader
• Database readers
• File readers support zip files

Reader Components
• The Keep Properties parameter of most readers allows you to read in only a list of specified properties

Data Writing
• Generic file writer
• File writers are available for molecular and sequence formats:
– SD,
– MOL2,
– SMILES,
– PDB,
– FASTA.

Data Viewing
• Viewers run on the client, and third-party applications need to be installed on the client machine: Excel, Internet Explorer
• Charting viewers are available using Excel and/or the Reporting collection

Data Manipulation
• Property value function components are available for manipulating property values
• Property list function components are available for manipulating the property list
• Molecular manipulators allow the user to change the molecular object
• The Custom Manipulator (PilotScript) component uses scripting to create powerful manipulations

Property Calculator
• A number of molecular property calculators are included, including: ALogP, pKa, surface area, etc.
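
Sketch: a pipeline in plain Python
The pipeline, component and port ideas from the slides above can be illustrated outside Pipeline Pilot. The sketch below is not Pipeline Pilot code or PilotScript. It is a minimal Python analogy in which each component is a function that consumes and yields data records. All function names (read_records, calculate, rename_property, filter_records, run_pipeline), the SmilesLength property and the Vendor values are hypothetical and chosen only for illustration.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional, Set, Tuple

Record = Dict[str, Any]
Component = Callable[[Iterable[Record]], Iterator[Record]]


def read_records(rows: Iterable[Record],
                 keep_properties: Optional[Set[str]] = None) -> Iterator[Record]:
    """Reader component: emit one record per row.

    keep_properties mimics a 'Keep Properties' style parameter:
    only the listed properties are read in.
    """
    for row in rows:
        if keep_properties is None:
            yield dict(row)
        else:
            yield {k: v for k, v in row.items() if k in keep_properties}


def calculate(name: str, func: Callable[[Record], Any]) -> Component:
    """Calculator component: add a new property computed from each record."""
    def component(records: Iterable[Record]) -> Iterator[Record]:
        for rec in records:
            out = dict(rec)
            out[name] = func(rec)
            yield out
    return component


def rename_property(old: str, new: str) -> Component:
    """Manipulator component: rename a property if it is present."""
    def component(records: Iterable[Record]) -> Iterator[Record]:
        for rec in records:
            out = dict(rec)
            if old in out:
                out[new] = out.pop(old)
            yield out
    return component


def filter_records(records: Iterable[Record],
                   predicate: Callable[[Record], bool]
                   ) -> Tuple[List[Record], List[Record]]:
    """Filter component with two output ports: (pass, fail)."""
    passed: List[Record] = []
    failed: List[Record] = []
    for rec in records:
        (passed if predicate(rec) else failed).append(rec)
    return passed, failed


def run_pipeline(source: Iterable[Record],
                 components: List[Component]) -> Iterable[Record]:
    """Connect components in order: each output feeds the next input."""
    data = source
    for component in components:
        data = component(data)
    return data


# Toy input; the Vendor values are made up for illustration.
rows = [
    {"Name": "ethanol", "SMILES": "CCO", "MolWeight": 46.07, "Vendor": "X"},
    {"Name": "benzene", "SMILES": "c1ccccc1", "MolWeight": 78.11, "Vendor": "Y"},
]

stream = read_records(rows, keep_properties={"Name", "SMILES", "MolWeight"})
stream = run_pipeline(stream, [
    calculate("SmilesLength", lambda rec: len(rec["SMILES"])),
    rename_property("MolWeight", "MW"),
])
passed, failed = filter_records(stream, lambda rec: rec["MW"] <= 60)

print("pass port:", [rec["Name"] for rec in passed])
print("fail port:", [rec["Name"] for rec in failed])
```

Generators are used so that records flow through the components one at a time, which loosely mirrors data flowing through pipes. The filter collects its two output ports into lists only to keep the example short.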
Property Manipulation Components
• Copy Property,
• Rename Property,
• Keep Property,
• Remove Property
Exercise!!

Data Handling
• Connections of components
• Drag and Drop components
• Insertion of component
• Reusing Component
• Sequential Execution
• Component Disabling
Exercise!!

Ligand Preparation
• ADMET
The Absorption, Distribution, Metabolism, Excretion, Toxicity (ADMET) collection provides components to calculate a number of properties important for drug discovery and development.
• The ADMET components are organized into the following main groups:
■ Calculators based on published models
■ Learned extensible models
■ Metabolism components
■ TOPKAT models
• The calculators based on published models include:
■ Human intestinal absorption (HIA)
■ Aqueous solubility
■ Blood brain barrier penetration (BBB)
Example!! Calculate ADMET Properties

Prepare Ligand
Filters ligands using Lipinski and Veber rules
Selects drug-like ligands according to rules formulated by Lipinski (Rule of Five) and Veber. The ligands passing the filters have a higher probability of good oral bioavailability.
Look: Filter by Lipinski and Veber Rules!!

Ligand Preparation
Creates tautomers, isomers, protomers and 3D coordinates
Prepares ligands for use in other applications, in particular those that require 3D coordinates and biological ionization and tautomerization states. When studying receptor-ligand interactions and other areas, it is important to prepare the ligands correctly. Different protonation states, isomers and tautomers typically have different 3D geometries and binding characteristics. If the binding configuration is not known, one approach is to enumerate a number of likely configurations before running docking protocols. Ligand Preparation will help you with that task.
The following options are available:
• Standardize charges for common groups
• Retain only the largest fragment
• Add Hydrogens
• Represent in Kekule form
• Enumerate ionization states
• Ionize functional groups
• Generate tautomers and isomers
• Remove duplicates
• Fix bad valencies
• Calculate 3D coordinates using Catalyst
Look: Prepare Ligands Protocol!!

Prepare protein
Prepares a protein structure
Cleans up common problems in the input protein structure in preparation for further processing by other protocols. The following steps are performed:
• Standardize atom names, insert missing atoms in residues and remove alternate conformations. May also remove water and ligand molecules depending on the setting of Advanced|Keep Waters and Advanced|Keep Ligands.
• Insert missing loop regions based on either SEQRES data or user-specified loop definitions (optional).
• Optimize short and medium-size loop regions with the LOOPER algorithm (optional).
• Minimize the remaining loop regions (optional).
• Calculate the pK and protonate the structure (optional).
Look: Prepare Protein!!

Virtual screening!!
Design Workflow: genome to possible targets!
http://www.ncbi.nlm.nih.gov/genome/?term=ebolavirus
Ebola virus (Wikipedia)
Protein coding regions
Homology Model
Prepare Protein, Ligand preparation, Docking
Compare Protocols
• https://www.youtube.com/watch?v=W0ILTnVTZvI

THANKS!! OBRIGADO!! GRACIAS!!
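
Extra: sketch of a Lipinski and Veber filter
The drug-likeness filter from the Prepare Ligand slide can be written down as a short Python sketch. This is not the Pipeline Pilot component. It assumes the descriptors were already computed elsewhere, and the ligand names and descriptor values below are hypothetical. The thresholds are the commonly quoted ones: Lipinski (molecular weight ≤ 500, logP ≤ 5, H-bond donors ≤ 5, H-bond acceptors ≤ 10) and Veber (rotatable bonds ≤ 10, polar surface area ≤ 140 Å²). How many Lipinski violations to tolerate is left as a parameter, since tools differ on that choice.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Descriptors:
    """Precomputed ligand descriptors (the values used below are hypothetical)."""
    name: str
    mol_weight: float      # g/mol
    logp: float            # octanol-water partition coefficient
    h_donors: int          # hydrogen-bond donors
    h_acceptors: int       # hydrogen-bond acceptors
    rotatable_bonds: int
    tpsa: float            # polar surface area, in square angstroms


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four Rule of Five criteria are violated."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def passes_veber(d: Descriptors) -> bool:
    """Veber criteria: at most 10 rotatable bonds and PSA of at most 140."""
    return d.rotatable_bonds <= 10 and d.tpsa <= 140


def filter_drug_like(ligands: Iterable[Descriptors],
                     max_lipinski_violations: int = 0
                     ) -> Tuple[List[Descriptors], List[Descriptors]]:
    """Split ligands into (pass, fail) according to both rule sets."""
    passed: List[Descriptors] = []
    failed: List[Descriptors] = []
    for d in ligands:
        ok = (lipinski_violations(d) <= max_lipinski_violations
              and passes_veber(d))
        (passed if ok else failed).append(d)
    return passed, failed


# Hypothetical ligands, for illustration only.
ligands = [
    Descriptors("lig_small", 320.0, 2.5, 2, 5, 4, 75.0),
    Descriptors("lig_heavy", 650.0, 6.2, 3, 9, 8, 110.0),
    Descriptors("lig_flexible", 450.0, 3.0, 2, 7, 14, 120.0),
]

drug_like, rejected = filter_drug_like(ligands)
print("pass:", [d.name for d in drug_like])
print("fail:", [d.name for d in rejected])
```

In a real workflow the descriptors would come from a property calculator step placed before the filter, in the same way the filter follows the calculators in a pipeline.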