What’s new in JChem back-end and Markush storage, search and enumeration
Szabolcs Csepregi
Solutions for Cheminformatics

Contents
• ChemAxon chemical database tools
• Main features of JChem Base, Cartridge
• Example interfaces: JSP, ASP, AJAX examples
• Integration with other CXN products
• Markush structure storage, search and enumeration
• Recent developments, plans

Chemical database products
JChem Base
– A library for adding chemical structures into relational database systems. Available in Java, JSP and .NET
– Open-source web application example is available.
JChem Cartridge for Oracle
– Extends Oracle SQL with chemical operators and index.
– SQL interface for ChemAxon functionality
Instant JChem
– An all-in-one desktop chemical database application.
JChem Web Services
– SOAP interface to JChem Base
JC4XL
– Excel integration (coming)

Compatibility and integration
Supported chemical file formats:
• SMILES
• MDL MOL/RXN/SDF/RDF (v2000 and v3000)
• CML, MRV
• IUPAC and traditional names
• InChI, mol2, PDB, etc.
Database engines:
• Oracle, MySQL, MS SQL Server, MS Access, PostgreSQL, IBM DB2, Derby, etc.
All operating systems through:
• Java API (JChem Base)
• .NET API (JChem Base + IKVM) – for Windows
• SQL (Cartridge)

Structure searching: features
• Substructure, Similarity, Full, Full fragment, etc. search types
• Wide range of query atoms
• Query properties
• R-group queries
• Full SMARTS support
• Coordination compounds
• Link nodes
• Pseudo atoms, Lone pairs
• Relative stereo
• Reaction search features
• Polymers
• Position variation
• Hit coloring
...
www.chemaxon.com/conf/Structural_Search.ppt

Structure searching: options
Some selected structure search options:
– Chemical Terms filter constraint
– Tautomer search
– Stereo on/off
– Ignore charge/isotope/radical/valence/polymers
– Vague bond matching modes: “or aromatic”; ignore bond types
– Inverse hit list
– Maximum search time / number of hits
– SQL SELECT statement for pre-filtering
– Ordering of results
– etc.

Structure search: performance
Compound registration:

Number of compounds   Elapsed time,              Elapsed time,
                      duplicates not checked     duplicates checked
10,000                21 s                       26 s
100,000               2 min                      2 min 36 s
200,000               3 min 45 s                 5 min 5 s

Substructure search in PubChem (19.5 million compounds):

Query      Number of hits   Search time
(image)    2                0.81 s
(image)    93               0.79 s
(image)    5,855            1.457 s
(image)    142,950          11.076 s

JChem Base 5.2.0, Intel Quad Q6600 2.4 GHz, 8 GB RAM; Oracle 10.2.0.3

Table types
Control allowed chemical structures and available operations:
• Molecule
• Reaction
• Markush
• Query
• Any structure

Example web applications
Open source JSP, ASP examples
– Marvin applets are used for query drawing and structure visualization
AJAX example
– Back-end is JChem Web Services
– No Java is needed for browsing
Demo

Integration
Integration with other ChemAxon tools:
– Custom, uniform chemical representation. (Standardizer – see separate presentation today.)
– Automatically calculated properties by Chemical Terms: calculated columns (Calculator plugins)
– Additional similarity calculations (Screen - JChem Base only)
– Tautomer handling:
  • Tautomer search
  • Tautomer duplicate filter table/index option
  • Custom tautomer transforms or canonical tautomer using Standardizer
– Query drawing and structure visualization (Marvin)
Provides the most consistent interface and back-end.
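Similarity search and the additional similarity calculations listed above come down to comparing structure fingerprints. The following is a minimal Python sketch of two common coefficients, Tanimoto and Tversky (Tversky is named among the 5.2 additions later in this deck). Fingerprints are modeled here as plain sets of on-bit indices; the function names, the toy fingerprints and the parameter values are illustrative assumptions and are not JChem API or JChem defaults.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient: |A and B| / |A or B| on sets of on-bit indices."""
    union = len(a | b)
    # Two empty fingerprints are treated as identical (an assumption of this sketch).
    return len(a & b) / union if union else 1.0


def tversky(a: set, b: set, alpha: float, beta: float) -> float:
    """Tversky index: common / (common + alpha*|A only| + beta*|B only|)."""
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 1.0


# Hypothetical toy fingerprints (indices of bits that are set).
query = {1, 4, 7, 9}
target = {1, 4, 7, 12, 15, 20}

print(tanimoto(query, target))            # 3 common bits out of 7 in the union
print(tversky(query, target, 1.0, 1.0))   # alpha = beta = 1 reproduces Tanimoto
print(tversky(query, target, 0.5, 0.5))   # alpha = beta = 0.5 reproduces Dice
print(tversky(query, target, 1.0, 0.0))   # only query-specific bits are penalized
```

With asymmetric weights (for example alpha = 1, beta = 0) the index ignores bits found only in the target, so it scores how much of the query is contained in the target. That is one reason an asymmetric metric can be useful next to the symmetric Tanimoto coefficient.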
Integration
Additional Cartridge functionality
– JChem index (for non-JChem tables)
– Communication with Oracle optimizer
– Reaction-based enumeration (Reactor)
– Format conversions – image generation also
– Markush enumeration (Calculator plugins)
– Property predictions through Chemical Terms (Calculator plugins)

Registration system
• New component for registration system is under development (API only)
• Main features:
  – Customizable business logic
    • Multilevel duplication control
    • Customizable corporate registration ID
    • Handling of salts, batches, lots, samples, and mixtures
  – Identification, split and registration of salt and solvent structures
  – Storage of input structures in original format
  – Mock registration (dry run)
  – Pre-registration through a transitory area
  – Basic, customizable implementation examples
    • Separate examples for chemists and registrars
• Web and Instant JChem interfaces will follow later

Handling of Markush structures

Markush structures
• Combinatorial Markush structure registration and search: features handled in search and enumeration
  – R-groups (nesting to any depth)
  – Atom lists, bond lists
  – Position variation bond
  – Link nodes
  – Repeating units
  – Homology groups (aryl, alkyl, etc.)
    • Built-in
    • User-defined
• Compatible Markush enumeration plugin

Markush Enumeration
• Markush enumeration plugin
  – Full enumeration
  – Selected parts only
  – Random enumeration
  – Calculate library size: exact size of huge Markush libraries (arbitrary precision) or magnitude
  – Scaffold alignment and coloring
  – Markush code
  – Optional example homology group enumeration

Markush storage & search
• Available in JChem Base and Instant JChem
• No enumeration involved – can handle very complex Markush structures (tested up to 10^40, but no explicit limits were built in.)
• Substructure and Full structure search
• Basic query features supported
• Substructure hit visualization: “Markush structure reduction”

Markush demo

What’s new

What’s new: JChem Base
5.1
– Position variation in queries
– New fast & reliable tautomer duplicate search
5.2
– .NET API
– Polymer storage and search
– New query options and features including searching of attached data, group matching of undefined R-atoms, repeating units.
– Improved substructure search performance
– JChem Web Services
– New metrics for similarity search (Tversky, etc.) (5.2.2)

What’s new: JChem Base
Polymer support details
• Polymer brackets and properties (type, connectivity, etc.) considered during search and registration
• Attached data search (optional) – attached to atoms/bonds/brackets
• Source- and structure-based representation equivalence is checked (but can be switched off)
  – Addition to a double bond. E.g. polystyrene.
  – Polymerization through elimination of water or HCl. E.g. polyester, polyamide.

What’s new: JChem Base
Polymer support details (cont.)
• Ladder type polymers
• Phase-shifting (for ht SRU) (can be switched off)
• End group matching:
  – * atoms: unspecified end groups
  – Search option to switch on/off end group matching
• Copolymer types: co, alt, rnd, blk, grf, xl, mer, mod
• Polymer mixtures
• New search options

What’s new: Cartridge-specific
5.1
– Tautomer duplicate filtering index option
– Alter index option
– Improved import speed (5.1.3)
– Improved upgrade: no need to remove/recreate indices (5.1.4)
5.2
– Interactive installer
– Increased substructure search performance (5.2.2)
– Tversky similarity search (5.2.2)

What’s new: Markush
• New Features
  – Homology groups
    • 19 built-in groups
    • Customizable:
      – Examples (for built-in groups, enumeration only)
      – Full user-defined homology groups defined by R-group definition
    • Marvin templates for easier sketching
  – Import reagent files as R-groups
  – Position variation and Repeating units

Plans

Plans: JChem Base & Cartridge
JChem Base
• Further speed improvements (SSS, similarity)
• New vague bond level options
• R-group decomposition integration
• Improved support for Screen molecular descriptors
Cartridge
• Screen molecular descriptors (BCUT, pharmacophore similarity, chemical hashed fp, etc.) and metrics (Euclidean, Dice, etc.) for similarity search
• User-defined descriptor fingerprints
• Markush tables and search
• JChem Server, JChem cluster

Plans: Markush
– .VMN import (format used by Merged Markush Service & Derwent World Patent Index)
– Multiple graphical attachment points of R-groups
– Homology variation queries
– Overlap analysis of Markush structures
– Homology group properties (# of atoms, branching points, # of heteroatoms, etc.)
– Conditions for Markush variables

Summary
• JChem Base and Cartridge are comprehensive and efficient
• Markush structure storage, search and enumeration now reaching patent feature coverage
• Continuous development, improvements in the pipeline

Find out more
• Product descriptions & links: www.chemaxon.com/products.html
• Forum: www.chemaxon.com/forum
• Presentations and posters: www.chemaxon.com/conf
• Download: www.chemaxon.com/download.html
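
Supplementary sketch: the Markush enumeration slide lists a library-size calculation that gives either the exact size of a huge library or its magnitude, without enumerating anything. The Python sketch below shows one way such a count could work for nested R-groups only: the alternatives of one R-group add up, while independent R-group positions multiply. The data model, the names `R_GROUPS`, `count_rgroup`, `library_size` and `magnitude`, and the toy definitions are hypothetical and are not the plugin's implementation. The sketch ignores atom lists, position variation, repeating units, homology groups and duplicate structures, and it assumes the R-group definitions contain no cycles.

```python
from math import prod

# Hypothetical toy model: each R-group maps to its alternative definitions.
# Every alternative is (label, [names of R-groups nested inside that alternative]).
R_GROUPS = {
    "R1": [("methyl", []), ("ethyl", []), ("aryl bearing R3", ["R3"])],
    "R2": [("Cl", []), ("Br", [])],
    "R3": [("F", []), ("OH", []), ("NH2", []), ("H", [])],
}


def count_rgroup(name, defs):
    """Number of concrete fragments one R-group expands to."""
    # Alternatives add up; nested R-groups inside one alternative multiply.
    return sum(
        prod(count_rgroup(child, defs) for child in nested)
        for _label, nested in defs[name]
    )


def library_size(scaffold_rgroups, defs):
    """Exact library size: product over the R-groups attached to the scaffold."""
    return prod(count_rgroup(name, defs) for name in scaffold_rgroups)


def magnitude(n):
    """Order of magnitude (power of ten) of a positive integer."""
    return len(str(n)) - 1


size = library_size(["R1", "R2"], R_GROUPS)
print(size, magnitude(size))  # R1 gives 1 + 1 + 4 = 6, R2 gives 2, so 12 structures

# A larger hypothetical library: 50 positions with 12 alternatives each.
big = {f"R{i}": [(str(k), []) for k in range(12)] for i in range(50)}
big_size = library_size(list(big), big)
print(big_size, magnitude(big_size))
```

Python integers have arbitrary precision, so the exact count stays exact however large the library grows. The count takes time proportional to the size of the definitions rather than to the size of the library.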