BeeHive: a data-mining tool at Biovitrum and iNovacia
Mats Dahlberg
Research Informatics
iNovacia AB, Sweden
ChemAxon UGM, Budapest, June 7, 2006
Research Informatics’ Philosophy
• All data in Oracle
– Safe, pharma industry standard (e.g. many chemical cartridges: ChemAxon, MDL, Accelrys, ...)
– “Data is our asset. Programs come and go.”
• Integration through the database layer
– ...but hidden from the users. Multiple front-ends allowed
• Applications rapidly adapted to users’ needs
– Close connection between developers and users
– Workflow support requires full control over the code
• Unorthodox solutions are allowed
– Sometimes quick-and-dirty development
– Sometimes unstable code (but usually fixed quickly...)
– Sometimes a non-standard technical platform (e.g. the Bee language)
BeeHive
• Function
– Main repository for ALL research data (almost)
– Used by all project teams
– Technical platform for various modules
• Features
– Advanced on-the-fly joins of DB tables
– Versatile handling of lists (compounds, batches, projects ...) and queries
– Data grouping (“one line per compound”)
– Fully customisable through meta-data; easy to add new branches (CBT, ELN stats, etc.)
– Structure searching through the ChemAxon Oracle cartridge
– Built on the Bee language from MolSoft LLC, San Diego
• Status
– Moved from MDL’s cartridge in 2006
– Business-critical; approx. 250 users throughout R&D
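The slide does not say how BeeHive combines saved lists. As a rough illustration only, the sketch below treats each saved list as a set of compound IDs and combines lists with set intersection and difference; the list names, IDs and the `combine` helper are hypothetical.

```python
# Hypothetical saved lists of compound IDs; names and IDs are invented.
SAVED_LISTS = {
    "hits_assay_1": {"C1", "C2", "C3", "C4"},
    "soluble": {"C2", "C3", "C5"},
    "tested_in_vivo": {"C3"},
}


def combine(lists, include, exclude=()):
    """Compounds present in every 'include' list and in no 'exclude' list."""
    if not include:
        raise ValueError("at least one list to include is required")
    # set.intersection returns a new set, so the saved lists are not modified.
    result = set.intersection(*(lists[name] for name in include))
    for name in exclude:
        result -= lists[name]
    return sorted(result)


if __name__ == "__main__":
    # Hits that are also soluble, minus anything already tested in vivo.
    print(combine(SAVED_LISTS, ["hits_assay_1", "soluble"], ["tested_in_vivo"]))
```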
The heart – just a SQL generator…
• Defines column types and cost for all joinable columns
• All possible joins are precalculated, a travelling salesman problem (more than 300 tables)
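A minimal sketch of the idea, under stated assumptions: a hypothetical meta-data table records which table pairs can be joined, on which condition and at what cost, and the cheapest route between every pair of tables is computed once up front. Routes are found here with Dijkstra's shortest-path algorithm, a swapped-in technique; the slide frames the problem as a travelling salesman problem and does not say which algorithm BeeHive uses. Table names, columns and costs are invented.

```python
import heapq
from itertools import combinations

# Hypothetical meta-data: joinable table pairs, the join condition and a cost
# (lower = cheaper). All table names, columns and costs are invented.
JOINS = {
    ("compound", "batch"): ("compound.id = batch.compound_id", 1),
    ("batch", "assay_result"): ("batch.id = assay_result.batch_id", 2),
    ("compound", "assay_result"): ("compound.id = assay_result.compound_id", 5),
    ("assay_result", "target"): ("assay_result.target_id = target.id", 1),
}


def build_graph(joins):
    """Undirected adjacency list: table -> [(neighbour, cost, condition)]."""
    graph = {}
    for (a, b), (condition, cost) in joins.items():
        graph.setdefault(a, []).append((b, cost, condition))
        graph.setdefault(b, []).append((a, cost, condition))
    return graph


def cheapest_route(graph, start, goal):
    """Dijkstra: (cost, tables on the route, join conditions), or None."""
    queue = [(0, start, [start], [])]
    done = set()
    while queue:
        cost, table, tables, conditions = heapq.heappop(queue)
        if table == goal:
            return cost, tables, conditions
        if table in done:
            continue
        done.add(table)
        for nxt, step, condition in graph.get(table, []):
            if nxt not in done:
                heapq.heappush(queue, (cost + step, nxt, tables + [nxt],
                                       conditions + [condition]))
    return None


def precalculate(graph):
    """Cheapest route for every unordered pair of tables, computed once."""
    return {(a, b): cheapest_route(graph, a, b)
            for a, b in combinations(sorted(graph), 2)}


def generate_sql(routes, a, b, columns):
    """Build a SELECT joining tables a and b along the precalculated route."""
    route = routes.get((a, b)) or routes.get((b, a))
    if route is None:
        raise ValueError(f"no join route between {a} and {b}")
    _, tables, conditions = route
    return (f"SELECT {', '.join(columns)} FROM {', '.join(tables)} "
            f"WHERE {' AND '.join(conditions)}")


if __name__ == "__main__":
    routes = precalculate(build_graph(JOINS))
    print(generate_sql(routes, "target", "compound", ["compound.id", "target.name"]))
```

Precalculating moves the route search out of the query path, so generating SQL at query time is a dictionary lookup. For 300 tables there are 44,850 unordered pairs, which is cheap to store but worth computing ahead of time.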
Meta-data structure
• Define entities and clean up the dictionaries
– Compound numbers, protein targets, batches, plasmids ...
– One source for every entity → possible to validate numbers → no misspellings → improved data quality
• This is the core of integration, not a particular client or system
• None of this comes out-of-the-box!
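One way the “one source for every entity” rule pays off is input validation: a value is accepted only if it exists in the entity’s dictionary, and near misses can be offered as corrections. The sketch assumes an in-memory dictionary of invented target names and uses Python’s `difflib.get_close_matches` for suggestions; it is an illustration, not BeeHive’s code.

```python
import difflib

# Hypothetical single-source dictionary for one entity type (protein targets).
# In practice this would be a table read from the database; names are invented.
TARGETS = {"Kinase A", "Protease B", "Receptor C"}


def validate(value, dictionary, max_suggestions=3):
    """Accept a value only if it is in the dictionary; otherwise suggest fixes.

    Returns (True, value) for a registered value, or (False, suggestions)
    where suggestions are the closest registered spellings, best first.
    """
    if value in dictionary:
        return True, value
    suggestions = difflib.get_close_matches(value, sorted(dictionary),
                                            n=max_suggestions, cutoff=0.6)
    return False, suggestions


if __name__ == "__main__":
    print(validate("Kinase A", TARGETS))
    print(validate("Kinas A", TARGETS))  # misspelling is rejected
```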
[Diagram, example from Biovitrum: a cross-database client (Prog 1, Prog 2) working over a common backbone that links “Biology” and “Chemistry” sources: Genes, BioPhysProp, Targets, ActivityBase, Assays, pBV, Batch, CIMS, bCOOL, ChemSpec, BVT cpnd and Decisions]
BeeHive Overview
Query builder with structural searching
Navigate through all tables
Activity, solubility, chemist, etc.
Query builder
• All unique values in drop-down lists
• No hard-coded values
• Easy to spot errors
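Filling drop-down lists from the data instead of hard-coding them amounts to a `SELECT DISTINCT` on the chosen column. The sketch below uses an in-memory SQLite database as a stand-in for Oracle; the table, column and chemist names are invented, and the schema check before string interpolation is a precaution added here.

```python
import sqlite3


def dropdown_values(conn, table, column):
    """Sorted distinct non-NULL values of table.column for a drop-down list.

    Table and column names cannot be bound as SQL parameters, so they are
    checked against the database's own schema before being interpolated.
    """
    tables = {row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in tables:
        raise ValueError(f"unknown table: {table}")
    columns = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column not in columns:
        raise ValueError(f"unknown column: {table}.{column}")
    rows = conn.execute(f"SELECT DISTINCT {column} FROM {table} "
                        f"WHERE {column} IS NOT NULL ORDER BY {column}")
    return [row[0] for row in rows]


def demo_connection():
    """In-memory stand-in for a results table; names and values are invented."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE assay_result (chemist TEXT, ic50_nm REAL)")
    conn.executemany("INSERT INTO assay_result VALUES (?, ?)", [
        ("Andersson", 120.0), ("Berg", 35.0), ("Anderson", 80.0),
        ("Andersson", 15.0), (None, 42.0),
    ])
    return conn


if __name__ == "__main__":
    # The near-duplicate "Anderson"/"Andersson" stands out immediately.
    print(dropdown_values(demo_connection(), "assay_result", "chemist"))
```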
Extraction of data for SAR analysis
• One compound per line
• Average IC50 and SD values
• Hill number from ActivityBase
• Structure pop-up window
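A rough sketch of the “one compound per line” extraction, assuming raw rows of (compound ID, IC50) with invented values: replicates are grouped per compound and reduced to the arithmetic mean and sample standard deviation, as the slide describes. Averaging on a log scale (mean pIC50, or the geometric mean of IC50) is a common alternative; the slide does not say which BeeHive uses.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical raw assay rows: (compound ID, IC50 in nM). Values are invented.
ROWS = [
    ("C1", 120.0), ("C1", 80.0), ("C1", 100.0),
    ("C2", 15.0), ("C2", 25.0),
    ("C3", 400.0),
]


def one_line_per_compound(rows):
    """Collapse replicate measurements to (compound, mean IC50, SD, n).

    SD is the sample standard deviation; it is None when n == 1.
    """
    groups = defaultdict(list)
    for compound, ic50 in rows:
        groups[compound].append(ic50)
    summary = []
    for compound in sorted(groups):
        values = groups[compound]
        sd = stdev(values) if len(values) > 1 else None
        summary.append((compound, mean(values), sd, len(values)))
    return summary


if __name__ == "__main__":
    for line in one_line_per_compound(ROWS):
        print(line)
```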
Systems and applications:
BeeHive Modules That Use JChem
• CIMS
– Chemical Inventory Management System
– Keeps track of all chemicals (bottle history, location, risk phrases, etc.)
– Replaced the previous MDL system
– Fully barcoded (bottles, shelves, people...)
– Has improved compliance, reagent availability and speed of inventory work
• Reagent Search
– ACX database of chemical catalogues from CambridgeSoft
– Cross-linked to CIMS
– “Give me all amines under 250 Da and show in-house stock at the top of the list”
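The reagent query on the slide can be sketched as a filter plus an ordering rule. Here the amine test is assumed to have been done already (in the real system a substructure search would do it) and is stored as a flag; the `Reagent` fields and catalogue entries are hypothetical, with molecular weights rounded to two decimals.

```python
from dataclasses import dataclass


@dataclass
class Reagent:
    name: str
    mol_weight: float  # molecular weight in Da
    is_amine: bool     # assumed precomputed by a substructure search
    in_house: bool     # True if a bottle is registered in the inventory


def amines_under(reagents, max_mw=250.0):
    """Amines lighter than max_mw, in-house stock first, then by weight."""
    hits = [r for r in reagents if r.is_amine and r.mol_weight < max_mw]
    # False sorts before True, so "not in_house" puts in-house reagents first.
    return sorted(hits, key=lambda r: (not r.in_house, r.mol_weight))


CATALOGUE = [
    Reagent("benzylamine", 107.15, True, False),
    Reagent("piperidine", 85.15, True, True),
    Reagent("toluene", 92.14, False, True),
    Reagent("aniline", 93.13, True, False),
    Reagent("octadecylamine", 269.52, True, True),
]

if __name__ == "__main__":
    print([r.name for r in amines_under(CATALOGUE)])
```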
Reagent searching
Systems and applications:
BeeHive Modules (cont’d)
• ChemSpec
– Registration of all new compounds
– Structure-based logic for new compounds and batches
– BVT (iNo) number assignment
– Connection point for analytical data and requests
– Used by all medicinal and analytical chemists
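The slide does not describe ChemSpec’s registration rules. A minimal sketch, assuming each submitted structure has already been reduced to a canonical key (for example a canonical SMILES string from the chemistry toolkit): a structure seen before gets a new batch of its existing compound number, and a new structure gets the next number. The `Registry` class and the `BVT` plus seven-digit format are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    prefix: str = "BVT"  # hypothetical number format: prefix + 7 digits
    next_number: int = 1
    by_structure: dict = field(default_factory=dict)  # canonical key -> compound number
    batches: dict = field(default_factory=dict)       # compound number -> batch count

    def register(self, canonical_key):
        """Return (compound_number, batch_number, is_new_compound)."""
        number = self.by_structure.get(canonical_key)
        is_new = number is None
        if is_new:
            # New structure: assign the next compound number.
            number = f"{self.prefix}{self.next_number:07d}"
            self.next_number += 1
            self.by_structure[canonical_key] = number
        # Every submission, new or known, becomes a new batch.
        self.batches[number] = self.batches.get(number, 0) + 1
        return number, self.batches[number], is_new


if __name__ == "__main__":
    reg = Registry()
    print(reg.register("CCO"))       # first submission of a structure
    print(reg.register("c1ccccc1"))  # a different structure
    print(reg.register("CCO"))       # same structure again: new batch only
```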
What is next on the list?
• JChem calculated properties for all molecule databases
– pKa, logP, logD, ...
• Generation of diverse screening sets on the fly (BCUT?)
• ...
Summary - informatics
• Data sharing is crucial
• Excel is not enough!
• No database → no modelling
• Each organisation must define its meta-data
• You need a database administrator
• Define the data structure first; applications can be improved gradually