PIMS data management and harvesting

General Introduction
Design a LIMS
Protein Production Data Model
What can PIMS do for you?

Information Management System
■ An Information Management System (IMS) is a joint database and information management system
■ A database management system (DBMS) is a system, usually automated and computerized, for managing any collection of compatible, and ideally normalized, data
■ Information management is the handling of knowledge acquired from many disparate sources in a way that optimizes access by all who have a share in that knowledge

Scientific goals
■ Recording laboratory information
  ■ Large volumes of data to keep
  ■ 10,000s of experiments
  ■ 1,000,000s of samples
■ Data interchange and interoperation
  ■ Collaboration in protein production
  ■ Sharing data between stages and sites
  ■ Data transfer to beamline or NMR operations
■ Data mining and reporting
  ■ Analysis
  ■ Negative results can be mined to improve methods
  ■ Scientific publications
  ■ Data deposition

PIMS
■ Protein Information Management System
■ Started in January 2005
■ 5-year UK project, funded by the Biotechnology and Biological Sciences Research Council (BBSRC)
■ Based on the Protein Production Data Model paper
  ■ Proteins. 2005 Feb 1;58(2):278-84. “Design of a data model for developing laboratory information management and analysis systems for protein production.”

Scope of PIMS
[Diagram of the PIMS scope, labelled: target selection, target optimisation, bioinformatics, import, cloning, molecular biology, expression, purification & concentration, crystallisation, microcrystals, export, data collection, phasing, model building, refinement, crystallography]

Stakeholders
■ BBSRC SPoRT funding
■ Scottish Structural Proteomics Facility (SSPF)
  ■ Universities of Dundee, St. Andrews, Glasgow and Warwick
■ Membrane Protein Structure Initiative (MPSI)
  ■ Universities of Glasgow, Leeds, Oxford, Sheffield, Imperial College, Birkbeck College, UMIST and CCLRC Daresbury
■ Protein Information Management System (PIMS)
  ■ CCP4, Diamond
  ■ Oxford Protein Production Facility
  ■ IBBMC, University Paris Sud
  ■ European Bioinformatics Institute
  ■ York Structural Biology Laboratory
  ■ Daresbury Laboratory
  ■ Other UK protein scientists
  ■ Other protein scientists worldwide
[Diagram: BBSRC funding of PIMS, SSPF and MPSI]

Collaborations
■ Seamless data transfer and a consistent UI ...
  ■ ... from target to structure deposition
  ■ ... so far as possible
■ Bioinformatics: SSPF pipeline, EBI workflow
■ Crystallization: NKI, EMBL Hamburg & Grenoble (BIOXHIT)
■ Data transfer: e-HTPX
■ Data collection: DNA, X-track
■ Structure solution: CCP4, CCPN
■ Instruments: Kendro, CSols

Design
■ The data model
  ■ focuses on what data should be stored
  ■ is used to design the entities (classes or tables) we are dealing with, their attributes, and their relationships
■ The goal of the data model is to ensure that all the required data objects are completely and accurately represented

Reliability
■ Loss of data is inexcusable
■ Must be able to correct wrong data
■ Must keep audit trails
■ Must allow future changes
■ All made feasible by
  ■ Data model
  ■ Database
  ■ Software engineering standards

Ancestry
■ HalX: an open-source LIMS (Laboratory Information Management System) for small- to large-scale laboratories
  ■ Acta Crystallogr D Biol Crystallogr. 2005 Jun;61(Pt 6):671-8.
  ■ Prilusky J, Oueillet E, Ulryck N, Pajon A, Bernauer J, Krimm I, Quevillon-Cheruel S, Leulliot N, Graille M, Liger D, Tresaugues L, Sussman JL, Janin J, van Tilbeurgh H, Poupon A.
■ OPPF based on Nautilus
■ MOLE: a data management application based on a protein production data model
  ■ Proteins. 2005 Feb 1;58(2):285-9.
  ■ Morris C, Wood P, Griffiths SL, Wilson KS, Ashton AW.
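The reliability requirements listed above (no data loss, correctable data, audit trails) can be illustrated with a minimal Java sketch. It assumes a hypothetical `AuditedSample` entity in which a correction never silently overwrites a value. Instead, each change appends an `AuditEntry` that records the old value, the new value, the user and the reason. All class, field and method names here are illustrative assumptions. They are not part of the PIMS data model or its generated Java API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical audit record: one entry per change to an attribute.
final class AuditEntry {
    final String attribute;
    final String oldValue;
    final String newValue;
    final String user;
    final String reason;

    AuditEntry(String attribute, String oldValue, String newValue,
               String user, String reason) {
        this.attribute = attribute;
        this.oldValue = oldValue;
        this.newValue = newValue;
        this.user = user;
        this.reason = reason;
    }

    @Override
    public String toString() {
        return attribute + ": '" + oldValue + "' -> '" + newValue
                + "' by " + user + " (" + reason + ")";
    }
}

// Hypothetical sample entity: wrong data can be corrected, but every
// correction is appended to an audit trail instead of being lost.
final class AuditedSample {
    private final String id;
    private String name;
    private final List<AuditEntry> trail = new ArrayList<>();

    AuditedSample(String id, String name) {
        this.id = id;
        this.name = name;
    }

    String getId() { return id; }
    String getName() { return name; }

    // Correct the name, recording who changed it and why.
    void correctName(String newName, String user, String reason) {
        if (newName == null || newName.isEmpty()) {
            throw new IllegalArgumentException("name must not be empty");
        }
        trail.add(new AuditEntry("name", name, newName, user, reason));
        name = newName;
    }

    // Read-only view, so callers cannot rewrite the history.
    List<AuditEntry> getAuditTrail() {
        return Collections.unmodifiableList(trail);
    }
}

class AuditTrailDemo {
    public static void main(String[] args) {
        AuditedSample s = new AuditedSample("S-0001", "lysozym");
        s.correctName("lysozyme", "jsmith", "typo in sample name");
        System.out.println(s.getName());
        for (AuditEntry e : s.getAuditTrail()) {
            System.out.println(e);
        }
    }
}
```

Exposing the trail only as an unmodifiable list is one simple way to keep the history append-only from the caller's side. A real LIMS would also persist and timestamp the entries in the database.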
PIMS
■ The aim is to provide a Laboratory Information Management System (LIMS)
  ■ for laboratories that produce proteins from target genes
  ■ that can be incorporated into commercial software in the area of biotech and protein production
■ Improve the quality of the experimental data deposited in the PDB
  ■ by providing software that lets lab scientists harvest their daily experimental data, from protein production to structure
■ My roles
  ■ Data model
  ■ Database / persistence layer / Java API
  ■ Java applet development

Why is Data Modelling Important?
■ A data model is a plan for building a database
  ■ detailed enough to be used to create the physical structure
  ■ simple enough to communicate the data structure to the end user
■ The Unified Modelling Language (UML)

Data Model
■ Related to protein production & crystallisation
■ Suitable for large & small facilities
■ Required to reproduce the samples & experiments involved
■ Used for tracking samples, experiments & results
■ Developed to help software developers collect, store and exchange information through the provision of a common platform

Area covered
■ Protein production work is generally the investigation of a particular protein, the Target
■ The work often aims to produce a derivative of the Target, such as a single domain or a complex
[Diagram labelled: target, protein production, crystallisation / NMR tube, X-ray / NMR, phasing, structure]

The Core Data Model
[Diagram]

Change Control Board
■ The data model is a work in progress
■ The science is developing too
■ Local protocols, which are novel and confidential
■ Not easy work
■ Thanks to…
  ■ Geoff Barton (Dundee)
  ■ Steve Prince (Manchester)
  ■ Anne Poupon (IBBMC)
  ■ Jon Diprose (OPPF)
  ■ Alun Ashton (Diamond)
  ■ Rasmus Fogh (CCPN)

Generation machinery
■ Implemented in UML (Object Domain)
■ Developed within a framework provided by the CCPN project
■ Information stored in the UML Data Model is used to generate automatically
  ■ SQL schema,
  ■ Java Application Program Interfaces (APIs) and
  ■ documentation
[Diagram: the UML Data Model framework (www.ccpn.ac.uk) producing an XML schema, a Python API, an SQL schema, documentation and a Java API]

Architecture
■ The API provides methods to access the underlying DB to store and retrieve data
■ This allows applications to manipulate data without detailed knowledge of how the data is stored
■ Various applications make use of the API
  ■ LIMS
  ■ any high-throughput (non-GUI) applications
■ They are able to exchange data easily
[Diagram: layers from storage (SQL schema, DB) through the API (persistence layer, Java API) to tools (GUI, standalone applications, …)]

From data model to application
■ Data model
■ Use cases
  ■ scientific logic turned into requirements
■ Specifications
  ■ security, performance, usability, etc.
■ Java API
■ Test data
■ UI design
■ Application

Modular Construction
■ http://www.pims-lims.org/project/use-case-suite.html
[Module diagram: Training & Support, Workflow, Reporting, Scheduling, Instrument Management, System Administration, Data Capture, Inventory Management, Setup & Configuration, Visualisation, Data Mining, Mobile Data Collection, Sample Management, Access Rights Management, Bioinformatics, Project Management, Reference Data]

Reference data
■ Supplier details
■ Protocols
  ■ documenting a set of editable default protocols
  ■ user interface design with Ed Daniel
■ Reagents
  ■ protocol-related reference samples
  ■ chemical hazard information, e.g. R and S-phrases
  ■ documenting lab chemicals as ‘MolComponents’
    ■ includes synonyms, formula, CAS number and mass
    ■ naming system under discussion with NKI
    ■ ~400 identified, ~180 based on crystallisation screens

Instrument management
■ Analytical data: a Tower of Babel
[Figure: example NMR (parts per million), LC (minutes), MS (mass, m/z) and IR (wavenumber, cm-1) traces]
■ Integration
  ■ CSols
    ■ produces a widely used Instrument Integration Package
    ■ if the PIMS I/O is implemented in a reasonable timescale, CSols may develop a PIMS driver
  ■ Kendro/Thermo

What can PIMS do for you?
Not a lot right now
Whatever you want, eventually ...
... as long as it's data management for protein production

Version 0.2
■ October 2005
■ Then incremental delivery
  ■ … for one customer at a time, integrating with the trunk
  ■ … and repeat until the project is complete

Protocol Editor Applet

Protocol Editor
■ Choose a step from a list
■ Draw Temperature step
■ List the protocol's completed steps at the bottom of the screen and reload them from there
■ Record the protocol in the DB
■ Display the list of protocols from the DB in the explorer and reload any one of them

Applet Workflow
■ Select the experiment categories from the tabs
■ Drag and drop the selected experiments
■ Build a workflow or load an existing one
■ Associate a protocol with an experiment

A collaborative framework
■ … to develop a family of LIMSes
■ Developers have difficulty justifying the time required to create the software needed
■ The biologist doesn't want to wait
■ The result is a rapidly written LIMS that is fragile and cannot scale if the project grows
■ Need a generic LIMS
  ■ that helps to solve these problems by giving developers a tool that can scale to meet the needs of a large project
  ■ and that welcomes plugins for novel methods
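The architecture slide's main point is that applications work through an API rather than against the storage directly. That fits naturally with the call for a generic LIMS that welcomes plugins for novel methods. The minimal Java sketch below assumes three hypothetical pieces: a `LimsApi` interface, an in-memory implementation standing in for the persistence layer, and a `MethodPlugin` extension point. None of these names come from PIMS, whose Java API is generated from the UML data model. The plugin returns placeholder values, not real measurements.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical experiment record returned by the API.
final class Experiment {
    final String sampleId;
    final String type;
    final Map<String, String> results;

    Experiment(String sampleId, String type, Map<String, String> results) {
        this.sampleId = sampleId;
        this.type = type;
        // Defensive copy so stored results cannot be changed afterwards.
        this.results = Collections.unmodifiableMap(new LinkedHashMap<>(results));
    }
}

// Hypothetical storage-agnostic API: callers never see how data is stored.
interface LimsApi {
    void saveExperiment(Experiment experiment);
    List<Experiment> experimentsFor(String sampleId);
}

// In-memory stand-in for the persistence layer; a real one would sit on a DB.
final class InMemoryLimsApi implements LimsApi {
    private final Map<String, List<Experiment>> bySample = new HashMap<>();

    @Override
    public void saveExperiment(Experiment experiment) {
        bySample.computeIfAbsent(experiment.sampleId, k -> new ArrayList<>()).add(experiment);
    }

    @Override
    public List<Experiment> experimentsFor(String sampleId) {
        return new ArrayList<>(bySample.getOrDefault(sampleId, Collections.emptyList()));
    }
}

// Hypothetical extension point for a novel experimental method.
interface MethodPlugin {
    String experimentType();
    Map<String, String> run(String sampleId);
}

// Placeholder plugin: returns dummy values, not real measurements.
final class ExampleMethodPlugin implements MethodPlugin {
    @Override
    public String experimentType() { return "EXAMPLE_METHOD"; }

    @Override
    public Map<String, String> run(String sampleId) {
        Map<String, String> results = new LinkedHashMap<>();
        results.put("sample", sampleId);
        results.put("status", "recorded");
        return results;
    }
}

// Generic core: dispatches to registered plugins and stores results via the API.
final class GenericLims {
    private final LimsApi api;
    private final Map<String, MethodPlugin> plugins = new HashMap<>();

    GenericLims(LimsApi api) { this.api = api; }

    void register(MethodPlugin plugin) {
        plugins.put(plugin.experimentType(), plugin);
    }

    void runExperiment(String type, String sampleId) {
        MethodPlugin plugin = plugins.get(type);
        if (plugin == null) {
            throw new IllegalArgumentException("no plugin registered for " + type);
        }
        api.saveExperiment(new Experiment(sampleId, type, plugin.run(sampleId)));
    }
}

class PluginDemo {
    public static void main(String[] args) {
        LimsApi api = new InMemoryLimsApi();
        GenericLims lims = new GenericLims(api);
        lims.register(new ExampleMethodPlugin());
        lims.runExperiment("EXAMPLE_METHOD", "S-0001");
        for (Experiment e : api.experimentsFor("S-0001")) {
            System.out.println(e.type + " " + e.results);
        }
    }
}
```

`GenericLims` depends only on the `LimsApi` interface. The in-memory store could therefore be swapped for a database-backed implementation without changing any plugin, which is the decoupling the architecture slide describes.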
Conclusion
■ Each “Click” could be a lot of coding ...
■ What do molecular biologists really want?
■ Expectations are High!
■ Users make an indispensable contribution
  ■ Tell us when it's not good enough ...
  ■ ... we will respond

Acknowledgements
■ PIMS developer group
  ■ Chris Morris (CCP4)
  ■ Anne Pajon (EBI)
  ■ Ed Daniel (Daresbury)
  ■ Peter Troshin (MPSI)
  ■ Jo van Niekerk (SSPF)
  ■ Susy Griffiths (YSBL)
  ■ Jon Diprose (OPPF)
  ■ Katherine Pilicheva (OPPF)
  ■ Anne Poupon (IBBMC)
  ■ Eric Oeuillet (IBBMC)
  ■ Sabrina Haquin (IBBMC)
  ■ Alun Ashton (Diamond)
■ EBI-MSD
  ■ Kim Henrick
  ■ Wim Vranken
  ■ John Ionides
■ CCPN
  ■ Wayne Boucher
  ■ Rasmus Fogh
  ■ Tim Stevens
  ■ Dan