NMRbox Data-as-a-Service
CANMRDG 2016

[1] Overview
Projects: data archival and retrieval, software integration, data interchange

[2] Analysis-as-a-service
Objectives
1. CONNJUR: capture metadata to save the state of an NMR study.
2. CONNJUR as a deposition engine to BMRB.
3. M2M communication services between NMRbox and BMRB.

[3] Approach: CONNJUR
• Workflow Builder: graphical software integration platform for spectral reconstruction.
• Spectrum Translator: command-line tool for translating time- and frequency-domain data. Integral component of Workflow Builder.
• Sparky "R" Extension: annotation for reproducibility.
• NMR-STAR Parser: translation tool.
• CONNJUR Database: MySQL database managing datasets used by Workflow Builder.

[4] Approach: BMRB
• Application Program Interface (API): allows software access to the BMRB database, both for data retrieval and deposition.
• Data Format Translators: CONNJUR, NMR-STAR, XML, JSON, NEX.
• Data Analysis & Visualization: DEVise visualization tool, libraries in the R language, validation tools.
• Deposition Engine: CONNJUR integration; automatic gathering and deposition of data and important metadata, including workflow specs.

[5] Workflow Builder
(Figure)

[6] Approach: NMRbox M2M data exchange
(Diagram. Labels: API, Query response, BMRB servers, Auto-query generator, NMRbox user, CONNJUR data harvester, CONNJUR database, NMR spectrometer, Time-domain and other files, Spectral processing, Peak lists, Auto assignments, Restraints, Structure models, NMRPipe, Sparky, ABACUS, TALOS+, CNS)

[7] Content Harvesting for Deposition
(Diagram. Labels: Deposition constructor, API, NMRbox user, wwPDB, BMRB, DRCC, CONNJUR data harvester, CONNJUR workflow manager, NMR spectrometer, Time-domain and other files, Spectral processing, Peak lists, Auto assignments, Restraints, Structure models, NMRPipe, Sparky, ABACUS, TALOS+, CNS)

[8] NMRbox/CONNJUR Deposition Service
(Diagram. Labels: NMR-STAR, CONNJUR, Raw data, Spectral data, Derived data, Data annotation, Metabolomics results, Dynamics, Chemistry, Interactions, Structure & related data)

[9] Approach: NMRbox Data Mining – BMRB Archive Content
Biological NMR & supplemental data
• Metadata: chemical structure, natural source, sample, experimental detail
• Validation results: LACS, AVS, PANAV, SPARTA+, CING, MolProbity
• Imported data: coordinates, restraints, phi-psi angles
• Derived data: back-calculated chemical shifts, BLAST alignments
• Data interpretation: citations
• External data links: PDB, UniProt, KEGG, PubChem

[10] Approach: NMRbox BMRB Data Mining
Exploring the BMRB archive for new knowledge
• Expose the BMRB relational database and additional value-added data for query and analysis from within the NMRbox platform
• Develop information search and analysis tools that encompass the breadth of the BMRB archive
Brief general examples
• Prediction and analysis of intrinsically disordered protein conformational space from NMR spectral parameters and derived data
• Search for links between NMR parameters, low-population biopolymer conformers, and biopolymer interactions with other biopolymers and ligands
• Extract RNA chemical shifts and statistics for improving automated chemical shift assignment methods and structure analysis
• Integration of molecular dynamics simulations with NMR experimental results to understand biopolymer conformational sampling

[11] Data mining and visualization on BMRB – R libraries
(Figure: CA-CB chemical shift distribution in BMRB per residue)

[12] Data mining and visualization on BMRB – R libraries
(Figure: comparing HSQC spectra for homologous entries)

[13] Data mining and visualization on BMRB – DEVise
(Figure: comparing HSQC spectra for homologous entries)

[14] Impacts (CONNJUR)
1. Additional metadata is critical for fostering reproducibility. It serves the dual purpose of allowing us to populate new instances of NMRbox.
2. Eases the burden on the NMR community of submitting data to the BMRB. Because CONNJUR can track larger amounts of intricate data than the spectroscopist is likely to be willing to provide, BMRB depositions will be more complete.

[15] Impacts (BMRB)
1. BMRB content that is relevant to NMRbox users, and possibly unknown to them, will be exposed and presented without requiring user training or knowledge of the BMRB archive architecture or content.
2. New, possibly unexpected correlations between NMRbox user data and the full BMRB archive (experimental, derived and/or predicted, validation, and other kinds of data) will be brought to light.
3. Workflow and preservation metadata will be archived for reproducibility.

[16] Thank you! Any questions?

[17] Data mining and visualization on BMRB – R libraries
(Figure: TOCSY example)

[18] Personnel
(Organization chart. Unit labels: Admin, Infra, Train, Dissem, CS, DBPs, TRD1, TRD2, TRD3. Names: Hoch, Maciejewski, Schuyler, Gryk, Ulrich, Eghbalnia, Gilman, Gorbatyuk, Moraru, Livny, Maziuk, TBN, TBN1, TBN2, TBN3, TBN4, TBN5. Sites: UConn Health, Wisconsin)

[19] Metadata Examples for M2M and Data Mining
Metadata example: Application
Biopolymer sequence, natural source including location: Mining
Intermediate data (restraints, chemical shifts, peak lists): Mining
Value-added data (secondary structure elements, physical properties, etc.): Mining
Sample conditions (pH, temperature, pressure, ionic strength): Selection
Validation report content: Selection
User process annotations: Best practices
Software application parameter files: Best practices
Pulse programs: Best practices
Spectrometer field strength: Best practices
Sample contents (buffers, salts, stabilizing agents, others): Best practices
Author names: Best practices
Keywords: Descriptive
User text annotations: Descriptive

[20] Personnel
Personnel, Effort, Role
Gryk, 2.4, Co-leader of TRD2; extend CONNJUR data model
Ulrich, 0.84, Co-leader of TRD2
Livny, 0.24, Collaborator (systems design)
TBN1, 9.6, Application architect; CONNJUR software components; Query Engine design
Maziuk, 1.2, Systems administration
TBN3, 8.4, Researcher/programmer; BMRB software components
TBN5, 6, Programmer; BMRB software components

[21] CONNJUR Schema Expansion (Aim 2.1)
Fully extended CONNJUR schema
Current CONNJUR strengths
• Spectrometers
• Pulse programs: parameters, output data
• Processing software: parameters, output data
Current NMR-STAR strengths
• Citation
• Molecular system
• Sample
• Conditions
• Spectral data
• Derived data
Current NEF strengths
• Structure software: input restraints data, parameters

[22] NMR Computational Pipeline
(Figure. Stages 1 to 4: Spectrometer Acquisition, Spectral Reconstruction, Spectral Analysis, Biophysical Characterization. Figure annotations: L10, A5, < 5 Å)
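
The metadata examples list sample conditions (pH, temperature, pressure, ionic strength) as metadata used for "Selection". As a rough illustration of what that kind of selection could look like, here is a minimal Python sketch. Everything in it is hypothetical: the `SampleConditions` record, the `select_by_conditions` helper, and the demo entries are invented for illustration and do not reflect the actual CONNJUR schema, NMR-STAR tags, or the BMRB API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class SampleConditions:
    """Hypothetical record of sample-condition metadata for one entry."""
    entry_id: str         # made-up identifier, not a real BMRB accession
    ph: float
    temperature_k: float  # temperature in kelvin


def select_by_conditions(
    records: Iterable[SampleConditions],
    ph_range: Tuple[float, float],
    temperature_range: Tuple[float, float],
) -> List[SampleConditions]:
    """Return the records whose pH and temperature fall inside both ranges (inclusive)."""
    ph_lo, ph_hi = ph_range
    t_lo, t_hi = temperature_range
    return [
        r for r in records
        if ph_lo <= r.ph <= ph_hi and t_lo <= r.temperature_k <= t_hi
    ]


# Invented demo data, for illustration only.
records = [
    SampleConditions("demo-1", ph=6.5, temperature_k=298.0),
    SampleConditions("demo-2", ph=7.4, temperature_k=310.0),
    SampleConditions("demo-3", ph=4.0, temperature_k=298.0),
]

selected = select_by_conditions(records, ph_range=(6.0, 7.5), temperature_range=(295.0, 300.0))
print([r.entry_id for r in selected])  # prints ['demo-1']
```

In a real setting the records would presumably come from harvested metadata rather than a hand-written list, and pressure and ionic strength could be filtered the same way.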