NMRbox Data-as-a-Service Overview
data archival and retrieval
Projects
CANMRDG 2016
software integration
data interchange
Data-as-a-Service [1]
Analysis-as-a-service
Objectives
1. CONNJUR: capture metadata to save the state of an NMR study.
2. CONNJUR as a deposition engine for BMRB.
3. Machine-to-machine (M2M) communication services between NMRbox and BMRB.
Approach: CONNJUR
Workflow Builder: graphical software integration platform for spectral reconstruction.
Spectrum Translator: command-line tool for translating time- and frequency-domain data; an integral component of Workflow Builder.
Sparky “R” Extension: annotation for reproducibility.
NMR-STAR Parser: translation tool.
CONNJUR Database: MySQL database managing the datasets used by Workflow Builder.
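To make the NMR-STAR Parser item concrete, here is a minimal Python sketch of what reading NMR-STAR text could look like. It is not the CONNJUR parser: it only collects single-line tag/value pairs per saveframe, skips loops, and ignores multi-line values. The sample entry and its values are invented for illustration.

```python
import json

def parse_saveframe_tags(star_text):
    """Collect single-line `_Category.Tag value` pairs, grouped by saveframe.

    Simplifications (assumptions of this sketch): loop_ ... stop_ blocks are
    skipped, and semicolon-delimited multi-line values are not handled.
    """
    frames = {}
    current = None      # tag dict of the saveframe being read, if any
    in_loop = False
    for raw in star_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "loop_":
            in_loop = True
        elif line == "stop_":
            in_loop = False
        elif line == "save_":
            current = None                      # end of saveframe
        elif line.startswith("save_"):
            current = frames.setdefault(line[len("save_"):], {})
        elif line.startswith("_") and current is not None and not in_loop:
            parts = line.split(None, 1)
            if len(parts) == 2:
                value = parts[1]
                # Drop matching surrounding quotes, if present.
                if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
                    value = value[1:-1]
                current[parts[0]] = value
    return frames

# Invented sample text, for illustration only.
SAMPLE = """
data_example
save_entry_information
   _Entry.ID      example_1
   _Entry.Title   'Illustrative entry title'
   loop_
      _Entry_author.Family_name
      Doe
   stop_
save_
"""

frames = parse_saveframe_tags(SAMPLE)
print(json.dumps(frames, indent=2))
```

The nested dictionary maps directly onto JSON, which is one reason a tag/value view is a convenient intermediate form for translation.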
Approach: BMRB
Application Program Interface (API): allows software access to the BMRB database, both for data retrieval and for deposition.
Data Format Translators: CONNJUR, NMR-STAR, XML, JSON, NEX.
Data Analysis & Visualization: DEVise visualization tool, libraries in the R language, validation tools.
Deposition Engine: CONNJUR integration; automatic gathering and deposition of data and important metadata, including workflow specifications.
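As a rough illustration of software access through an API, the Python sketch below builds a retrieval URL and decodes a JSON response. The base address, the `/entry/<id>` route, and the `format` parameter are all hypothetical placeholders chosen for this example; the slides do not specify the actual BMRB interface.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Hypothetical placeholder address, not the real BMRB service.
BASE_URL = "https://bmrb.example.org/api"

def build_entry_url(entry_id, fmt="json"):
    """Build a retrieval URL for one entry (route and parameter are assumptions)."""
    return f"{BASE_URL}/entry/{quote(str(entry_id))}?{urlencode({'format': fmt})}"

def fetch_entry(entry_id, fmt="json", timeout=30):
    """Retrieve one entry and decode its JSON body.

    This needs a real server behind BASE_URL, so it is shown only to
    indicate the shape of a retrieval call.
    """
    with urlopen(build_entry_url(entry_id, fmt), timeout=timeout) as response:
        return json.load(response)

print(build_entry_url(15000))
```

Keeping URL construction separate from the network call lets the request logic be checked without contacting a server.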
Workflow Builder
Approach: NMRbox M2M data exchange
[Diagram. Components: NMRbox user; CONNJUR data harvester; CONNJUR database; auto-query generator; API (query and response); BMRB servers. Data stages, with example tools: NMR spectrometer; time-domain and other files; spectral processing (NMRPipe); peak lists (Sparky); auto assignments (ABACUS); restraints (TALOS+); structure models (CNS).]
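One way to picture the auto-query generator is as a function that turns harvested metadata into query descriptions, with no user input. The Python sketch below assumes a harvested record is a plain dictionary; the field names (`sequence`, `ligands`) and query types are hypothetical and are not taken from CONNJUR or the BMRB API.

```python
def generate_queries(harvested):
    """Turn harvested metadata into a list of query descriptions.

    Each query is a plain dict that a separate API client could send.
    All field names here are hypothetical.
    """
    queries = []
    sequence = harvested.get("sequence")
    if sequence:
        # Look for archive entries with a similar biopolymer sequence.
        queries.append({"search": "sequence_similarity", "sequence": sequence})
    for name in harvested.get("ligands", []):
        # Look for archive entries that involve the same ligand.
        queries.append({"search": "ligand_name", "name": name})
    return queries

# Invented example record.
example = {"sequence": "MKT", "ligands": ["ATP"]}
for query in generate_queries(example):
    print(query)
```

Because the queries are derived only from what the harvester already recorded, a user would not need to know the archive layout to benefit from them.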
Content Harvesting for Deposition
[Diagram. Components: NMRbox user; CONNJUR data harvester; CONNJUR workflow manager; deposition constructor; API; BMRB; wwPDB; DRCC. Data stages, with example tools: NMR spectrometer; time-domain and other files; spectral processing (NMRPipe); peak lists (Sparky); auto assignments (ABACUS); restraints (TALOS+); structure models (CNS).]
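The deposition constructor can be thought of as assembling one record from the steps the harvester has tracked. The following Python sketch shows that idea under stated assumptions: the `WorkflowStep` fields, the payload layout, the file names, and the `zero_fill` parameter are all invented for illustration and do not describe the actual CONNJUR or BMRB formats.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class WorkflowStep:
    """One harvested processing step (hypothetical structure)."""
    stage: str
    tool: str
    parameters: dict = field(default_factory=dict)
    output_files: list = field(default_factory=list)

def build_deposition(title, steps):
    """Assemble a deposition payload from harvested workflow steps."""
    return {
        "title": title,
        # Keep every step, so the workflow itself is deposited with the data.
        "workflow": [asdict(step) for step in steps],
        # De-duplicated, sorted list of all files produced along the way.
        "files": sorted({path for step in steps for path in step.output_files}),
    }

# Invented example steps.
steps = [
    WorkflowStep("spectral processing", "NMRPipe", {"zero_fill": 2}, ["hsqc.ft2"]),
    WorkflowStep("peak lists", "Sparky", {}, ["hsqc.list"]),
]
deposition = build_deposition("demo study", steps)
print(json.dumps(deposition, indent=2))
```

Recording tool parameters next to output files is what lets a deposition carry the workflow specification as well as the data.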
NMRbox/CONNJUR Deposition Service
[Diagram. Labels: NMR-STAR; CONNJUR; raw data; spectral data; derived data; data annotation; metabolomics results; dynamics; chemistry; interactions; structure and related data.]
Approach: NMRbox Data Mining – BMRB Archive Content
Biological NMR & supplemental data
Metadata: chemical structure, natural source, sample, experimental detail.
Validation results: LACS, AVS, PANAV, SPARTA+, CING, MolProbity.
Imported data: coordinates, restraints, phi-psi angles.
Derived data: back-calculated chemical shifts, BLAST alignments.
Data interpretation: citations.
External data links: PDB, UniProt, KEGG, PubChem.
Approach: NMRbox BMRB Data Mining
Exploring the BMRB archive for new knowledge
• Expose the BMRB relational database and additional value-added data for query and analysis from within the NMRbox platform
• Develop information search and analysis tools that encompass the breadth of the BMRB archive
Brief general examples
• Prediction and analysis of intrinsically disordered protein conformational space from NMR spectral parameters and derived data
• Search for links between NMR parameters, low-population biopolymer conformers, and biopolymer interactions with other biopolymers and ligands
• Extract RNA chemical shifts and statistics for improving automated chemical shift assignment methods and structure analysis
• Integration of molecular dynamics simulations with NMR experimental results to understand biopolymer conformational sampling
Data mining and visualization on BMRB – R libraries
CA-CB chemical shift distribution in BMRB per residue
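The per-residue distribution shown on this slide was produced with R libraries. As a language-neutral illustration of the underlying computation, the Python sketch below groups CA and CB shifts by residue type and reports the mean and sample standard deviation. The record layout and the numbers are invented for the example and are not BMRB data.

```python
from collections import defaultdict
from statistics import mean, stdev

def shift_stats(records):
    """Summarize CA and CB shifts per (residue, atom).

    `records` is an iterable of (residue, atom, shift_ppm) tuples.
    Returns {(residue, atom): (mean, sample standard deviation)};
    the deviation is reported as 0.0 when only one value is available.
    """
    groups = defaultdict(list)
    for residue, atom, shift in records:
        if atom in ("CA", "CB"):
            groups[(residue, atom)].append(shift)
    return {
        key: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for key, values in groups.items()
    }

# Invented values, for illustration only.
records = [
    ("ALA", "CA", 52.0),
    ("ALA", "CA", 54.0),
    ("ALA", "CB", 19.0),
    ("ALA", "HA", 4.3),   # ignored: not CA or CB
]
stats = shift_stats(records)
for key in sorted(stats):
    print(key, stats[key])
```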
Data mining and visualization on BMRB – R libraries
Comparing HSQC spectra for homologous entries
Data mining and visualization on BMRB – DEVise
Comparing HSQC spectra for homologous entries
Impacts (CONNJUR)
1. Additional metadata is critical to fostering reproducibility. It also serves a second purpose: allowing us to populate new instances of NMRbox.
2. Eases the burden on the NMR community of submitting data to the BMRB. Because CONNJUR can track larger amounts of intricate data than a spectroscopist is likely to be willing to provide, BMRB depositions will be more complete.
Impacts (BMRB)
1. BMRB content relevant to NMRbox users, and possibly unknown to them, will be exposed and presented without requiring user training or knowledge of the BMRB archive architecture or content.
2. New, possibly unexpected correlations between NMRbox user data and the full BMRB archive (experimental, derived and/or predicted, validation, and other kinds of data) will be brought to light.
3. Workflow and preservation metadata will be archived for reproducibility.
Thank you!
Any questions?
Data mining and visualization on BMRB – R libraries
TOCSY example
Personnel
[Organization chart. Units: Admin, Infra, Train, Dissem, CS, DBPs, TRD1, TRD2, TRD3. Personnel: Hoch, Maciejewski, Schuyler, Gryk, Ulrich, Eghbalnia, Gilman, Gorbatyuk, Moraru, Livny, Maziuk, TBN, TBN1, TBN2, TBN3, TBN4, TBN5. Sites: UConn Health, Wisconsin.]
Metadata Examples for M2M and Data Mining
Metadata examples | Applications
Biopolymer sequence, natural source including location | Mining
Intermediate data (restraints, chemical shifts, peak lists) | Mining
Value-added data (secondary structure elements, physical properties, etc.) | Mining
Sample conditions (pH, temperature, pressure, ionic strength) | Selection
Validation report content | Selection
User process annotations | Best practices
Software application parameter files | Best practices
Pulse programs | Best practices
Spectrometer field strength | Best practices
Sample contents (buffers, salts, stabilizing agents, others) | Best practices
Author names | Best practices
Keywords | Descriptive
User text annotations | Descriptive
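The "Selection" use of sample-condition metadata can be illustrated with a small filter. This Python sketch assumes each entry is a dictionary with hypothetical `ph` and `temperature_k` keys; the entries themselves are invented.

```python
def select_entries(entries, ph_range, temperature_range_k):
    """Keep entries whose sample conditions fall inside both ranges (inclusive)."""
    low_ph, high_ph = ph_range
    low_t, high_t = temperature_range_k
    return [
        entry for entry in entries
        if low_ph <= entry["ph"] <= high_ph
        and low_t <= entry["temperature_k"] <= high_t
    ]

# Invented entries, for illustration only.
entries = [
    {"id": "A", "ph": 6.5, "temperature_k": 298.0},
    {"id": "B", "ph": 4.0, "temperature_k": 298.0},
    {"id": "C", "ph": 7.0, "temperature_k": 310.0},
]
selected = select_entries(entries, ph_range=(6.0, 7.5), temperature_range_k=(293.0, 303.0))
print([entry["id"] for entry in selected])
```

Filtering on conditions before mining keeps comparisons restricted to data acquired under similar circumstances.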
Personnel
Personnel | Effort | Role
Gryk | 2.4 | Co-leader of TRD2; extend CONNJUR data model
Ulrich | 0.84 | Co-leader of TRD2
Livny | 0.24 | Collaborator (systems design)
TBN1 | 9.6 | Application architect; CONNJUR software components; Query Engine design
Maziuk | 1.2 | Systems administration
TBN3 | 8.4 | Researcher/programmer; BMRB software components
TBN5 | 6 | Programmer; BMRB software components
CONNJUR Schema Expansion (Aim 2.1)
Current CONNJUR strengths
• Spectrometers
• Pulse programs: parameters, output data
• Processing software: parameters, output data
Current NMR-STAR strengths
• Citation
• Molecular system
• Sample
• Conditions
• Spectral data
• Derived data
Current NEF strengths
• Structure software: input restraints data, parameters
Fully extended CONNJUR schema
NMR Computational Pipeline
[Diagram. Pipeline stages, in order: spectrometer acquisition; spectral reconstruction; spectral analysis; biophysical characterization. Other figure labels: L10, A5, < 5 Å.]