Download GEMSdb - University of Notre Dame

BioCoRE and GEMS: Cyber
Infrastructure for Cyber Chemistry
Jesús A. Izaguirre
Computer Science & Engineering
University of Notre Dame
with Kirby Vandivort
NIH Resource for Macromolecular Modeling and
Bioinformatics
University of Illinois
Overview I
• Chemical applications such as virtual
screening, protein kinetics and structure,
and analysis and validation of molecular
simulations require enormous resources that
can be provided by CyberInfrastructure
• Successful solution of these problems
requires collaborative approaches, also
facilitated by CyberInfrastructure
BioCoRE and GEMS
3 October 2004
Overview II
To make CyberInfrastructure effective, the
following issues must be addressed:
• Users of CyberInfrastructure need a data-centric
way of managing their computations and data
• Distributed databases on the grid need to
address the reliability and fault tolerance of data
Overview III
• We will study examples of collaborative
software that address these issues,
primarily:
– BioCoRE: A Collaboratory for Structural
Biology
– GEMS: Grid Enabled Molecular Simulations
Toolset and Database
Sample CyberScience Projects
• Collaborative Biophysics: BioCoRE (K. Schulten, Illinois)
• Virtual Screening: The Screensaver Project (W.G. Richards, Oxford)
• Protein Kinetics: Folding@Home (V. Pande, Stanford)
• Distributed Database of Molecular Simulations: BioSimGrid (M. Sansom, Oxford)
What is BioCoRE?
BioCoRE: a collaborative
work environment for
biomedical research,
research management and
training.
BioCoRE assists the entire
research process, from
talking with collaborators to
performing simulations and
collecting data, to preparing
papers and reports.
Sharing Documents
With the BioFS and
WebDAV, scientists can
exchange and edit files
from anywhere with a
web connection.
Setting Up and Running Simulations
• NAMDCFG: A
“Simulation Setup
Wizard”
• Online help and
error checking for
NAMD input files
• Job submission to
supercomputers
simplified
• Job status monitored for easy
retrieval
• Job data archived for future
reference
Sharing Molecular Views
Using VMD
and BioCoRE,
collaborators
may exchange
and manipulate
3-D models of
molecules
Emphasis on
collaborative
sessions.
Streamlined
process of
sharing views.
Communicating
• Control Panel
provides instant
messaging and
notifications
• BioCoRE also
provides
message boards,
Web site library,
lab book
Programming Interface
• Provides a way for
users to
programmatically
interact with
BioCoRE
• Communication
(Control Panel),
shared states
(VMD)
• WebDAV
Availability
• Free
• Can be accessed from Illinois site, or server
software can be installed locally
• Server software can be modified if
necessary
• http://www.ks.uiuc.edu/Research/biocore/
Virtual Screening
• Combinatorial Complexity
Lead Exploration
• Screen docking affinities
based on a scoring
function (interaction
energies, RMSD, etc…)
• Modeled as an all pairs
problem
• Logically independent
computational
requirements are well
suited for wide area grid
distribution
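The all-pairs formulation above can be sketched in a few lines of Python. This is a minimal illustration only: the `score` function is a hypothetical placeholder (a real scoring function would compute interaction energies, RMSD, or similar), and the receptor names are invented for the example. The point it shows is that every (receptor, ligand) pair is scored independently, so each pair could be sent to a different grid node.

```python
from itertools import product


def score(receptor: str, ligand: str) -> float:
    """Hypothetical stand-in for a docking scoring function (lower is better)."""
    # Deterministic toy value; a real function would run a docking code.
    return (sum(map(ord, receptor + ligand)) % 100) / 10.0


def screen(receptors, ligands, score_fn=score):
    """Score every receptor/ligand pair (the all-pairs problem)."""
    # Each pair is logically independent of all the others, so this
    # comprehension could be replaced by one grid task per pair.
    return {(r, l): score_fn(r, l) for r, l in product(receptors, ligands)}


def best_hits(results, n):
    """Return the n pairs with the lowest (best) score."""
    return sorted(results, key=results.get)[:n]


receptors = ["R1", "R2"]  # hypothetical receptor conformations
ligands = ["L0001", "L0002", "L0003", "L0004", "L0005"]
results = screen(receptors, ligands)
top = best_hits(results, 3)
```

The number of tasks grows as the product of the two lists, which is where the combinatorial complexity comes from.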
[Figure: leads (ligands), labeled L0001 through L0005]
CyberInfrastructure Needs for Virtual
Screening I
• Incorporate protein (receptor) flexibility
– Use multiple protein structures (hierarchical
representations and algorithms)
• Iterative refinement of results
– Add new protein conformations to improve
docking
– Use higher resolution models for promising hits
(integration of data and work flow)
– Monitor status of results (not just jobs running)
CyberInfrastructure Needs for Virtual
Screening II
• Manage computation and storage in the grid
– Declarative rather than imperative specification
• Automate usage of algorithms / tools
– Select software and optimal parameters for
algorithms (recommender system)
– Example: MDSimAid
(http://mdsimaid.cse.nd.edu) selects optimal
MD simulation protocol (limited options)
BioSimGrid
Mark S. P. Sansom, Oxford
• Database for biomolecular simulations
• Specifically: molecular dynamics trajectories
• Facilitate validation and analysis of simulations
• Provides “independence” from the specific simulation semantics
(configuration parameters, architecture, simulation tools, etc…)
• Trajectory data stored in
relational database tables per
Data Schema
• Semi-Automated Deposition
of trajectory files for certain
formats (CHARMM, NAMD,
etc…)
• Trajectory analysis modules
• Future goal to distribute
database
CyberInfrastructure Needs for
Distributed Databases I
• Metadata for trajectories
– Simulation protocol, software, etc.
• Distribution on the grid
– Storage fault tolerance / reliability
– Scalable solution: reduce storage requirements
and centralization
CyberInfrastructure Needs for
Distributed Databases II
• Data-driven model for the user
– Data organized around key themes (trajectories,
molecules)
• Generic tools for developers
– Applicable to different applications
Solving Integration Problem
• We need to capture the data flow and the
work flow
– Ecce project
– XML metadata
– Component architectures (e.g., JavaBeans,
Common Component Architecture)
Solving Integration Problem
• BioCoRE (K. Schulten, Illinois)
– Use of programming interface
– Provides multiple services to applications (web
file system, job management, shared
visualization)
Solving Grid Management
• Current grid tools are task oriented: run
this particular simulation code with these
input files, etc.
– Web portals are an incremental improvement
over command line or stand alone applications
• Problem: Controlling multiple resources
– For example, create 10,000 tasks & keep track
of the data, as might be needed for virtual
screening or @home applications
Solving Grid Management with GIPSE
• GIPSE: Grid Interface for Parameter-driven
Simulation Environments
– Shift focus from management to research
– Result-driven interface
– Scripting capabilities
Solving grid management with GIPSE
• Simplify process
– XML Data format
– Missing “glue”
• Powerful searches
– Optimizations
– Control loops
[Figure: GEMS Toolset, HIV-1 Protease]
Solving grid management with GIPSE
• Manage data
– Storage
– Database retrieval
• Monitor progress
– Status
– Application-specific
[Figure: GEMS Toolset, HIV-1 Protease]
GEMS Database Toolset
• Grid Enabled Molecular
Simulation
– Data Centric
– Wide area distributed storage
– Researchers have data and
resource autonomy
– Simulation configuration, input
data files, and output data files
identified via XML
– Centralized SQL locator
– Availability via replication
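As an illustration of identifying a simulation's configuration, input files, and output files via XML, the following sketch builds and reads a small descriptor with Python's standard `xml.etree.ElementTree` module. The element names (`simulation`, `configuration`, `parameter`, `inputs`, `outputs`, `file`) and the file names are assumptions made for this example; the slides do not give the actual GEMS schema.

```python
import xml.etree.ElementTree as ET


def build_descriptor(config, inputs, outputs):
    """Build an XML descriptor string (hypothetical schema, not GEMS's own)."""
    root = ET.Element("simulation")
    cfg = ET.SubElement(root, "configuration")
    for name, value in sorted(config.items()):
        ET.SubElement(cfg, "parameter", name=name).text = str(value)
    # Inputs and outputs are stored as simple lists of file references.
    for tag, paths in (("inputs", inputs), ("outputs", outputs)):
        group = ET.SubElement(root, tag)
        for path in paths:
            ET.SubElement(group, "file").text = path
    return ET.tostring(root, encoding="unicode")


def list_files(xml_text, group):
    """Return the file references under <inputs> or <outputs>."""
    root = ET.fromstring(xml_text)
    return [f.text for f in root.findall(f"./{group}/file")]


descriptor = build_descriptor(
    config={"timestep_fs": 1.0, "steps": 1000},  # placeholder parameters
    inputs=["protein.pdb", "protein.psf"],       # placeholder file names
    outputs=["trajectory.dcd"],
)
input_files = list_files(descriptor, "inputs")
```

A central locator would then only need to index such descriptors, while the files themselves stay on the researchers' own storage.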
Reliability and Leveraged
Availability via Runtime Imaging
• Reliability of data storage is increased
• User can trade off availability against storage volume
• Workspace data has 2-way redundancy by default
• Archival data has a 2-way redundancy of fewer
snapshots, but saves the computational images
• For each computational run through the GEMS portal a
comprehensive runtime image is created from which the
simulation can automatically be regenerated.
• Runtime images include executable version and location,
library requirements, hardware requirements, input files, and
configuration parameters
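The contents of a runtime image listed above map naturally onto a small record type. The sketch below is an assumption-laden illustration, not the GEMS data model: the class name, field names, JSON serialization, and all sample values are hypothetical. It shows only that a record holding these six items can be stored and later restored unchanged, which is the precondition for regenerating a run.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RuntimeImage:
    """Hypothetical record of what is needed to regenerate a simulation."""
    executable_version: str
    executable_location: str
    library_requirements: list
    hardware_requirements: dict
    input_files: list
    configuration: dict


def save_image(image: RuntimeImage) -> str:
    """Serialize the image to JSON text for archival."""
    return json.dumps(asdict(image), sort_keys=True)


def load_image(text: str) -> RuntimeImage:
    """Rebuild an image from its archived JSON text."""
    return RuntimeImage(**json.loads(text))


# All values below are placeholders chosen for the example.
image = RuntimeImage(
    executable_version="md-engine 1.0",
    executable_location="/opt/md-engine/bin/md",
    library_requirements=["libfft"],
    hardware_requirements={"cpus": 4, "memory_gb": 8},
    input_files=["protein.pdb", "protein.psf"],
    configuration={"timestep_fs": 1.0, "steps": 1000},
)
restored = load_image(save_image(image))
```

Because the image is much smaller than the trajectory it describes, archiving the image and regenerating on demand is one way to trade storage volume for computation.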
Integration of Distributed Data
Into New Simulations
• A grid distributed “make” based on a
computational requirement over a set parameter
sweep
– Example: optimize MD simulation protocol
• Before starting the sweep a query determines
data points that are up to date and those that
require computation (including regeneration)
– Example: keep current list of results of virtual
screening as more computations are performed or
targets and ligands added
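The "make"-style query described above can be sketched as follows. This is a minimal illustration under stated assumptions: stored results are modeled as an in-memory dictionary rather than a distributed database, a result counts as up to date when its recorded protocol version matches the current one, and the parameter names are invented for the example.

```python
from itertools import product


def sweep_points(parameters):
    """Enumerate every point of the sweep as a hashable tuple of (name, value)."""
    names = sorted(parameters)
    for values in product(*(parameters[n] for n in names)):
        yield tuple(zip(names, values))


def plan(parameters, stored, current_version):
    """Split the sweep into up-to-date points and points needing computation."""
    up_to_date, to_compute = [], []
    for point in sweep_points(parameters):
        record = stored.get(point)
        # Assumption: a missing or outdated record must be (re)computed.
        if record is not None and record["version"] == current_version:
            up_to_date.append(point)
        else:
            to_compute.append(point)
    return up_to_date, to_compute


# Hypothetical sweep over two MD protocol parameters.
parameters = {"timestep_fs": [1.0, 2.0], "cutoff_A": [10.0, 12.0]}
stored = {
    (("cutoff_A", 10.0), ("timestep_fs", 1.0)): {"version": 2},  # current
    (("cutoff_A", 10.0), ("timestep_fs", 2.0)): {"version": 1},  # stale
}
done, todo = plan(parameters, stored, current_version=2)
```

Only the points in `todo` would be submitted to the grid; adding a new parameter value later enlarges the sweep without recomputing the points already in `done`.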
Example: Validating Simulations
• Locate specific published simulation configurations for
benchmarking
• Select pertinent input data files (pdb, psf, force fields,
etc.) for direct use in a new simulation for the
purpose of comparison.
• Researcher B wants to vary certain
parameters of Researcher A’s published
simulation to test her new MD integrator
Acknowledgments
• Collaborators in
GIPSE and GEMS:
– Aaron Striegel
– Doug Thain
– Jeff Peng
• Students
– Paul Brenner
– Santanu Chatterjee
• Klaus Schulten
• BioCoRE Team:
– Robert Brunner
– Michael Bach
– David Brandon
• BioCoRE funding
from NIH
• Funding from NSF
Career and
Biocomplexity