WP19
DESY Development Plan
Frank Schlünzen
Jürgen Starek
Object stores - Motivation
• Data catalogues (DC) and digital objects (DO) are loosely coupled
• ACL changes in DC are not propagated to DO (and vice versa)
• No messaging between DC and DO
• Removal, corruption, or update of a DO does not trigger a notification to the DC
• No rights management on the filesystem layer
• All files owned by super-users
• Rights management has to rely on trusted applications
• Synchronization
• Status of DC and DO can easily get out of sync
• Metadata in DC and DO can easily get out of sync
• ACLs in DC and DO are probably never in sync
• UIDs inside and outside of the DC need to be synchronized
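To make the synchronization problem concrete, the sketch below compares what a data catalogue and an object store each record about a set of objects and reports where status, metadata or ACLs disagree. It is a minimal illustration under assumed conditions: the `Record` schema and the `find_drift` function are hypothetical, and both views are assumed to be exportable as plain dictionaries keyed by object ID.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """What one system (DC or DO side) knows about an object. Hypothetical schema."""
    exists: bool
    metadata: dict = field(default_factory=dict)
    acl: frozenset = frozenset()  # principals allowed to read the object


def find_drift(catalogue: dict, store: dict) -> dict:
    """Return {object_id: [aspects that disagree]} between the DC and DO views."""
    drift = {}
    for oid in sorted(catalogue.keys() | store.keys()):
        dc = catalogue.get(oid, Record(exists=False))
        do = store.get(oid, Record(exists=False))
        if dc.exists != do.exists:
            # One side has no copy at all; comparing metadata/ACLs is meaningless.
            drift[oid] = ["status"]
            continue
        problems = []
        if dc.metadata != do.metadata:
            problems.append("metadata")
        if dc.acl != do.acl:
            problems.append("acl")
        if problems:
            drift[oid] = problems
    return drift
```

For example, an ACL extended on the storage side only would show up as `["acl"]` for that object, and an object deleted from the store without notifying the catalogue as `["status"]`. Such a check can only detect drift after the fact; it does not remove the loose coupling itself.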
Object stores - Motivation
• Object stores have the potential to solve a number of these issues
• Particularly interesting systems (for us)
– dCache
• Is close to being a full object store implementation
• Extensible with indexed, efficiently searchable, user-defined metadata for any object
• Full, scalable tape backend
– CEPH
• Very interesting system, potentially highly efficient for small files
• Might offer good performance in combination with enhanced metadata and ACL handling
• Natural choice for clouds (OpenNebula)
• Mature: included in the Linux kernel
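The idea behind indexed, efficiently searchable user-defined metadata can be illustrated with a toy inverted index that maps each key/value pair to the objects carrying it, so that a query does not have to scan every object. This is a hypothetical in-memory sketch (`MetadataIndex` and its methods are invented for illustration) and says nothing about how dCache or CEPH actually store or index metadata.

```python
from collections import defaultdict


class MetadataIndex:
    """Toy inverted index over user-defined metadata. Hypothetical, in-memory only."""

    def __init__(self):
        self._objects = {}              # object ID -> its metadata dict
        self._index = defaultdict(set)  # (key, value) -> set of object IDs

    def put(self, oid, metadata):
        """Attach metadata to an object, replacing any previous metadata."""
        self.remove(oid)  # drop stale index entries if the object is re-tagged
        self._objects[oid] = dict(metadata)
        for key, value in metadata.items():
            self._index[(key, value)].add(oid)

    def remove(self, oid):
        """Forget an object and all of its index entries."""
        old = self._objects.pop(oid, None)
        if old:
            for key, value in old.items():
                self._index[(key, value)].discard(oid)

    def find(self, **criteria):
        """Return IDs of objects matching all key=value criteria (AND query)."""
        # .get() on the defaultdict avoids inserting empty entries for unknown pairs.
        matches = [self._index.get((k, v), set()) for k, v in criteria.items()]
        if not matches:
            return set()
        return set.intersection(*matches)
```

Because metadata and index are maintained by the same component, re-tagging an object cannot leave a stale index entry behind, which is the kind of coupling the DC/DO split above lacks.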
Object stores - Proposal
In order to supplement the existing in-house knowledge about the properties of
different file systems and storage hardware solutions, we propose to survey existing
and planned object store solutions, evaluating possible performance or
maintenance advantages over classical POSIX file systems.
• Survey Topics
– Which object store systems are available as commercial products or free software,
and which similar systems exist or are planned?
– What methods to access data in these object stores exist?
– Are there systems that go beyond POSIX I/O and POSIX rights management? How do
they present these improved semantics to the user?
– What are common designs and best practices for building and operating object
stores?
– What is the I/O performance of these systems when storing differently-sized
objects?
– What is the best performing hardware setup for these systems?
– Is there a quality-of-service mechanism for data streams? Is there an I/O scheduler?
– What metadata are supported by the systems?
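For the question of I/O performance with differently sized objects, a measurement could start from a simple harness like the one below. It is only a sketch: a temporary local directory stands in for the object store under test, and the hypothetical `put`/`get` helpers would have to be replaced by the real client calls of each surveyed system. With this local stand-in, read figures are likely dominated by the page cache.

```python
import os
import tempfile
import time


def put(root: str, name: str, data: bytes) -> None:
    """Hypothetical stand-in for an object-store PUT: a plain file write."""
    with open(os.path.join(root, name), "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push data to the device so writes are not just page-cache copies


def get(root: str, name: str) -> bytes:
    """Hypothetical stand-in for an object-store GET: a plain file read."""
    with open(os.path.join(root, name), "rb") as f:
        return f.read()


def benchmark(sizes, count=20):
    """Return {size_in_bytes: (write_MB_per_s, read_MB_per_s)} for `count` objects per size."""
    results = {}
    with tempfile.TemporaryDirectory() as root:
        for size in sizes:
            payload = os.urandom(size)
            names = [f"obj-{size}-{i}" for i in range(count)]

            t0 = time.perf_counter()
            for name in names:
                put(root, name, payload)
            t1 = time.perf_counter()
            blobs = [get(root, name) for name in names]
            t2 = time.perf_counter()

            # Verify outside the timed region so the comparison does not skew read timings.
            if any(blob != payload for blob in blobs):
                raise RuntimeError(f"read-back mismatch for size {size}")

            megabytes = size * count / 1e6
            results[size] = (megabytes / max(t1 - t0, 1e-9),
                             megabytes / max(t2 - t1, 1e-9))
    return results


if __name__ == "__main__":
    for size, (w, r) in benchmark([4_096, 65_536, 1_048_576]).items():
        print(f"{size:>9} B  write {w:8.1f} MB/s  read {r:8.1f} MB/s")
```

Keeping the storage calls behind two small functions means the same timing loop can be pointed at each candidate system in turn, so results for different object sizes stay comparable.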
Virtual Analysis Platform
Remote access to compute resources for remote data analysis in a fully interactive
and graphical environment is a common requirement at EuroFEL RIs.
• Currently, this is realized through rather inconvenient and poorly scalable
mechanisms.
• Data protection and encapsulation of the environment are open issues.
• We aim to evaluate some of the virtual/cloud resources available in-house and to set up a prototype system for photon-science applications:
– Provide a virtual host/environment tailored to a specific experiment and application, with a predefined software stack and data.
– Users should have seamless access to their own data, and only to their own data.
• We will investigate possibilities to provide data access through different pathways (e.g. ICAT, dCache, object store).
• Options for connecting identity providers (IdPs) will be investigated as well.
(p)HDF5 and FhGFS
Visualizing individual image slices within an HDF container archived in
ICAT is a common user requirement. Virtual appliances are one way to visualize
data remotely; alternatively, this could be realized as web services:
• Investigate h5ws, a Globus-based web service
• Investigate ParaView, which offers WebGL, VRML, and JS representations
• If feasible, implement a prototype solution.
Many photon-science applications are I/O-limited rather than CPU-limited. pHDF5 is
one way to accelerate parallel read/write access to HDF5 containers. Currently, a
number of photon-science labs are supporting the development of HDF5 accelerators by HDF.org.
• Test upcoming (p)HDF5 implementations on available parallel filesystems, in particular FhGFS.
• Realize a non-persistent FhGFS in distributed memory and test the performance and stability of
such a system.
• Test FhGFS capabilities to create filesystems on demand and entirely in user space.
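In parallel HDF5, a common pattern is for each MPI rank to select its own disjoint hyperslab of a shared dataset and read or write it collectively. The offset/count arithmetic for a one-dimensional block decomposition along the first axis (for example, the image index of a stack) is sketched below in plain Python. The function name is hypothetical; real code would pass the resulting start and count to the HDF5 hyperslab selection (for instance `H5Sselect_hyperslab` in C).

```python
def block_decomposition(n_items: int, n_ranks: int, rank: int) -> tuple[int, int]:
    """Return (start, count) of the contiguous block of items owned by `rank`.

    Hypothetical helper. The first n_items % n_ranks ranks get one extra item,
    so block sizes differ by at most one and the blocks tile [0, n_items) exactly.
    """
    if n_ranks <= 0 or not 0 <= rank < n_ranks:
        raise ValueError("need n_ranks > 0 and 0 <= rank < n_ranks")
    base, extra = divmod(n_items, n_ranks)
    count = base + (1 if rank < extra else 0)
    start = rank * base + min(rank, extra)  # earlier ranks' blocks plus their extra items
    return start, count
```

With h5py built against parallel HDF5, rank `r` could then access `dset[start:start + count]` after opening the file with `driver="mpio"`; whether such a balanced split actually performs well on FhGFS is exactly what the tests listed above would have to show.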