Generic policy rules and principles
Jean-Yves Nief
Talk overview

An introduction to CC-IN2P3 activity.
iRODS in production:
– Why are we using it?
– Who is using it?
– Prospects.

iRODS rule policies through examples:
– Resource Monitoring System.
– Biomedical applications:
• Human data.
• Animal data.
– Arts and Humanities.
– Other rules: Mass storage system interface, access rights.
– Pitfalls.
– Future uses.
Repository workshop - Garching
20/09/10
CC-IN2P3 activities

Federates the computing needs of the French scientific community in:
– Nuclear and particle physics.
– Astrophysics and astroparticles.
Computing services to international collaborations:
– CERN (LHC), Fermilab, SLAC, …
Now open to biology, Arts & Humanities.
iRODS @ CC-IN2P3: why use it?

National and international collaborations.
Users spread geographically (Europe, America, Australia…).
Need for storage virtualization:
– federation of heterogeneous storage (disks, tapes) and data
access systems (MSS, databases…).
– transparent data access for end users.
– middleware working on heterogeneous OSes.
– common logical name space.
– virtual organization (access rights, groups, etc.).
– metadata search.
– easy interfacing with any kind of client application (APIs,
drivers).
iRODS @ CC-IN2P3: why use it?

SRB in use since 2003:
– 3 PB handled for 10 different experiments (HEP, astro,
biology).
– Decommissioning: end of 2012?
Limitation:
– no centralized data management (DM).
– no enforcement of DM policy.
iRODS rule-based policy:
– adequate solution.
– from the user's point of view: virtualization of data
management policy.
iRODS @ CC-IN2P3: who is using it?





Arts and Humanities (Adonis):
– Long-term data preservation.
– Web and batch job access.
Biology (phylogenetic), fluid mechanics:
– grid jobs.
Biomedical applications:
– Human and animal imaging.
High Energy physics:
– Neutrino experiment.
iRODS @ CC-IN2P3: who is going to use it?

Astrophysics experiments:
– LSST …
Other biomedical and physics projects.
iRODS will be part of the French NGI.
All SRB instances are to be moved to iRODS.
1 PB should be reached soon.

Rules examples: Arts and Humanities

Example: archival and publication of audio files (CRDO).
[Diagram: CRDO, CINES (archive), CC-IN2P3, Fedora]
1. Data transfer: CRDO => CINES (Montpellier).
2. Archived at CINES.
3. iRODS transfer to CC-IN2P3: iput file.tar
4. Automatic untar at Lyon + checksum.
5. Automatic registration in Fedora-Commons (delayed rule).
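Step 4 (untar + checksum) can be illustrated outside iRODS with a minimal Python sketch. This is not the micro-service used at CC-IN2P3; the function names are hypothetical and SHA-256 is an assumption, since the slide does not say which checksum algorithm is used.

```python
import hashlib
import shutil
import tarfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def untar_with_checksums(archive: Path, dest: Path) -> dict[str, str]:
    """Extract the regular files of a tar archive and checksum each one.

    Hypothetical helper: returns {member name: hex digest}.
    """
    dest_root = dest.resolve()
    dest_root.mkdir(parents=True, exist_ok=True)
    checksums: dict[str, str] = {}
    with tarfile.open(archive) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links and special files
            target = (dest_root / member.name).resolve()
            # Refuse members that would be written outside the destination.
            if not target.is_relative_to(dest_root):
                raise ValueError(f"unsafe path in archive: {member.name}")
            target.parent.mkdir(parents=True, exist_ok=True)
            source = tar.extractfile(member)
            with source, target.open("wb") as out:
                shutil.copyfileobj(source, out)
            checksums[member.name] = sha256_of(target)
    return checksums
```

A call such as `untar_with_checksums(Path("file.tar"), Path("extracted"))` returns one digest per extracted file, which could then be stored next to each file for later integrity checks.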
Rules examples: biomedical data

Human and animal data (fMRI, PET, MEG, etc.).
Usually in DICOM format.
Main issue for human data:
– Needs to be anonymized!
Metadata search on DICOM files is needed.
Rule:
1. Check that the file is anonymized: send a warning if it is not.
2. Extract a subset of metadata (based on a list stored in iRODS)
from DICOM files.
3. Add these metadata as user-defined metadata in iRODS.
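The logic of the three rule steps can be sketched in Python on an already-parsed DICOM header, represented here as a plain dictionary. This is an illustration only: the list of identifying fields, the function names and the definition of "anonymized" (identifying fields empty or absent) are assumptions, not the criteria used in production.

```python
import warnings

# Assumption: these DICOM attributes are treated as identifying.
IDENTIFYING_FIELDS = ("PatientName", "PatientBirthDate", "PatientAddress")


def is_anonymized(header: dict) -> bool:
    """True if every identifying field is absent or empty."""
    return not any(str(header.get(field, "")).strip() for field in IDENTIFYING_FIELDS)


def select_metadata(header: dict, wanted: list) -> list:
    """Keep only the attributes named in `wanted`, as (name, value) pairs."""
    return [(name, str(header[name])) for name in wanted if name in header]


def process_header(header: dict, wanted: list) -> list:
    """Step 1: warn if not anonymized. Step 2: extract the wanted subset.

    The returned pairs are what step 3 would attach as user-defined metadata.
    """
    if not is_anonymized(header):
        warnings.warn("DICOM file does not look anonymized", UserWarning)
    return select_metadata(header, wanted)
```

In this sketch the `wanted` list plays the role of the list stored in iRODS, so the set of searchable attributes can change without touching the code.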
Rules examples: resource monitoring system
[Diagram: several iRODS data servers, each running a performance script, report to the iRODS iCAT server and its database (DB)]
1. Ask each server for its metrics: rule engine cron task (msi).
2. Performance script launched on each server.
3. Results sent back to the iCAT.
4. Store metrics into iCAT.
5. Compute a «quality factor» for each server, stored in another table: rule engine cron task (msi).
Other rules


Mass Storage System integration:
– Using compound resources: iRODS disk cache + tapes.
– Data on the disk cache are replicated into the MSS asynchronously (1 h later)
using a delayExec rule.
– Recovery mechanism: retries until success; the delay between retries
is doubled at each round.
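The recovery mechanism described above (retry until success, doubling the delay each round) can be sketched as follows. This is a generic Python illustration, not the delayExec rule itself; `replicate`, `first_delay` and the injectable `sleep` are hypothetical names.

```python
import time
from typing import Callable


def replicate_with_backoff(
    replicate: Callable[[], bool],
    first_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `replicate` until it returns True; return the number of attempts.

    The wait between two attempts starts at `first_delay` seconds and is
    doubled after every failure. There is no upper bound on retries,
    matching the "retries until success" policy.
    """
    delay = first_delay
    attempts = 0
    while True:
        attempts += 1
        if replicate():
            return attempts
        sleep(delay)
        delay *= 2
```

Passing `sleep` as a parameter keeps the sketch easy to test: a fake sleep function can record the requested delays instead of actually waiting.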
ACL management:
– Rules needed for fine-grained access rights management.
– E.g.:
• 3 groups of users (admins, experts, users).
• ACLs on /<zone-name>/*/rawdata => admins : r/w, experts + users : r
• ACLs on all other subcollections => admins + experts : r/w, users : r
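The example policy can be written down as a small decision function. The Python sketch below only models the policy; it does not call any iRODS API, and the zone name `demoZone` and the function name are hypothetical.

```python
from typing import Optional


def permission(group: str, path: str, zone: str = "demoZone") -> Optional[str]:
    """Return "rw", "r" or None for a group on a logical path.

    Policy from the example above:
      /<zone>/*/rawdata  -> admins: r/w, experts and users: r
      other collections  -> admins and experts: r/w, users: r
    """
    parts = path.strip("/").split("/")
    # True for /<zone>/<any project>/rawdata and everything below it.
    in_rawdata = len(parts) >= 3 and parts[0] == zone and parts[2] == "rawdata"
    if group == "admins":
        return "rw"
    if group == "experts":
        return "r" if in_rawdata else "rw"
    if group == "users":
        return "r"
    return None  # unknown groups get no access in this sketch
```

For instance, `permission("experts", "/demoZone/projA/rawdata/run1.dat")` gives read-only access, while the same group gets read/write on `/demoZone/projA/analysis`.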
Developments needed

Scripts/binaries:
– Metadata extraction from DICOM files.
– Registration of files into Fedora-Commons.
– …
=> Needed whatever storage system is used underneath.
Micro-services:
– ACLs, tar/untar of archive files, …
=> APIs already available; did not require a large amount of work (parts of
the iRODS distribution).
– Resource Monitoring System: bigger development, includes
modification of the iCAT schema.
Rules:
– Most of them are simple.
– Some require more work (Adonis project): more complex workflow.
Pitfalls and bugs




Writing complex rules:
– Avoid writing them directly using the .irb syntax.
– They become difficult to debug, especially with nested actions.
Solution: use ruleGen to generate rules in a more user-friendly
manner.
Some memory leaks found in irodsReServer with Oracle as a
backend:
=> Fixed in 2.4.
delayExec syntax bugs:
=> Fixed in 2.4 and 2.4.1.
Rules are in a configuration file at the moment:
– Must be consistent on all the iRODS servers.
=> Will be in the iCAT database in the future.
Prospects


Rules for database interaction (in progress):
– Will be used by DTM (developed at CC-IN2P3):
• DTM manages a list of tasks to be processed by a batch cluster.
• DTM requires a database to manage the tasks.
– A rule launched by the client will interact with the DTM database through
iRODS:
• More security: iRODS used as a proxy server (database behind a
firewall, use of iRODS authentication).
• Database schema upgrades transparent for the client (no SQL code
launched on the client side).
Xmessaging system (part of iRODS):
– Allows messages to be exchanged between different iRODS processes or clients.
– E.g.: could be used to monitor job status in a distributed computing
environment.
Acknowledgement

Thanks to:
– Pascal Calvat.
– Yonny Cardenas.
– Thomas Kachelhoffer.
– Pierre-Yves Jallud.
iRODS at CC-IN2P3
03/25/10