Generic policy rules and principles
Jean-Yves Nief
Repository workshop - Garching 20/09/10

Talk overview
• An introduction to CC-IN2P3 activity.
• iRODS in production:
  – Why are we using it?
  – Who is using it?
  – Prospects.
• iRODS rule policies through examples:
  – Resource Monitoring System.
  – Biomedical applications:
    • Human data.
    • Animal data.
  – Arts and Humanities.
  – Other rules: mass storage system interface, access rights.
  – Pitfalls.
  – Future usages.

CC-IN2P3 activities
• Federates the computing needs of the French scientific community in:
  – Nuclear and particle physics.
  – Astrophysics and astroparticles.
• Computing services for international collaborations:
  – CERN (LHC), Fermilab, SLAC, ….
• Now open to biology and Arts & Humanities.

iRODS @ CC-IN2P3: why use it?
• National and international collaborations.
• Users spread geographically (Europe, America, Australia…).
• Need for storage virtualization:
  – federation of heterogeneous storage (disks, tapes) and data access systems (MSS, databases…).
  – transparent data access for end users.
  – middleware working on heterogeneous operating systems.
  – common logical name space.
  – virtual organization (access rights, groups etc…).
  – metadata search.
  – easy interfacing with any kind of client application (APIs, drivers).

iRODS @ CC-IN2P3: why use it? (continued)
• SRB in use since 2003:
  – 3 PB handled for 10 different experiments (HEP, astro, biology).
  – Decommissioning: end of 2012?
• Limitation:
  – no centralized data management (DM) → no enforcement of DM policy.
• iRODS rule-based policy:
  – adequate solution.
  – from the user's point of view: virtualization of the data management policy.

iRODS @ CC-IN2P3: who is using it?
• Arts and Humanities (Adonis):
  – Long-term data preservation.
  – Web and batch job access.
• Biology (phylogenetic), fluid mechanics:
  – grid jobs.
• Biomedical applications:
  – Human and animal imagery.
• High energy physics:
  – Neutrino experiment.

iRODS @ CC-IN2P3: who is going to use it?
• Astrophysics experiments:
  – LSST …
• Other biomedical and physics projects.
• iRODS will be part of the French NGI.
• All the SRB instances are to be moved to iRODS.
• 1 PB should be reached soon.

Rules examples: Arts and Humanities
• Ex: archival and data publication of audio files (CRDO).
[Diagram: CRDO → CINES (archive) → CC-IN2P3 → Fedora]
1. Data transfer: CRDO → CINES (Montpellier).
2. Archived at CINES.
3. iRODS transfer to CC-IN2P3: iput file.tar
4. Automatic untar at Lyon + checksum.
5. Automatic registration in Fedora-Commons (delayed rule).

Rules examples: biomedical data
• Human and animal data (fMRI, PET, MEG etc…).
• Usually in DICOM format.
• Main issue for human data:
  – it must be anonymized!
• Need to run metadata searches on DICOM files.
• Rule:
  1. Check that the file is anonymized: send a warning if it is not.
  2. Extract a subset of metadata (based on a list stored in iRODS) from the DICOM files.
  3. Add these metadata as user-defined metadata in iRODS.

Rules examples: resource monitoring system
[Diagram: the iRODS iCAT server and its database (DB) polling several iRODS data servers, each running a perf script]
1. Ask each server for its metrics: rule engine cron task (msi).
2. Performance script launched on each server.
3. Results sent back to the iCAT.
4. Store the metrics in the iCAT.
5. Compute a «quality factor» for each server, stored in another table: rule engine cron task (msi).

Other rules
• Mass Storage System integration:
  – Using compound resources: iRODS disk cache + tapes.
  – Data on the disk cache are replicated into the MSS asynchronously (1h later) using a delayExec rule.
  – Recovery mechanism: retries until success; the delay between retries is doubled at each round.
• ACL management:
  – Rules needed for fine-grained access rights management.
  – E.g.:
    • 3 groups of users (admins, experts, users).
    • ACLs on /<zone-name>/*/rawdata => admins: r/w, experts + users: r
    • ACLs on all other subcollections => admins + experts: r/w, users: r

Developments needed
• Scripts/binaries:
  – Metadata extraction from DICOM files.
  – Registration of files into Fedora-Commons.
  – …
  → Needed whatever storage system is used underneath.
• Micro-services:
  – ACLs, tar/untar of archive files, …
    → APIs already available, did not require a large amount of work (part of the iRODS distribution).
  – Resource Monitoring System: bigger development, includes a modification of the iCAT schema.
• Rules:
  – Most of them are simple.
  – Some require more work (Adonis project), with a more complex workflow.

Pitfalls and bugs
• Writing complex rules:
  – Avoid writing them directly in the .irb syntax.
  – It becomes difficult to debug, especially with nested actions.
  → Solution: use ruleGen to generate rules in a more user-friendly manner.
• Some memory leaks found in irodsReServer with Oracle as a backend: fixed in 2.4.
• delayExec syntax bugs: fixed in 2.4 and 2.4.1.
• Rules are in a configuration file at the moment:
  – they must be consistent across all the iRODS servers.
  → They will be stored in the iCAT database in the future.

Prospects
• Rules for database interaction (in progress):
  – Will be used by DTM (developed at CC-IN2P3):
    • DTM manages a list of tasks to be processed by a batch cluster.
    • DTM requires a database to manage the tasks.
  – A rule launched by the client will interact with the DTM database through iRODS:
    • More security: iRODS is used as a proxy server (database behind a firewall, iRODS authentication used).
    • Database schema upgrades are transparent for the client (no SQL code launched on the client side).
• Xmessaging system (part of iRODS):
  – Allows messages to be exchanged between different iRODS processes or clients.
  – E.g.: could be used to monitor job status in a distributed computing environment.

Acknowledgement
• Thanks to:
  – Pascal Calvat.
  – Yonny Cardenas.
  – Thomas Kachelhoffer.
  – Pierre-Yves Jallud.
iRODS at CC-IN2P3 03/25/10
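Step 4 of the "Rules examples: Arts and Humanities" slide (automatic untar at Lyon + checksum) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the rule actually deployed at CC-IN2P3: the function name untar_and_checksum is hypothetical, and the slide does not say which checksum algorithm is used, so MD5 is an assumption. The Fedora-Commons registration of step 5 is out of scope here.

```python
from __future__ import annotations

import hashlib
import tarfile
from pathlib import Path


def untar_and_checksum(archive: Path, dest: Path) -> dict[str, str]:
    """Unpack `archive` into `dest`; return {relative path: MD5 hex digest}.

    Hypothetical helper: MD5 is an assumed choice, the slide only says "checksum".
    """
    dest.mkdir(parents=True, exist_ok=True)
    # Use the safe "data" extraction filter when this Python version provides it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive) as tar:
        tar.extractall(dest, **extra)
    checksums: dict[str, str] = {}
    for path in sorted(dest.rglob("*")):
        if path.is_file():
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            checksums[path.relative_to(dest).as_posix()] = digest
    return checksums
```

In the workflow on the slide, this step would be triggered by a rule once `iput file.tar` completes; the returned manifest is what a later, delayed rule could use to register each file.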
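The three-step DICOM rule from the "Rules examples: biomedical data" slide (check anonymization and warn, extract a listed subset of metadata, add it as user-defined metadata) can be sketched as follows. Assumptions: the DICOM header is already parsed into a plain keyword → value mapping, the list of identifying attributes in IDENTIFYING_KEYWORDS is hypothetical (a real deployment would follow its own de-identification policy), and the output is a list of iRODS-style (attribute, value, unit) triples that some other step would attach to the data object.

```python
from __future__ import annotations

from typing import Callable, Iterable, Mapping

# Hypothetical list of identifying DICOM attributes that should be empty.
IDENTIFYING_KEYWORDS = ("PatientName", "PatientID", "PatientBirthDate", "PatientAddress")


def identifying_fields(header: Mapping[str, object]) -> list[str]:
    """Return the identifying attributes that still hold a non-empty value."""
    leaks = []
    for keyword in IDENTIFYING_KEYWORDS:
        value = header.get(keyword)
        if value is not None and str(value).strip():
            leaks.append(keyword)
    return leaks


def extract_avus(header: Mapping[str, object], wanted: Iterable[str]) -> list[tuple[str, str, str]]:
    """Turn the wanted attributes into (attribute, value, unit) triples."""
    return [(k, str(header[k]), "") for k in wanted if k in header]


def process_dicom(header: Mapping[str, object], wanted: Iterable[str],
                  warn: Callable[[str], None]) -> list[tuple[str, str, str]]:
    # Step 1: warn (but do not stop) if the file is not anonymized.
    leaks = identifying_fields(header)
    if leaks:
        warn("not anonymized: " + ", ".join(leaks))
    # Steps 2 and 3: extract the listed subset as user-defined metadata.
    return extract_avus(header, wanted)
```

Passing `warn` as a callback keeps the sketch independent of how the warning is actually delivered (mail, log, message).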
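The resource monitoring slide says a «quality factor» is computed for each server from the collected metrics, but it does not give the formula. The sketch below is therefore entirely hypothetical: it assumes two metrics (CPU load and free disk fraction) and a weighted sum, only to show the shape of step 5, turning stored metrics into a per-server score that can be ranked.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class ServerMetrics:
    name: str
    cpu_load: float   # hypothetical metric: fraction of CPU busy, 0.0 to 1.0
    disk_free: float  # hypothetical metric: fraction of disk free, 0.0 to 1.0


def _clamp(x: float) -> float:
    return min(max(x, 0.0), 1.0)


def quality_factor(m: ServerMetrics, w_cpu: float = 0.5, w_disk: float = 0.5) -> float:
    """Hypothetical score in [0, 1] when the weights sum to 1; higher is better."""
    return w_cpu * (1.0 - _clamp(m.cpu_load)) + w_disk * _clamp(m.disk_free)


def rank_servers(metrics: Iterable[ServerMetrics]) -> List[Tuple[str, float]]:
    """Servers sorted from best to worst quality factor."""
    scored = [(m.name, quality_factor(m)) for m in metrics]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Whatever the real formula is, keeping it a pure function of the stored metrics makes it easy to run from a periodic rule and write the result to a separate table, as the slide describes.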
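The recovery mechanism on the "Other rules" slide (retry until success, doubling the delay between retries at each round) is an exponential backoff. A minimal Python sketch follows; `replicate` stands in for the actual disk-cache-to-MSS replication call, the initial delay is a parameter because the slide does not state it, and the optional `max_attempts` cap is an addition for this sketch (the slide describes unbounded retries).

```python
import time
from typing import Callable, Optional


def replicate_with_retries(
    replicate: Callable[[], bool],
    first_delay_s: float,
    sleep: Callable[[float], None] = time.sleep,
    max_attempts: Optional[int] = None,
) -> int:
    """Call `replicate` until it returns True, doubling the wait after each failure.

    Returns the number of attempts made. `max_attempts` is an optional safety
    cap added for this sketch; None means retry until success.
    """
    delay = first_delay_s
    attempts = 0
    while True:
        attempts += 1
        if replicate():
            return attempts
        if max_attempts is not None and attempts >= max_attempts:
            raise RuntimeError(f"replication still failing after {attempts} attempts")
        sleep(delay)  # wait before the next round
        delay *= 2    # the delay doubles at each round
```

Injecting `sleep` lets the schedule be checked without waiting: with a first delay of 60 s and three failures, the waits are 60, 120 and 240 s.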
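The ACL example on the "Other rules" slide maps a collection path to per-group permissions. The sketch below expresses that policy as a function; it assumes that `*` in /<zone-name>/*/rawdata stands for exactly one collection level, that everything below a rawdata collection inherits its ACLs, and that "r/w" corresponds to the iRODS `write` permission. Applying the result (for instance with `ichmod`) is left out.

```python
from __future__ import annotations

ADMINS, EXPERTS, USERS = "admins", "experts", "users"


def acl_for(path: str, zone: str) -> dict[str, str]:
    """Return {group: permission} for a collection path.

    Assumes `*` in /<zone-name>/*/rawdata matches exactly one collection level
    and that subcollections of a rawdata collection inherit its ACLs.
    """
    parts = path.strip("/").split("/")
    is_rawdata = len(parts) >= 3 and parts[0] == zone and parts[2] == "rawdata"
    if is_rawdata:
        # rawdata: admins r/w, experts + users read-only
        return {ADMINS: "write", EXPERTS: "read", USERS: "read"}
    # all other subcollections: admins + experts r/w, users read-only
    return {ADMINS: "write", EXPERTS: "write", USERS: "read"}
```

Splitting the path into components, rather than using a shell-style glob, avoids `*` silently matching several levels such as /<zone-name>/a/b/rawdata.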