Grid services based architectures
Some buzzwords, sorry…
• Growing consensus that Grid services are the right concept for building computing grids;
• Recent ARDA work has provoked a lot of interest:
  – In the experiments;
  – In the SC2 GTA group;
  – In EGEE.
• Personal opinion: this is the right concept, arriving at the right moment:
  – Experiments need practical systems;
  – EDG is not capable of providing one;
  – We need a pragmatic, scalable solution without having to start from scratch.

Tentative ARDA architecture
[Figure: tentative ARDA architecture. Components shown: Information Service, Job Provenance, Auditing, Authentication, Authorisation, API, User Interface, Accounting, Metadata Catalogue, DB Proxy, File Catalogue, Workload Management, Package Manager, Data Management, Storage Element, Computing Element, Grid Monitoring, Job Monitor. Annotation on the figure: some of these parts are discussed in more detail on the following slides.]

Metadata catalogue (Bookkeeping database) (1)
LHCb Bookkeeping:
• Very flexible schema;
• Stores objects (jobs, qualities, others?);
• Available as a service (XML-RPC interface);
• The basic schema is not efficient for generic queries:
  – Predefined views have to be built (nightly?);
  – The views fit the queries made through the web form, but what about generic queries?
  – Data is not available immediately after production;
• Needs further development, thinking, searching…

Metadata catalogue (Bookkeeping database) (2)
Possible evolution: keep the strong points, work on the weaker ones.
• Introduce a hierarchical structure:
  – HEPCAL recommendation for an eventual DMC;
  – AliEn experience;
• Better study how parameters are shared between the job and file objects;
• Other possible ideas.
This is a critical area, worth investigating!
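The hierarchical structure and the sharing of parameters between job and file objects proposed above can be pictured with a small Python sketch. It assumes an AliEn-like layout in which metadata tags are attached to directories and inherited by the files below them, while each file record points to the job that produced it, so job parameters are stored once rather than copied onto every file. All class, method, path and tag names are hypothetical; this is not the LHCb Bookkeeping or AliEn schema.

```python
from dataclasses import dataclass, field
from posixpath import dirname


@dataclass
class Job:
    """Hypothetical job object: its parameters are shared by all of its output files."""
    job_id: int
    params: dict


@dataclass
class MetadataCatalogue:
    """Toy hierarchical catalogue: tags live on directories, files point to jobs."""
    dir_tags: dict = field(default_factory=dict)  # directory path -> {tag: value}
    files: dict = field(default_factory=dict)     # logical file name -> job_id
    jobs: dict = field(default_factory=dict)      # job_id -> Job

    def tag_directory(self, path, **tags):
        self.dir_tags.setdefault(path, {}).update(tags)

    def add_file(self, lfn, job):
        self.jobs[job.job_id] = job
        self.files[lfn] = job.job_id

    def metadata(self, lfn):
        """Merge tags inherited from ancestor directories, then the producing job's parameters."""
        ancestors = []
        path = dirname(lfn)
        while True:
            ancestors.append(path)
            parent = dirname(path)
            if parent == path:  # reached the root "/"
                break
            path = parent
        merged = {}
        for directory in reversed(ancestors):  # root first, so deeper directories override
            merged.update(self.dir_tags.get(directory, {}))
        merged.update(self.jobs[self.files[lfn]].params)
        return merged

    def find(self, under, **conditions):
        """Restrict to a directory subtree first, then filter on the requested tags."""
        prefix = under.rstrip("/") + "/"
        hits = []
        for lfn in self.files:
            if not lfn.startswith(prefix):
                continue
            meta = self.metadata(lfn)
            if all(meta.get(key) == value for key, value in conditions.items()):
                hits.append(lfn)
        return sorted(hits)


# Usage with made-up paths and tags.
cat = MetadataCatalogue()
cat.tag_directory("/lhcb/mc", datatype="MC")
cat.tag_directory("/lhcb/mc/2003", year=2003, channel="example")
cat.add_file("/lhcb/mc/2003/f1.sim", Job(1001, {"application": "simulation"}))
cat.add_file("/lhcb/mc/2003/f2.sim", Job(1002, {"application": "simulation"}))
print(cat.find("/lhcb/mc", year=2003))  # → ['/lhcb/mc/2003/f1.sim', '/lhcb/mc/2003/f2.sim']
```

In this toy model the subtree test is a cheap string-prefix check, whereas every tag condition requires assembling the merged metadata of each candidate file, so queries that can be narrowed by directory are the cheaper ones.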
But… (see next slide)

Metadata catalogue (Bookkeeping database) (3)
• Manpower problems:
  – For development, but also for maintenance;
• We need the best possible solution:
  – Evaluate other solutions:
    • AliEn, eventually DMC;
    • Contribute to the DMC development;
  – Make the LHCb Bookkeeping a standard service:
    • Replaceable if necessary;
    • Allows a fair test of other solutions.

Metadata catalogue (Bookkeeping database) (4)
• Some work has started in Marseille:
  – AliEn FileCatalogue installed and populated with information from the LHCb Bookkeeping;
  – Some query efficiency measurements done;
  – The results are not yet conclusive:
    • Clearly fast when searching in the directory hierarchy;
    • Not so fast when more tags are included in the query;
    • A very low-powered machine was used at CPPM, so a comparison with the CERN Oracle server would not be fair.
  – Work has started on providing a single interface to both the AliEn FileCatalogue and the LHCb Bookkeeping;
• How to continue:
  – CERN group: possibilities to contribute?
  – The CPPM group will continue to follow this line, but resources are limited;
  – Collaboration with other projects is essential.

File Catalogue (Replica database) (1)
• The LHCb Bookkeeping was not conceived with replica management in mind; this was added later;
• A File Catalogue is needed for many purposes:
  – Data;
  – Software distribution;
  – Temporary files (job logs, stdout, stderr, etc.);
  – Input/Output sandboxes;
  – Etc.
• Absolutely necessary for DC2004;
• The File Catalogue must provide controlled access to its data (private group and user directories).
In fact, we need a full analogue of a distributed file system.

File Catalogue (Replica database) (2)
• We should look around for possible solutions:
  – Existing ones (AliEn, RLS):
    • Will have Grid services wrapping soon;
    • Will eventually comply with the ARDA architecture;
    • Large development teams behind them (RLS, EGEE?);
• This should be coupled with the whole range of data management tools:
  – Browsers;
  – Data transfers, both scheduled and on demand;
  – I/O API (POOL, user interface).
This is a huge enterprise, and we should rely on one of the available systems.

File Catalogue (Replica database) (3)
• Suggestion: start with the deployment of the AliEn FileCatalogue and data management tools:
  – Partly done;
  – Pythonify the AliEn API:
    • This will allow developing GANGA and other application plugins;
    • Should be easy, as the C++ API (almost) exists;
  – Should be interfaced with the DIRAC workload management (see below);
  – Who? The CPPM group; others are very welcome;
  – Where? Install the server at CERN.
• Follow the evolution of the File Catalogue Grid services (the RLS team will not yield easily!).
This is a huge enterprise; we should rely on one of the available systems.

Workload management (1)
• The present production service is OK for the simulation production tasks;
• We need more:
  – Data reprocessing in production (planned);
  – User analysis (sporadic);
  – Flexible policies:
    • Quotas;
    • Accounting;
  – Flexible job optimizations (splitting, input prefetching, output merging, etc.);
  – Flexible job preparation (UI) tools;
  – Various job monitors (web portals, GANGA plugins, report generators, etc.);
  – Job interactivity;
  – …

Workload management (2)
• Possibilities to choose from:
  1. Develop the existing service;
  2. Use another existing service;
  3. Start developing a new one.
• Suggestion: a mixture of all three choices:
  – Start developing the new workload management service using the existing agent-based infrastructure and borrowing some ideas from the AliEn workload management:
    • Actually already started (V. Garonne);
    • First prototype expected next week;
    • Will also try an OGSI wrapper for it (Ian Stokes-Rees);
  – Keep the existing service as a job provider for the new one.
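The "flexible job optimizations" and "quotas" items can be pictured as small, pluggable steps applied to a job before it is queued. The Python sketch below assumes a job is a plain dict with an `InputData` list, plus hypothetical `JobName` and `Owner` fields, and chains one splitting optimizer with one quota policy. The function names, fields and limits are illustrative assumptions, not DIRAC's actual interfaces.

```python
from copy import deepcopy


class QuotaExceeded(Exception):
    """Raised by the hypothetical quota policy when an owner would exceed the job limit."""


def split_by_input_data(job, files_per_job):
    """Splitting optimizer: one sub-job per chunk of at most `files_per_job` input files."""
    if files_per_job < 1:
        raise ValueError("files_per_job must be at least 1")
    inputs = job.get("InputData", [])
    if len(inputs) <= files_per_job:
        return [job]
    subjobs = []
    for n, start in enumerate(range(0, len(inputs), files_per_job)):
        sub = deepcopy(job)  # leave the submitted job untouched
        sub["InputData"] = inputs[start:start + files_per_job]
        sub["ParentJob"] = job["JobName"]
        sub["JobName"] = f"{job['JobName']}_{n}"
        subjobs.append(sub)
    return subjobs


def check_quota(jobs, usage, max_jobs_per_owner):
    """Quota policy: reject the whole batch if its owner would go over the limit."""
    owner = jobs[0]["Owner"]
    if usage.get(owner, 0) + len(jobs) > max_jobs_per_owner:
        raise QuotaExceeded(f"{owner} would exceed {max_jobs_per_owner} jobs")
    usage[owner] = usage.get(owner, 0) + len(jobs)
    return jobs


def optimize(job, usage, files_per_job=2, max_jobs_per_owner=5):
    """Run the optimizer chain in a fixed order: split first, then enforce the quota."""
    return check_quota(split_by_input_data(job, files_per_job), usage, max_jobs_per_owner)


# Usage with a made-up analysis job.
usage = {}
job = {"JobName": "ana", "Owner": "alice", "InputData": ["f1", "f2", "f3", "f4", "f5"]}
subjobs = optimize(job, usage)
print([s["InputData"] for s in subjobs])  # → [['f1', 'f2'], ['f3', 'f4'], ['f5']]
```

Applying the quota after splitting means the limit counts the sub-jobs that would actually be queued rather than the single submitted request; the opposite order would be an equally valid policy choice.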
Workload management architecture
[Figure: workload management architecture. Job sources (GANGA, Production service, Command line UI) connect to the Workload management service, which consists of a Job Receiver, several Optimizers, the Job DB, the Job queue and a Match Maker; site Agents 1, 2 and 3 connect the service to CE 1, CE 2 and CE 3.]

Workload management (3)
• Technology:
  – JDL job description;
  – Condor ClassAd library for matchmaking;
  – MySQL for the Job DB and Job Queues;
  – SOAP (OGSI) external interface;
  – SOAP and/or Jabber internal interfaces;
  – Python as development language;
  – Linux as deployment platform.
• Dependencies:
  – File catalog and data management tools:
    • Input/Output sandboxes;
  – CE:
    • DIRAC CE;
    • EDG CE wrapper.

Conclusions
• Most experiment-dependent services are to be developed within the DIRAC project:
  – MetaCatalog (Job Metadata Catalog);
  – Workload management (with experiment-specific policies and optimizations);
  – These can eventually be our contribution to the common pool of services.
• Get other services from the emerging Grid services market:
  – Security/Authentication/Authorization, FileCatalog, DataMgmt, SE, CE, Information, …
• Aim at having DC2004 done with the new (ARDA) services-based architecture:
  – Should be ready for deployment in January 2004.
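The architecture and technology slides combine a Job queue, a Match Maker and site agents with Condor ClassAd matchmaking. The Python sketch below imitates the core ClassAd idea: each side publishes attributes plus a Requirements condition that must hold against the other side, and a Rank orders the acceptable candidates. It is a simplified stand-in using plain Python callables, not the ClassAd library or its expression language; it also assumes, for illustration, that a site agent pulls work by presenting its CE description, and that an in-memory list replaces the MySQL job queue. All attribute names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Ad:
    """Simplified ClassAd stand-in: attributes plus Requirements and Rank callables."""
    attrs: dict
    requirements: Callable[[dict], bool] = lambda other: True  # must hold against the other ad
    rank: Callable[[dict], float] = lambda other: 0.0          # preference among matches


def symmetric_match(a: Ad, b: Ad) -> bool:
    """Both sides' Requirements must be satisfied by the other side's attributes."""
    return a.requirements(b.attrs) and b.requirements(a.attrs)


class MatchMaker:
    """Toy job queue and matchmaker; a site agent asks for work with its CE ad."""

    def __init__(self) -> None:
        self.queue: list[tuple[int, Ad]] = []  # (job_id, job ad) in submission order

    def submit(self, job_id: int, job_ad: Ad) -> None:
        self.queue.append((job_id, job_ad))

    def request_job(self, ce_ad: Ad) -> Optional[int]:
        """Dequeue the matching job the CE ranks highest; ties go to the oldest job."""
        candidates = [(pos, job_id, ad) for pos, (job_id, ad) in enumerate(self.queue)
                      if symmetric_match(ad, ce_ad)]
        if not candidates:
            return None
        pos, job_id, _ = max(candidates, key=lambda c: (ce_ad.rank(c[2].attrs), -c[0]))
        del self.queue[pos]
        return job_id


# Usage with made-up attributes.
mm = MatchMaker()
mm.submit(1, Ad({"Owner": "prod", "Platform": "Linux"},
                requirements=lambda ce: ce["DiskGB"] >= 5))
mm.submit(2, Ad({"Owner": "alice", "Platform": "Linux"},
                requirements=lambda ce: ce["DiskGB"] >= 50))
mm.submit(3, Ad({"Owner": "alice", "Platform": "Linux"}))

# A site agent for a small CE that only runs Linux jobs and prefers user jobs.
ce1 = Ad({"Platform": "Linux", "DiskGB": 10},
         requirements=lambda job: job["Platform"] == "Linux",
         rank=lambda job: 1.0 if job["Owner"] != "prod" else 0.0)
results = [mm.request_job(ce1) for _ in range(3)]
print(results)  # → [3, 1, None]
```

Job 2 stays queued because its own Requirements reject the small CE, which is the point of making the match symmetric: either side can veto it.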