Download keep strong points, work on weaker ones - LHCb Computing

Grid services based architectures Some buzz words, sorry… • Growing consensus that Grid services is the right concept for building the computing grids; • Recent ARDA work has provoked quite a lot of interest: – In experiments; – SC2 GTA group; – EGEE • Personal opinion - this the right concept that arrived at the right moment: – Experiments need practical systems; – EDG is not capable to provide one; – Need for pragmatic, scalable solution without having to start from scratch. 1 Tentative ARDA architecture 1: Job Provenance Information Service Auditing 2: Authentication 3: API Authorisation User Interface 6: 4: Accounting Metadata Catalogue DB Proxy 14: 5: 13: File Catalogue 7: 10: Workload Management 9: Package Manager Data Management 11: Discuss more these parts in the following 15: Storage Element Grid Monitoring 12: 8: Computing Element Job Monitor 2 Metadata catalogue (Bookkeeping database) (1) LHCb Bookkeeping:  Very flexible schema ;  Storing objects (jobs, qualities, others ?) ;  Available as a services (XML-RPC interface) ;  Basic schema is not efficient for generic queries:  need to build predefined views (nightly ?) ;  views fit the query by the web form, what about generic queries ?  data is not available immediately after production ; Needs further development, thinking, searching … 3 Metadata catalogue (Bookkeeping database) (2) Possible evolution: keep strong points, work on weaker ones • Introduce hierarchical structure – HEPCAL recommendation for eventual DMC ; – AliEn experience ; • Better study sharing parameters between the job and file objects ; • Other possible ideas. This is a critical area – worth investigation ! But… (see next slide) 4 Metadata catalogue (Bookkeeping database) (3) • Man power problems : – For development, but also for maintenance ; • We need the best possible solution : – Evaluate other solutions: • AliEn, DMC eventually ; • Contribute to the DMC development ; – LHCb Bookkeeping as a standard service : • Replaceable if necessary ; • Fair test of other solutions. 5 Metadata catalogue (Bookkeeping database) (4) • Some work has started in Marseille: – AliEn FileCatalogue installed and populated by the information from the LHCb Bookkeeping ; – Some query efficiencies measurements done ; – The results are not yet conclusive : • Clearly fast if the search in the hierarchy of directories; • Not so fast if more tags are included in the query; • Very poor machine used in CPPM – not fair to compare to the CERN Oracle server. – The work started to provide single interface to both AliEn FileCatalogue and LHCb Bookkeeping ; • How to continue: – CERN group – possibilities to contribute ? – CPPM group will continue to follow this line, but resources are limited ; – Collaboration with other projects is essential; 6 File Catalogue (Replica database) (1) • The LHCb Bookkeeping was not conceived with the replica management in mind – added later ; • File Catalog needed for many purposes: – – – – – Data; Software distribution; Temporary files (job logs, stdout, stderr, etc); Input/Output sandboxes ; Etc, etc • Absolutely necessary for DC2004; • File Catalog must provide controlled access to its data (private group, user directories) ; In fact we need a full analogue of a distributed file system 7 File Catalogue (Replica database) (2) • We should look around for possible solutions : – Existing ones (AliEn, RLS) : • Will have Grid services wrapping soon ; • Will comply with the ARDA architecture eventually; • Large development teams behind (RLS, EGEE ?) • This should be coupled with the whole range of the data management tools: – Browsers; – Data transfers, both scheduled and on demand; – I/O API (POOL, user interface). This is a huge enterprise, and we should rely on using one of the available systems 8 File Catalogue (Replica database) (3) • Suggestion is to start with the deployment of the AliEn FileCatalogue and data management tools: – Partly done; – Pythonify the AliEn API: • This will allow developing GANGA and other application plugins; • Should be easy as the C++ API (almost) exist. – Should be interfaced with the DIRAC workload management (see below); – Who ? CPPM group, others are very welcome; – Where ? Install the server at CERN. • Follow the evolution of the File Catalogue Grid services (RLS team will not yield easily !); This is a huge enterprise, we should rely on using one of the available systems 9 Workload management (1) • The present production service is OK for the simulation production tasks ; • We need more: – Data Reprocessing in production (planned); – User analysis (spurious); – Flexible policies: • Quotas; • Accounting; – Flexible job optimizations (splitting, input prefetching, output merging, etc) ; – Flexible job preparation (UI) tools ; – Various job monitors (web portals, GANGA plugins, report generators, etc); – Job interactivity; – … 10 Workload management (2) • • Possibilities to choose from: 1. Develop the existing service ; 2. Use another existing service ; 3. Start developing the new one. Suggestion – a mixture of all of choices: – – Start developing the new workload management service using existing agents based infrastructure and borrowing some ideas from the AliEn workload management: • • • Already started actually (V. Garonne); First prototype expected next week; Will also try OGSI wrapper for it (Ian Stokes-Rees); Keep the existing service as jobs provider for the new one. 11 Workload management architecture GANGA Workload management Job Receiver Optimizer 11 Optimizer Optimizer 1 Production service Command line UI Job queue Job DB Match Maker Site Agent 1 CE 1 Agent 2 CE 2 Agent 3 CE 3 12 Workload management (3) • Technology: – – – – – – – JDL job description; Condor Classad library for matchmaking; MySQL for Job DB and Job Queues; SOAP (OGSI) external interface; SOAP and/or Jabber internal interfaces; Python as development language; Linux as deployment platform. • Dependencies: – File catalog and Data management tools: • Input/Output sandboxes; – CE • DIRAC CE ; • EDG CE wrapper. 13 Conclusions • Most experiment dependant services are to be developed within the DIRAC project: – MetaCatalog (Job Metadata Catalog); – Workload management (with experiment specific policies and optimizations); – Can be eventually our contribution to the common pool of services. • Get other services from the emerging Grid services market: – Security/Authentication/Authorization, FileCatalog, DataMgmt, SE, CE, Information,… • Aim at having DC2004 done with the new (ARDA) services based architecture – Should be ready for deployment January 2004 14

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download keep strong points, work on weaker ones - LHCb Computing