Database Access Patterns in ATLAS Computing Model

G. Gieraltowski, J. Cranshaw, K. Karr, D. Malon, A. Vaniachine (ANL)
P. Nevski, Yu. Smirnov, T. Wenaus (BNL)
N. Barros, L. Goossens, R. Hawkings, A. Nairz, G. Poulard, Yu. Shapiro, F. Zema (CERN)

Presented by Alexandre Vaniachine (ANL) at CHEP06, the XV International Conference on Computing in High Energy and Nuclear Physics, T.I.F.R., Mumbai, India, February 13-17, 2006

Outline

1) Emphasis on the early days of LHC running:
   - Calibration/alignment is a priority and must be done before reconstruction starts
   - ATLAS 2006 Computing System Commissioning: calibration/alignment procedures are included in the acceptance tests
2) Real experience in prototypes and production systems. General issues encountered:
   - Increased fluctuations in database server load
   - Connection-count limitations
3) Development of the ATLAS distributed computing model:
   - Server-side developments. Deployment: LCG 3D Project and OSG Edge Services Framework Activity. Technology: grid-enabled server technology (Project DASH)
   - Application-side developments. Deployment: integration with the Production System database (conditions data slices). Technology: ATLAS Database Client Library (now adopted by COOL/POOL/CORAL)

ATLAS Computing Model

In the ATLAS Computing Model, widely distributed applications require access to terabytes of data stored in relational databases. The realistic database services data flow, including calibration and alignment, is presented in the Computing Technical Design Report. Preparations are on track towards the Computing System Commissioning, which will exercise a realistic database data flow.

ATLAS CSC Goals

2006 is the year of the ATLAS Computing System Commissioning (CSC). The first goal of the CSC is the calibration and alignment procedures, and the ConditionsDB is included in the CSC acceptance tests. As presented at the WLCG SC4 Workshop (12 February 2006): we have defined the
high-level goals of the Computing System Commissioning operation during 2006. Formerly called "DC3", it is more a running-in of continuous operation than a stand-alone challenge. The main aim of the CSC is to test the software and computing infrastructure that we will need at the beginning of 2007:
- Calibration and alignment procedures and the conditions DB
- The full trigger chain
- Event reconstruction and data distribution
- Distributed access to the data for analysis
At the end (autumn-winter 2006) we will have a working and operational system, ready to take data with cosmic rays at increasing rates. (Slide from Dario Barberis, "ATLAS SC4 Plans")

Towards the Early Days of LHC Running

Calibration/alignment is a priority and must be done before reconstruction starts. It is part of the overall Computing System Commissioning activity, which aims to:
- Demonstrate the calibration "closed loop": iterate and improve reconstruction
- Exercise the conditions DB access and distribution infrastructure
- Encourage development of subdetector calibration algorithms
The work is initially focussed on "steady-state" calibration, assuming the required samples are available and can be selected. We also want to look at the initial 2007/2008 running at low luminosity.

Calibration Data Flow

[Diagram: calibration data flow]

Prerequisites for Success

- Simulation: ability to simulate a realistic, misaligned, miscalibrated detector
- Reconstruction: use of calibration data in reconstruction; ability to handle time-varying calibration
- Calibration algorithms: algorithms in Athena, running from standard ATLAS data
- Data preparation: organisation and bookkeeping (run number ranges, production system, ...)

Production System Enhancements

To prepare for new challenges, the first ATLAS Database
Services Workshop was organized in December: http://agenda.cern.ch/fullAgenda.php?ida=a057425. Among the Workshop recommendations was a tighter integration of the production system database, task definition, Distributed Data Management (DDM) and conditions data tags. Implementation opportunities:
- Distribute (push) snapshots via pacman
- Use DDM for large payload files
- Try Oracle 10g file management for external files
- Expand the existing ServersCatalog with top tags

ATLAS DB Applications

In preparation for data taking, the ATLAS experiment has run a series of large-scale computational exercises to test and validate multi-tier distributed data grid solutions under development. Real experience in prototypes and production systems was collected with the three major ATLAS database applications:
- Geometry DB
- Conditions DB
- TAG databases
These computational exercises run on a world-wide federation of computational grids.

Data Mining of Operations

Data mining of the collected operations data reveals a striking feature: a very high degree of correlation between failures:
- if a job submitted to some cluster failed, there is a high probability that the next job submitted to that cluster will fail too
- if the submit host failed, all the jobs scattered over different clusters will fail too
Taking these correlations into account is not yet automated by the grid middleware. That is why the production databases and grid monitoring data that provide immediate feedback on Data Challenge operations to the production operators are very important for efficient utilization of Grid capacities.

Production Rate Growth and Daily Fluctuations

[Chart: jobs/day from July through May for Grid3, NorduGrid, LCG/Original and LCG/CondorG (scale 0-14000 jobs/day), showing growth from the Data Challenge 2 short- and long-jobs periods through the 2005 database capacities bottleneck to the Rome Production mix of jobs]

Lessons Learned

Among the lessons learned is the increase in fluctuations in database server workloads due to the chaotic nature of grid computations. The observed fluctuations in database access patterns are of a general nature and must be addressed through services enabling dynamic and flexibly managed provisioning of database resources. In many cases the connection count turns out to be the limiting resource.

Opportunistic Grids

Campus computing grids like GLOW (http://osg-docdb.opensciencegrid.org/cgi-bin/ShowDocument?docid=361) utilize spare cycles to run jobs. The owner of the resource has priority, so ATLAS jobs are often put into hibernation. Optimal jobs are therefore shorter, i.e. only a few events, resulting in an order of magnitude more frequent database access. Jobs put into hibernation during the initialization phase overloaded CERN database resources by keeping database connections open for days. This problem was resolved by deploying dedicated replica servers in the US and at CERN to support the GLOW grid. In comparison to production grids, opportunistic grids require extra development and support efforts that are not sustainable in the long run.

Client Library

To improve the robustness of database access in a data grid environment we developed an application-side solution: a software component abstracting the database and/or middleware connectivity concerns in a generalized Database Client Library (http://indico.cern.ch/contributionDisplay.py?contribId=32&sessionId=4&confId=048).

Server Indirection

One of the lessons learnt in the ATLAS Data Challenges is that the database server address
should NOT be hardwired in data processing transformations. A logical-physical indirection for database servers is now introduced in ATLAS, similar to the logical-physical file indirection of the Grid file catalogs (Replica Location Service). It is supported by the ATLAS Client Library and has now been adopted by the LHC POOL project: http://indico.cern.ch/contributionDisplay.py?contribId=329&sessionId=4&confId=048

Tier-0 Operations

In addition to distributed operations, ATLAS database services are relevant to local CERN data-taking operations, including the conditions data flow of ATLAS Combined Test Beam (CTB) operations, prototype Tier-0 scalability tests and event tag database operations.

[Diagram: DB replication from the online server (atlobk01) to the offline server (atlobk02), covering conditions DB, CTB, NOVA, OBK and test databases plus POOL catalogues, fed by data acquisition programs and read by Athena programs and browsing applications]

TAG Database Access

TAG replication is a part of the SC4 Tier-0 test:
- Loading TAGs into the relational database at CERN
- Replicating them using Oracle Streams from Tier-0 to Tier-1s and to Tier-2s
- Also, as an independent test, using TAG files that are already available

A TAG is a keyed list of variables per event, with two roles: direct access to an event in a file via a pointer, and a data collection definition function. There are two formats, file and database. We now believe that large queries require the full database, which restricts them to Tier-1s and large Tier-2s/CAF; ordinary Tier-2s hold file-based TAGs corresponding to locally-held datasets.

Participation in LCG 3D

ATLAS is fully committed to using the Distributed Database Deployment infrastructure developed in collaboration with the LCG 3D Project. The LCG 3D service architecture:
- Tier-0 (Oracle): autonomous, reliable service
- Tier-1 (Oracle): database backbone, all data replicated, reliable service; fed from Tier-0 via Oracle Streams
- Tier-2 (MySQL): local database cache, populated by cross-vendor copy
- Tier-3/4: subset of the data, only local service (MySQL/SQLite files, proxy cache)

Participation in OSG ESF

US ATLAS is participating in the OSG Edge Services Framework Activity to enhance the traditional database services infrastructure deployed in 3D with dynamic database services deployment capabilities. Edge services execute on the edge of the public and private networks of a site, between the outside world and the compute and storage nodes (CE and SE gateways serving the CMS, ATLAS, CDF and guest VOs). See the SC05 booth presentation "OSG Edge Services Framework": http://indico.cern.ch/contributionDisplay.py?contribId=214&sessionId=7&confId=048

Project DASH

To grid-enable the MySQL database server, ATLAS is participating in Project DASH: http://indico.cern.ch/contributionDisplay.py?contribId=36&sessionId=7&confId=048. A new collaborative project has just started at Argonne to grid-enable the PostgreSQL database. Both projects target integration with OGSA-DAI. Please contact us if you are interested in contributing to these projects.

Conclusions

As grid computing technologies mature, development must focus on database and grid integration. New technologies are required to bridge the gap between data accessibility and the increasing power of grid computing used for distributed event production and processing. Changes must happen both on the server side and on the client side.

Server technology:
- Must support dynamic deployment of capacities
- Must support replication at a lower granularity level: Conditions DB slices
- Must be coordinated with the production system
- Must support grid authorization (Project DASH)

Client technology:
- Must support database server indirection
- Must support a coordinated client-side solution: the ATLAS Database Client
Library (now a part of COOL/POOL/CORAL)