LCG CONDITIONS DATABASE COOL - PERFORMANCE TESTS

A. Valassi, L. Canali (CERN IT-PSS, Geneva, Switzerland), M. Clemencic (CERN PH-LBC, Geneva, Switzerland / LHCb), D. Front (Weizmann Institute, Israel, and CERN IT-PSS, Geneva, Switzerland), U. Moosbrugger, S. A. Schmidt (University of Mainz, Germany / Atlas), S. Stonjek (University of Oxford, UK / Atlas)

COOL Performance in Atlas Prompt Reconstruction (Simplified Scenario)
Result: SUCCESS!

Atlas requirements (1 – processing)
• One reconstruction job every 5s
  – To process all events previously taken during a 5s interval
  – Conditions for the whole 5s interval are fetched together when the job starts
• Events processed in the order they were taken
  – Hence IOVs are read sequentially from the COOL database
  – To be confirmed: maybe 'almost' in the order they were taken?
• Reconstruction farm of 100 nodes
  – 10 processes per node (1k simultaneous processes in total)
  – Hence the time available for one reconstruction job is 5000s

Atlas requirements (2 – conditions)
• 100 MB of conditions to process one event
  – Atlas 'snapshot' description at any given validity time
• 100k channels with uncorrelated validities
  – 1000 bytes per channel (typically, numbers and strings)
• Typical IOV duration is 5 minutes
  – Hence one condition changes every 3ms (5min/100k)
• 100 different payload schemas

Hence: COOL performance requirements
• Sustained throughput: 20 MB/s and 20k rows/s
• Sustained I/O: 300 kB/s and 300 rows/s

Simulated conditions for the test
• 100 COOL folders with 1k channels in each folder
  – NB: the actual requirement was 1k folders with 100 channels
  – 100 relational tables to be queried separately
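The headline rates follow directly from the figures above; a quick cross-check of the arithmetic (plain Python, using only the numbers quoted in the requirements):

```python
# Cross-check of the COOL performance requirements, derived from the Atlas
# figures quoted above (snapshot size, channel count, IOV duration).

channels = 100_000          # 100k channels with uncorrelated validities
bytes_per_channel = 1_000   # ~1000 bytes per channel
chunk_seconds = 5           # one reconstruction job (and snapshot) every 5s
iov_seconds = 5 * 60        # typical IOV duration: 5 minutes

snapshot_bytes = channels * bytes_per_channel
print(snapshot_bytes)                  # 100 MB of conditions per snapshot

# Throughput: one full snapshot must be served every 5s
print(snapshot_bytes / chunk_seconds)  # 20 MB/s
print(channels / chunk_seconds)        # 20k rows/s

# I/O: with 5-minute IOVs, every channel is rewritten once per 5 minutes
print(channels / iov_seconds)          # ~333 rows/s, quoted as "300 rows/s"
print(snapshot_bytes / iov_seconds)    # ~333 kB/s, quoted as "300 kB/s"
print(iov_seconds / channels * 1000)   # one condition change every ~3 ms
```

The quoted 300 kB/s and 300 rows/s are evidently rounded-down versions of the ~333 values.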
NB: the sustained throughput requirement is the total throughput from the database server to all clients. It may exploit the server data cache if conditions are fetched in order, since only 1/60 of the data in [5s,10s] differ from those in [0s,5s].

Test setup
• Retrieve 5s chunks of conditions with 5min IOV duration
• Test database covers 5 hours (61 IOVs)
• 10 clients with 10 processes per client

Results
• Sustained 12 + 9 kRows/s
• Sustained data influx of 2 MB/s
• 70% CPU on each of the 10 clients
• Good client-side response
… at least for this (how realistic?) tested scenario …

Work in progress
• COOL team and DBAs: effect of the data cache
  – The data cache size required to hold all the data is larger than expected
  – Larger I/O rates than expected (not all in cache?)
  – Is it safe to assume that the cache can be used?
  – Effect of the data block size on cache and I/O?
• COOL team and DBAs: effect of the number of tables
  – The original 1k folder requirement is more difficult to handle
  – Shared pool latches observed under certain conditions?
• COOL team and DBAs: network data rates
  – SQL*Net data compression for identical values in different rows?
• COOL team: client-side C++ overhead
  – Less than 10s (out of 70s) are spent on the server CPU
• Atlas: confirm/modify the detailed requirements
  – 100k channels (one condition change every 3ms?)
  – 100 vs. 1k folders (1k different schemas/tables?)
  – Events processed in the order they were taken?
  – Can reconstruction jobs process more than 5s event chunks?
  – Higher values for some parameters may lead to scalability problems (e.g. 1k vs. 100 folders)

Distributed data access tests
• Access from a COOL client in Oxford
  – To an Oracle server in Oxford
  – To an Oracle server at RAL (near Oxford)
  – To an Oracle server at CERN
• Main components of client real time:
  – Client user time (COOL C++ data manipulation)
  – Server time (e.g.
    Oracle server CPU and I/O)
  – Client-server network round-trips
• Remote access time is dominated by the network round-trip component
  – Roughly proportional to the ping latency
  – Use bulk retrieval to minimise the number of round trips
[Plot: real time / user time ratio vs. ping latency [ms] for Oxford-Oxford, Oxford-RAL and Oxford-CERN access; the ratio grows with latency, reaching ~25 for Oxford-CERN]

Relational query improvements

Examples of past improvements
• Multi-channel bulk retrieval
  – Single query on each IOV table (with optional channel selection)
• Force SQL hard parse when gathering statistics
  – Wrong (old) execution plan observed otherwise

Work in progress
• Increasing time for IOV retrieval
  – Identified during the Atlas prompt reconstruction tests
  – Querying on both 'since' and 'until' is not optimal
  – A new 'max(since)' query strategy will be implemented
• Missing multi-dimensional indices
• Multi-channel bulk insertion
  – An extra 'channel' table must be added to the schema
  – Need bulk update/delete functionalities from CORAL

Andrea Valassi (CERN IT-PSS), CHEP 2006, Mumbai (13-17 February 2006)
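The distributed data access result above (remote real time roughly proportional to ping latency, mitigated by bulk retrieval) can be illustrated with a toy cost model. The function and all numbers below are illustrative assumptions, not measurements from these tests:

```python
# Toy model of remote database access: total real time is client CPU time
# plus one network round trip per query. All figures here are illustrative
# placeholders, not values measured in the COOL tests.

def real_time(user_time_s: float, round_trips: int, ping_ms: float) -> float:
    """Real time = client user time + round_trips * network latency."""
    return user_time_s + round_trips * ping_ms / 1000.0

rows = 100_000       # one snapshot's worth of rows
ping_ms = 20.0       # assumed wide-area (e.g. Oxford-CERN scale) latency

# Row-by-row retrieval: one round trip per row dominates completely
naive = real_time(user_time_s=70.0, round_trips=rows, ping_ms=ping_ms)

# Bulk retrieval: one query per folder (100 folders), rows fetched in batches
bulk = real_time(user_time_s=70.0, round_trips=100, ping_ms=ping_ms)

print(f"row-by-row: {naive:.0f}s, bulk: {bulk:.0f}s")  # row-by-row: 2070s, bulk: 72s
```

With 20ms latency the per-row strategy pays 2000s in round trips alone, while bulk retrieval keeps the network term negligible next to client user time, which is why the real/user time ratio in the plot tracks ping latency.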
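The planned 'max(since)' query strategy listed under work in progress can be sketched against a toy IOV table. This uses SQLite as a stand-in for the real Oracle backend, and the table and column names are illustrative, not the actual COOL relational schema:

```python
import sqlite3

# Toy IOV table: each row is valid in [since, until). The old strategy puts a
# range predicate on both bounds; the alternative first locates max(since) <= t,
# which an index on (channel, since) can answer without scanning the range.
# Schema and names are illustrative, not the actual COOL schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE iovs "
           "(channel INTEGER, since INTEGER, until INTEGER, payload TEXT)")
# One channel with 5-minute (300s) IOVs covering one hour
db.executemany("INSERT INTO iovs VALUES (?, ?, ?, ?)",
               [(0, t, t + 300, f"cond@{t}") for t in range(0, 3600, 300)])

t = 1000  # validity time to look up (falls inside the [900, 1200) IOV)

# Old strategy: query on both 'since' and 'until'
old = db.execute(
    "SELECT payload FROM iovs WHERE channel=0 AND since<=? AND until>?",
    (t, t)).fetchone()

# New strategy: find the latest 'since' not after t, then check 'until'
new = db.execute(
    "SELECT payload FROM iovs WHERE channel=0 AND since="
    "(SELECT max(since) FROM iovs WHERE channel=0 AND since<=?) AND until>?",
    (t, t)).fetchone()

print(old, new)  # both strategies return the IOV starting at since=900
```

Both queries return the same IOV; the point of the new strategy is that the server can satisfy the max(since) lookup from the index alone instead of evaluating an open-ended range on two columns.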