CASTOR Databases at RAL
Richard Sinclair, Database Services Team Lead
CASTOR F2F, CERN, 28th November 2012

Instances
• 4 LHC CASTOR instances:
  – ATLAS
  – CMS
  – LHCb
  – GEN
• Spread across 2 database RACs (6 machines in total)

Instances (cont.)
• STFC 'Facilities' instance: 2-node RAC
• Pre-Production instance: 2-node RAC
• Certification instance: single-instance DB (to be merged into Pre-Prod)
• DLF: single-instance DB

Configuration
• All databases at 11.2.0.3
• Dell nodes (quad core), EMC CLARiiON arrays
• All using ASM with raw devices
• Red Hat Enterprise Linux 5
• Monitored via Grid Control
• LHC CASTOR databases use Data Guard for resilience
• Peaks at around 1,000 transactions per second

Distribution of Services
[Diagram: the GEN, ATLAS, LHCb, CMS and NS services distributed across the Neptune and Pluto machines and the EMC array]

Data Guard
• Implemented February 2012
• Physical standby RACs
• Identical hardware to the primary databases (Dell nodes + EMC array)
• Runs in Maximum Performance Data Guard mode (not real-time updates; lag typically < 1 minute)
• Backups taken from the standby databases

Data Guard (cont.)
• Primary and standby databases currently in the same UPS room
• In the near future, primary and standby will be in separate buildings
• Not currently using Fast-Start Failover
• Part of a bigger plan to minimise the risk of building failure for core services... e.g. power problems ;-)

CASTOR Challenges
• Occasional performance problems (much less frequent on 11g)
  – Usually linked to stale statistics
  – The SRM suffers more from performance problems than the stager
• Introduced SQL plan control to limit Oracle's ability to choose execution plans
  – Currently fixing plans only on the ATLAS SRM

The Future
• A few minor hardware repairs needed (after the power surge)
• Retirement of the DLF database (2.1.13)
• Merging of the Pre-Prod and Certification systems
• Re-think of tape backups (cloud?)
• Move the standby databases off site
• Test automated failover (using an 'observer' machine)
• Move from RHEL to Oracle Linux

Questions?
...and hopefully answers
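As background to the Data Guard slides: the "lag typically < 1 minute" figure is the kind of number a DBA would read off `v$dataguard_stats` on the standby. A minimal illustrative query (not taken from the talk):

```sql
-- Run on the standby database: report how far redo transport and
-- redo apply are behind the primary. (Illustrative sketch only;
-- the monitoring actually used at RAL is not described in the talk.)
SELECT name, value, time_computed
  FROM v$dataguard_stats
 WHERE name IN ('transport lag', 'apply lag');
```

In Maximum Performance mode the primary never waits on the standby, so this lag is the window of potential data loss; keeping it under a minute is what makes backups taken from the standby practical.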
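The "SQL plan control" mentioned under CASTOR Challenges is most likely Oracle's SQL Plan Management (the `DBMS_SPM` package). A hedged sketch of how a plan could be captured and fixed for one statement; the `sql_id`, `plan_hash_value` and `sql_handle` values below are placeholders, not the real ATLAS SRM identifiers:

```sql
-- Capture the currently-executing plan for one statement from the
-- cursor cache into a SQL plan baseline (placeholder identifiers).
DECLARE
  n PLS_INTEGER;
BEGIN
  n := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
         sql_id          => '0abc123def456g',   -- placeholder sql_id
         plan_hash_value => 1234567890);         -- placeholder plan hash
END;
/

-- Mark the baseline as FIXED so the optimizer keeps using this plan,
-- limiting its freedom to switch to a worse one when statistics go stale.
DECLARE
  n PLS_INTEGER;
BEGIN
  n := DBMS_SPM.ALTER_SQL_PLAN_BASELINE(
         sql_handle      => 'SQL_0123456789abcdef',  -- placeholder handle
         attribute_name  => 'fixed',
         attribute_value => 'YES');
END;
/
```

Fixing plans trades adaptability for predictability, which matches the slide's point: it is applied selectively (only the ATLAS SRM) rather than instance-wide.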