T0 report
WLCG operations Workshop
Barcelona, 07/07/2014
Maite Barroso, CERN IT
CERN IT Department
CH-1211 Geneva 23
Switzerland
www.cern.ch/it
Outline
• Facilities
• Next Linux version
• Network
• Cloud
• Grid and batch services
• Databases
• Summary
Facilities
• Wigner (Budapest)
– Additional capacity installed: mainly for OpenStack and
EOS, plus some for business continuity of DB services
– Wigner participated for the first time in the last HEPiX
workshop
– Network to Wigner
• Extensive testing was done on the GÉANT 100 Gbps link to
identify the source of the observed flaps
• All segments of the link have been tested without errors,
but the source of the problem has not been identified
• It is possible that cleaning of the fibres ahead of the tests
has resolved the problem. If not, the only remaining possible
cause is an incompatibility between the Brocade and Alcatel
equipment.
Linux: next version
• Plan: Adopt CentOS 7
– adding CERN-specific setup via add-on repositories
(a minimal sketch is given at the end of this slide):
http://cern.ch/linux/docs/Hepix-Spring-2014 Next Linux version at CERN.pdf
• CentOS 7 is approaching release
– within a few weeks: http://seven.centos.org/
• We expect to have a CERN-customized test
installation available in July/August
• CERN's own version certification?
– Is it still necessary?
– To be discussed with the Linux Certification Committee
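As a rough illustration of the add-on repository approach, here is a minimal Python sketch that drops a yum repository definition into place; the repository id and baseurl are hypothetical placeholders, not the actual CERN configuration:

    # Sketch: install an add-on yum repo definition for CentOS 7.
    # The repo id and baseurl below are placeholders.
    import textwrap

    REPO_FILE = "/etc/yum.repos.d/cern-addons.repo"

    REPO_DEF = textwrap.dedent("""\
        [cern-addons]
        name=CERN add-on packages for CentOS 7
        baseurl=http://linuxsoft.example.cern.ch/addons/7/x86_64/
        enabled=1
        gpgcheck=1
    """)

    with open(REPO_FILE, "w") as f:
        f.write(REPO_DEF)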
Network (1)
• LHCONE
– Increased CERN LHCONE bandwidth to 30 Gbps (was
20 Gbps)
– Working on the definition of an LHCONE AUP that
guarantees enough security while remaining practical to
implement
– Organization of the LHCONE Asian workshop
(https://indico.cern.ch/event/318813/) is ongoing. It aims to
expand LHCONE connectivity to sites in Asia.
• LHCOPN
– Connected KI and JINR, the Russian Tier1 sites. They have
two 10G links to CERN, one via Amsterdam and one via
Wigner.
– Bandwidth to US Tier1s will increase with the upcoming
deployment of the ESnet PoP at CERN
Network (2)
• IPv6
– From the network point of view, IPv6 deployment is
finished
– IT services are becoming dual stack (a quick dual-stack
check is sketched at the end of this slide). Right now:
• email (SMTP, IMAP, POP, OWA)
• lxplus-ipv6
• LDAP
• web redirection
– HEPiX IPv6 WG testing of IPv6 compliance of WLCG
applications, taking advantage of the deployment of IPv6
at CERN
– CERN, KIT, PIC, NDGF, IN2P3 have IPv6 connectivity
over the LHCOPN
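The dual-stack check mentioned above, as a minimal sketch using Python's standard socket module; the host name is just an example:

    # Sketch: check whether a host publishes both IPv4 (A) and
    # IPv6 (AAAA) addresses, i.e. resolves as dual stack.
    import socket

    def resolved_families(host):
        return {info[0] for info in socket.getaddrinfo(host, None)}

    families = resolved_families("lxplus.cern.ch")  # example host
    print("IPv4:", socket.AF_INET in families)
    print("IPv6:", socket.AF_INET6 in families)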
Cloud (1)
• All components now run the latest Havana-3 release
– Planning the upgrade to Icehouse
• The cloud continues to grow
– Today: 2800 servers, 7000 VMs, 150 TB of volumes
• Work in progress:
– Commissioning resources in Wigner for experiments
• Until now: only batch service
– SSO, Kerberos integration, accounting with Ceilometer
– Adding hardware
• Aim: 6000 compute nodes this year
Cloud (2)
[slide content was a figure, not captured in this transcript]
Cloud (3)
• VM provisioning
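The provisioning figure itself is not captured in the transcript. As a rough stand-in, here is a hedged Python sketch of booting a VM against a Havana-era OpenStack API with python-novaclient; the credentials, endpoint, image, and flavor names are placeholders, not CERN's actual configuration:

    # Sketch: boot one VM through the OpenStack Compute API using
    # python-novaclient; every credential and name is a placeholder.
    from novaclient import client

    nova = client.Client("2", "myuser", "mypassword", "myproject",
                         "https://keystone.example.cern.ch:5000/v2.0")

    flavor = nova.flavors.find(name="m1.small")
    image = nova.images.find(name="SLC6 Server")  # hypothetical image
    server = nova.servers.create("test-vm", image, flavor)
    print(server.id, server.status)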
Services (1)
• VOMRS to VOMS-admin migration
– ATLAS, ALICE, CMS and LHCb still run VOMRS
• We need the new release to migrate these VOs, as they
need to sync with the CERN HR DB and this doesn't work
in the current version
• Expected mid-July
– voms-admin in production for the rest of the VOs (test,
ops, geant4, ...)
• LFC
– Decommissioned for ATLAS in early June; all data is kept
for the moment
– In contact with LHCb about the expected end date of their
need for an LFC service
• FTS: Agreed to stop FTS2 on August 1st
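Transfers then go through FTS3 instead. As a very rough sketch of a submission through the FTS3 REST interface (the endpoint, file URLs, and JSON layout here are assumptions, not verified against the FTS3 documentation):

    # Sketch: submit one transfer to an FTS3 REST endpoint; the
    # endpoint, file URLs and JSON layout are assumptions.
    import json
    import urllib.request

    job = {
        "files": [{
            "sources": ["gsiftp://source.example.org/path/file"],
            "destinations": ["gsiftp://dest.example.org/path/file"],
        }]
    }
    req = urllib.request.Request(
        "https://fts3.example.cern.ch:8446/jobs",
        data=json.dumps(job).encode(),
        headers={"Content-Type": "application/json"},
    )
    # Real submissions authenticate with an X.509 proxy
    # certificate, omitted here for brevity.
    print(urllib.request.urlopen(req).read())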
Services (2)
Batch:
• SLC6 migration: SLC5 CEs decommissioned, no
grid job submission to SLC5
– Final migration of SLC5 WNs ongoing
• Batch system migration, from LSF to HTCondor
– Goals: scalability, dynamism, dispatch rate, query scaling
– Replacement candidates:
• SLURM feels too young
• HTCondor mature and promising
• Son of Grid Engine fast, a bit rough
– More details of the selection process:
https://indico.cern.ch/event/247864/session/5/contribution/22/material/slides/0.pdf
Services (3)
• Batch system migration, from LSF to HTCondor
– Setting up pilot, will open to experiments
• Start with 10 nodes, plus CREAM CE for Condor, for grid
submissions
• Work is ongoing on integrating AFS token granting and extension
– Full capacity test in parallel, ~5000 nodes
– Close contact with developers
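As a hedged illustration of what a local submission to the HTCondor pilot could look like, here is a minimal sketch using the HTCondor Python bindings; the executable and file names are placeholders, and the bindings are assumed to be available on the submit host:

    # Sketch: submit a trivial job to the local HTCondor schedd via
    # the Python bindings; executable and arguments are placeholders.
    import htcondor

    schedd = htcondor.Schedd()  # default local schedd
    job = htcondor.Submit({
        "executable": "/bin/sleep",
        "arguments": "60",
        "output": "test.out",
        "error": "test.err",
        "log": "test.log",
    })
    with schedd.transaction() as txn:
        cluster_id = job.queue(txn)
    print("submitted cluster", cluster_id)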
• New Squid service:
– Request from ATLAS for a more generic Squid service
covering their Frontier needs as well as the CVMFS needs
already covered (a minimal client-side sketch follows)
• Implementation will be an extension of the existing service:
a different alias, same instance
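A minimal sketch of a client fetching through such a Squid alias as an HTTP proxy, using only Python's standard library; the proxy host and target URL are placeholders:

    # Sketch: fetch a URL through a Squid proxy alias; the proxy
    # host and the target URL are placeholders.
    import urllib.request

    proxy = urllib.request.ProxyHandler(
        {"http": "http://squid-frontier.example.cern.ch:3128"})
    opener = urllib.request.build_opener(proxy)
    with opener.open("http://frontier.example.org/data", timeout=10) as r:
        print(r.status, len(r.read()))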
Databases (1)
• Oracle version upgrade
– The majority of DB services upgraded to 11.2.0.4 (including
half of the Tier1 sites)
– A few DB services upgraded to 12.1.0.1 (LHCb offline,
ATLARC, COMPASS, LANDB, …)
– End of 11.2 support in January 2018; looking at moving to
12c gradually (a client-side version check is sketched at the
end of this slide)
• HW and Storage evolution
– New HW installation (RAC50) in the BARN; migration of
production services completed by May
– New HW installations being prepared: RAC51 in the BARN
and at Wigner (for Disaster Recovery)
– New generation of storage from NetApp
• Integration with the Agile Infrastructure @CERN
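The version check referred to above, as a minimal Python sketch with cx_Oracle; the credentials and DSN are placeholders:

    # Sketch: report the Oracle server version a service is running;
    # user, password and DSN are placeholders.
    import cx_Oracle

    conn = cx_Oracle.connect("myuser", "mypassword",
                             "dbhost.example.cern.ch/orcl")
    print("server version:", conn.version)  # e.g. "11.2.0.4.0"
    conn.close()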
Databases (2)
• Replication evolution
– The Replication Technology Evolution Workshop took place
on June 3rd-4th
– Replication tests T0 to T1 using production data ongoing
– Plan to migrate from Streams to Oracle GoldenGate agreed
with the experiments and Tier0
• Database as a Service evolution (DBoD)
– New HW and Storage installations
– SW upgrades: MySQL (migrating to 5.6) and Oracle (migrating to
12c multi-tenancy)
– PostgreSQL (version 9.2) offered since September 2013
More details: Evolution of Database Services today at 17:10
Summary
• Getting experience with recent changes:
– Wigner
– Cloud VM provisioning
– IPv6
• And preparing the next ones:
– Quattor phase out
– Next Linux version
– HTCondor
• In a continuous feedback loop with the experiments
and WLCG