T0 report
WLCG Operations Workshop
Barcelona, 07/07/2014
Maite Barroso, CERN IT
CERN IT Department
CH-1211 Geneva 23
Switzerland
www.cern.ch/it
Outline
• Facilities
• Next Linux version
• Network
• Cloud
• Grid and batch services
• Databases
• Summary
Facilities
• Wigner (Budapest)
  – Additional capacity installed: mainly for OpenStack and for EOS,
    plus some for business continuity for DB services
  – Wigner participated for the first time in the last HEPiX workshop
  – Network to Wigner
    • Extensive testing done on the GÉANT 100 Gbps link to identify
      the source of the observed flaps
    • All segments of the link have been tested without errors, but the
      source of the problem is still not identified
    • It is possible that cleaning of the fibres ahead of the tests has
      resolved the problem. If not, only an incompatibility between the
      Brocade and Alcatel equipment remains as a possible cause.
Linux: next version
• Plan: adopt CentOS 7
  – adding CERN-specific setup via add-on repositories
    http://cern.ch/linux/docs/Hepix-Spring-2014 Next Linux version at CERN.pdf
• CentOS 7 is approaching release
  – within a few weeks: http://seven.centos.org/
• We expect to have a CERN-customized test installation available
  in July/August
• CERN's own version certification?
  – Is it still necessary?
  – To be discussed with the Linux Certification Committee
Network (1)
• LHCONE
  – Increased CERN LHCONE bandwidth to 30 Gbps (was 20 Gbps)
  – Working on the definition of an LHCONE AUP that guarantees
    enough security while remaining workable in practice
  – Organization of an LHCONE Asian workshop
    (https://indico.cern.ch/event/318813/) is ongoing; it aims to
    expand LHCONE connectivity to sites in Asia
• LHCOPN
  – Connected the KI and JINR sites of the Russian Tier1s; they have
    two 10G links to CERN, one via Amsterdam and one via Wigner
  – Bandwidth to the US Tier1s will increase with the upcoming
    deployment of the ESnet PoP at CERN
Network (2)
• IPv6
  – From the network point of view, IPv6 deployment is finished
  – IT services are becoming dual stack (a quick check is sketched
    below). Right now:
    • email (SMTP, IMAP, POP, OWA)
    • lxplus-ipv6
    • LDAP
    • web redirection
  – The HEPiX IPv6 WG is testing the IPv6 compliance of WLCG
    applications, taking advantage of the deployment of IPv6 at CERN
  – CERN, KIT, PIC, NDGF and IN2P3 have IPv6 connectivity over
    the LHCOPN
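To make the dual-stack point concrete: a resolver query shows whether a host
publishes both A (IPv4) and AAAA (IPv6) records. A minimal sketch in
standard-library Python; the fully qualified name lxplus-ipv6.cern.ch is an
assumption based on the alias listed above:

    import socket

    def dual_stack_addresses(host, port=22):
        """Collect the IPv4 and IPv6 addresses a hostname resolves to."""
        v4, v6 = set(), set()
        for family, _, _, _, sockaddr in socket.getaddrinfo(host, port):
            if family == socket.AF_INET:
                v4.add(sockaddr[0])
            elif family == socket.AF_INET6:
                v6.add(sockaddr[0])
        return v4, v6

    # Hostname assumed from the lxplus-ipv6 alias above.
    v4, v6 = dual_stack_addresses("lxplus-ipv6.cern.ch")
    print("IPv4:", sorted(v4))
    print("IPv6:", sorted(v6))

A service is dual stack when both sets come back non-empty.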
Cloud (1)
• All components now run the latest Havana-3 release
  – Planning the upgrade to Icehouse
• Continues to grow
  – Today: 2800 servers, 7000 VMs, 150 TB of volumes
• Work in progress:
  – Commissioning resources at Wigner for the experiments
    • Until now: only the batch service
  – SSO and Kerberos integration, accounting with Ceilometer
  – Adding hardware
    • Aim: 6000 compute nodes this year
Cloud (2)
[figure]
Cloud (3)
• VM provisioning
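For orientation, provisioning a VM against an OpenStack cloud of this era
looks roughly like the sketch below, written against the python-novaclient
bindings; the credentials, image and flavor names are placeholders, not the
actual CERN configuration:

    from novaclient import client

    # Placeholder credentials; in practice they come from the OpenStack RC file.
    nova = client.Client("2", "user", "password", "project",
                         "https://keystone.example.org:5000/v2.0")

    # Pick a flavor and image by name (both placeholders).
    flavor = nova.flavors.find(name="m1.medium")
    image = nova.images.find(name="SLC6-server")

    # Ask Nova to boot the VM; provisioning then proceeds asynchronously.
    server = nova.servers.create(name="test-vm", image=image, flavor=flavor)
    print(server.id, server.status)  # typically BUILD until scheduled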
Services (1)
• VOMRS to VOMS-admin migration
  – ATLAS, ALICE, CMS and LHCb still run VOMRS
    • We need the new release to migrate these VOs, as they need to
      sync with the CERN HR DB and in the current version this
      doesn't work
    • Expected mid-July
  – VOMS-admin in production for the rest of the VOs (test, ops,
    geant4, ...)
• LFC
  – Decommissioned for ATLAS in early June; all data is kept for
    the moment
  – In contact with LHCb about the expected end date of their need
    for an LFC service
• FTS: agreed to stop FTS2 on August 1st
Services (2)
Batch:
• SLC6 migration: SLC5 CEs decommissioned, no grid job submission
  to SLC5
  – Final migration of the SLC5 WNs is ongoing
• Batch system migration from LSF to HTCondor
  – Goals: scalability, dynamism, dispatch rate, query scaling
  – Replacement candidates:
    • SLURM: feels too young
    • HTCondor: mature and promising
    • Son of Grid Engine: fast, a bit rough
  – More details on the selection process:
    https://indico.cern.ch/event/247864/session/5/contribution/22/material/slides/0.pdf
Services (3)
• Batch system migration from LSF to HTCondor
  – Setting up a pilot, which will be opened to the experiments
    • Start with 10 nodes, plus a CREAM CE for HTCondor, for grid
      submissions (see the submission sketch below)
    • Work is ongoing on integrating AFS token granting and extension
  – Full-capacity test in parallel, ~5000 nodes
  – Close contact with the developers
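To give a feel for what the pilot exposes, a job submission through the
HTCondor Python bindings looks roughly as follows. This is a generic sketch
(the htcondor module ships with HTCondor; the job is a trivial placeholder),
not the CERN pilot configuration:

    import htcondor

    # A trivial placeholder job; in the pilot, grid jobs arrive via the CREAM CE.
    job = htcondor.Submit({
        "executable": "/bin/sleep",
        "arguments": "60",
        "output": "job.out",
        "error": "job.err",
        "log": "job.log",
    })

    schedd = htcondor.Schedd()          # connect to the local scheduler
    with schedd.transaction() as txn:   # queue one instance of the job
        cluster_id = job.queue(txn)

    print("submitted cluster", cluster_id)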
• New Squid service:
  – Request from ATLAS for a more generic Squid service covering
    their Frontier needs as well as the already-covered CVMFS needs
    • Implementation will be an extension of the existing service:
      different alias, same instance (client-side sketch below)
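On the client side, using such a Squid service just means directing HTTP
requests through the proxy alias. A minimal standard-library sketch; the alias
and URL below are placeholders, and 3128 is only the conventional Squid port:

    import urllib.request

    # Placeholder proxy alias; 3128 is the conventional Squid port.
    proxy = urllib.request.ProxyHandler(
        {"http": "http://squid-alias.example.org:3128"})
    opener = urllib.request.build_opener(proxy)

    # Repeated fetches of the same object are then served from the Squid cache.
    response = opener.open("http://frontier-server.example.org/some/object")
    print(response.status, len(response.read()))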
Databases (1)
• Oracle version upgrade
  – The majority of DB services upgraded to 11.2.0.4 (including
    half of the Tier1 sites)
  – A few DB services upgraded to 12.1.0.1 (LHCb offline, ATLARC,
    COMPASS, LANDB, …)
  – End of 11.2 support in January 2018; looking at moving to 12c
    gradually
• HW and storage evolution
  – New HW installation RAC50 in the BARN; migration of production
    services completed by May
  – New HW installations being prepared: RAC51 in the BARN and at
    Wigner (for disaster recovery)
  – New generation of storage from NetApp
• Integration with the Agile Infrastructure @CERN
Databases (2)
• Replication evolution
  – A Replication Technology Evolution workshop took place on
    June 3rd-4th
  – Replication tests from T0 to T1 using production data are ongoing
  – The plan to migrate from Streams to Oracle GoldenGate is agreed
    with the experiments and the Tier0
• Database as a Service (DBoD) evolution
  – New HW and storage installations
  – SW upgrades: MySQL (migrating to 5.6) and Oracle (migrating to
    12c multi-tenancy)
  – PostgreSQL (version 9.2) offered since September 2013
More details: Evolution of Database Services today at 17:10
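As a quick sanity check after these upgrades, a DBoD client can simply ask the
server for its version. A minimal sketch for the PostgreSQL case, assuming the
psycopg2 driver and placeholder connection parameters:

    import psycopg2

    # Placeholder host and credentials for a DBoD instance.
    conn = psycopg2.connect(host="dbod-instance.example.org", port=5432,
                            user="owner", password="secret", dbname="mydb")
    cur = conn.cursor()
    cur.execute("SHOW server_version")  # expect a 9.2.x answer per the above
    print(cur.fetchone()[0])
    conn.close()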
Summary
• Getting experience with recent changes:
  – Wigner
  – Cloud VM provisioning
  – IPv6
• And preparing the next ones:
  – Quattor phase-out
  – Next Linux version
  – HTCondor
• In a continuous feedback loop with the experiments and WLCG