Enabling Grids for E-sciencE

Network in EGEE
Building end-to-end network services for the Grid

Mathieu Goutelle – CNRS UREC, France
EGEE-II SA2 “Networking support”
[email protected]
www.eu-egee.org
EGEE-II INFSO-RI-031688
EGEE and gLite are registered trademarks
GridNets 2006 – 2006-10-01 – San Jose, CA, USA

Outline
• Short presentation of EGEE
• The network in EGEE:
  – Network services?
  – EGEE focus on end-to-end services in a multi-domain context
• Network services:
  – Resource reservation
  – Service Level Agreement
• Operational services:
  – Monitoring
  – EGEE Network Operational Centre
• Summary & conclusion

EGEE in a nutshell…
• EGEE:
  – 1 April 2004 – 31 March 2006
  – 71 partners in 27 countries, federated in regional Grids
• EGEE-II:
  – 1 April 2006 – 31 March 2008
  – 91 partners in 32 countries
  – 13 Federations
• Objectives:
  – Large-scale, production-quality infrastructure for e-Science
  – Attracting new resources and users from industry as well as science
  – Improving and maintaining “gLite” Grid middleware

EGEE in a nutshell…
• More than 20 applications from 7 domains:
  – Astrophysics: MAGIC, Planck
  – Computational Chemistry
  – Earth Sciences: Earth Observation, Solid Earth Physics, Hydrology, Climate
  – Financial Simulation: E-GRID
  – Fusion
  – Geophysics: EGEODE
  – High Energy Physics: 4 LHC experiments (ALICE, ATLAS, CMS, LHCb); BaBar, CDF, DØ, ZEUS
  – Life Sciences: Bioinformatics (Drug Discovery, GPS@, Xmipp_MLrefine, etc.); Medical imaging (GATE, CDSS, gPTM3D, SiMRI 3D, etc.)
  – Multimedia
  – Material Sciences
  – …

EGEE Infrastructure
[Map: countries participating in EGEE]
• Scale (June 2006):
  – ~ 200 sites in 40 countries
  – ~ 25 000 CPUs
  – > 10 PB storage
  – > 35 000 jobs per day
  – > 100 Virtual Organizations

Network infrastructure
• Connects 32 NRENs
• Over 3M users

Network infrastructure (cont.)
[Figure]

End-to-end network services?
• What type of services?
  – Network services are available to the EGEE sites: Premium IP and similar (e.g. QBSS), “lightpath” or network resource reservation, IPv6, multicast…
  – Operational services are available to the EGEE sites: monitoring of the network (local & backbone), operational data (incident, maintenance).
• How to ensure the service continuity along the path?
  – In the last mile?
  – In a multi-domain context?
• What about service availability, interface standardization, inter-domain agreements, etc.?

EGEE focus
• Network services:
  – Network resource reservation: Bandwidth Allocation and Reservation (BAR). Dedicated talk on that subject (see session 1, “End to End Bandwidth Allocation and Reservation for Grid applications”).
  – Service Level Agreements (SLAs): end-to-end SLAs?
• Operational services:
  – Monitoring: Network Performance Monitoring (NPM). Dedicated talk on that subject (see session 2, “Federated Network Performance Monitoring for the Grid”).
  – Coordination of operational actions: concept of the EGEE Network Operational Centre (ENOC).

Network resource reservation
• Based on the framework currently being built by the GÉANT2 project:
  – Hides the multi-domain and multiple-technology issues;
  – Provides at the Grid level:
      – A seamless interface for service requests at the “customer” layer;
      – A high-level view of the network, with requests expressed as characteristics rather than as a particular service;
      – Reduced configuration lead-time;
      – A description of the service level.
• Issues remain:
  – A component (BAR, see dedicated talk) gives access to these interfaces at the middleware layer, but the application layer is not yet ready;
  – Need for sub-management of the macroscopic reserved resource at the Grid level;
  – What about domains outside the GÉANT2 cloud?

Quick look at the BAR architecture
[Diagram labels: Site 1, Network 1, Network 2, Network 3, Site 2, HLM, BAR, NSAP, L-NSAP, L-Network, Extended QoS Network, EGEE Network]
• Clear demarcation between the Grid and the network:
  – The network is hidden from the Grid (technology, multi-domain issues…);
  – The Grid is hidden from the network (which only knows one “EGEE” user);
  – Allows a two-stage process (reservation & activation) suitable in a Grid context.

SLAs
• “SLAs”?
  – Description of the characteristics of the service provided (e.g. after a successful resource reservation request);
  – Provided by each domain crossed by the data path;
  – Either filled in manually by a human, or automatically if the request is handled entirely by software;
  – Definition of templates in cooperation with GÉANT2, based on previous work inside EGEE and on answers from GÉANT2 to some open issues (procedures, demarcation point…).
• SLA template:
  – Administrative part (contact, duration, troubleshooting procedures);
  – SLS (Service Level Specification) part.
• The SLA is formed using the individual SLAs provided by all domains along the end-to-end path.

SLAs (cont.)
[Diagram labels: border-to-border connectivity, end-to-end connectivity]
• EGEE end-to-end SLA template:
  – Concatenation of the individual SLAs of each participating domain;
  – SLA between the borders of the NRENs cloud (border-to-border SLA).
• Difficulty in accommodating and taking into account the “last mile”:
  – The “last-mile” network may not be participating (no resource reservation system, no SLA, etc.);
  – Try to address this with static information on these networks to provide service characteristics to the user/application.

SLA institution
• All domains involved in network services provisioning to EGEE as part of the existing network infrastructure hierarchy have to be categorized as one of:
  – Compliant with the Premium IP service,
  – Supportive of the Premium IP service,
  – Indifferent to the Premium IP service.

EGEE focus
• Network services:
  – Network resource reservation: Bandwidth Allocation and Reservation (BAR). Dedicated talk on that subject (see session 1, “End to End Bandwidth Allocation and Reservation for Grid applications”).
  – Service Level Agreements (SLAs): end-to-end SLAs?
• Operational services:
  – Monitoring: Network Performance Monitoring (NPM). Dedicated talk on that subject (see session 2, “Federated Network Performance Monitoring for the Grid”).
  – Operational interface with the network: concept of the EGEE Network Operational Centre (ENOC).

Monitoring
• Not Yet Another Monitoring Framework!
  – Role of a mediator between the various monitoring frameworks and the various clients (diagnostic tools, middleware, etc.);
  – Network Performance Monitoring (NPM) gives access to data collected by existing monitoring frameworks (site, backbone);
  – Use of the NMWG interface to access those frameworks and republish data;
  – Special requirements for some middleware components for faster access to data.

Operational Interface
• The network infrastructure of EGEE is mainly served by a set of NRENs via GÉANT2;
• Need for an entity coordinating all the NOCs involved and the Grid operations:
  – Concept of an end-to-end Coordination Unit (GÉANT2);
  – Providing end-to-end operational support.
• A single point of contact as an operational interface between EGEE and GÉANT2/NRENs, dealing with:
  – Network problems troubleshooting,
  – Interactions with network providers and Grid sites,
  – Notifications from NRENs,
  – Network SLA installation and monitoring.
• Two functional entities inside EGEE:
  – EGEE Network Operational Centre (ENOC);
  – A Network Trouble Ticket Manager – GGUS.

ENOC
[Diagram labels: Users, GGUS, EGEE Network Support Units, ENOC, NRENs, GÉANT2]
• From the EGEE point of view:
  – GGUS acts as the first-line support (interacts with the user);
  – Support units are the second-level support.
• From the NRENs’ point of view:
  – EGEE (via the ENOC) is a single entity;
  – The ENOC is the only point of contact for the NRENs (submitter of the problem).

ENOC (cont.)
• Main challenges:
  – To create a network support structure inside EGEE;
  – To define the associated network operational procedures.
• The ENOC is the user support for network failures:
  – End-to-end network problems troubleshooting;
  – Coordination unit for the actions of all the entities involved in a network incident;
  – Try to have an overall view of the end-to-end service, gathering information from all the involved domains;
  – SLA management: installation and monitoring.
• ENOC operational procedures have been defined and validated during the first phase of EGEE;
• EGEE-II will fully implement ENOC.

ENOC (cont.)
• ENOC service:
  – Collect tickets from NRENs which agree to provide them to the ENOC;
  – Forward to GGUS the ones that seem relevant (possible impact on the Grid infrastructure);
  – Receive tickets assigned to ENOC by the GGUS 1st level support;
  – Troubleshoot them with the help of monitoring tools;
  – Contact identified faulty domains, or reassign the ticket to the associated site if there is no evidence of a backbone problem (e.g. LAN issue).
• Main issues:
  – Load on the ENOC team (amount of info, etc.);
  – Heterogeneity of systems the ENOC has to deal with (languages, trouble ticket format, monitoring, etc.).

ENOC status
• ENOC team is ready!
  – 5 people (2 FTE) including one dedicated to it.
• ENOC receives operational information from GÉANT2 and 10 NRENs (more to come):
  – About 80% of all the EGEE sites covered;
  – An average of 5 tickets handled per day;
  – 8 different languages.
• Building tools to follow up or enhance the network support:
  – Network Operational Database (interconnection of administrative domains between the EGEE resource centres);
  – TT parsing and filtering tool;
  – Dashboard to present overall status of the “EGEE network”.
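
The ENOC service steps listed above (collect tickets, forward the relevant ones to GGUS, then contact a faulty domain or reassign to the site) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Ticket` fields, the domain names, and the relevance rule (a ticket is relevant if it mentions a domain that connects an EGEE site) are hypothetical and do not describe the actual ENOC tools or the GGUS interface.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Ticket:
    """A trouble ticket received from an NREN (all fields are hypothetical)."""
    source: str                # network that issued the ticket
    domains: FrozenSet[str]    # administrative domains the ticket mentions
    summary: str


def split_relevant(
    tickets: Iterable[Ticket], egee_domains: FrozenSet[str]
) -> Tuple[List[Ticket], List[Ticket]]:
    """Separate tickets that touch a domain serving an EGEE site from the rest."""
    forward: List[Ticket] = []
    drop: List[Ticket] = []
    for ticket in tickets:
        # Assumed rule: relevant if at least one mentioned domain connects an EGEE site.
        (forward if ticket.domains & egee_domains else drop).append(ticket)
    return forward, drop


def next_action(faulty_domain: Optional[str]) -> str:
    """Pick the follow-up once troubleshooting has, or has not, located a fault."""
    if faulty_domain is not None:
        return f"contact {faulty_domain}"
    # No evidence of a backbone problem: hand the ticket back to the site.
    return "reassign to site"


if __name__ == "__main__":
    egee = frozenset({"nren-a", "nren-b"})  # made-up domain names
    incoming = [
        Ticket("nren-a", frozenset({"nren-a"}), "fibre cut on backbone link"),
        Ticket("nren-c", frozenset({"nren-c"}), "maintenance on campus router"),
    ]
    forward, drop = split_relevant(incoming, egee)
    print([t.summary for t in forward])
    print(next_action("nren-a"))
    print(next_action(None))
```

A set intersection keeps the relevance test independent of ticket language or format, which the slides name as the main source of heterogeneity; a real filter would first have to parse each NREN's ticket format into such a common structure.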

EGEE expectations
• Towards a better solution to our “multi-domain” and “end-to-end” issues.
• Seamless access to network monitoring data:
  – GÉANT2 will provide such access (perfSONAR), from multiple domains, aggregating data from multiple frameworks.
• Network resource reservation:
  – Requests expressed not in terms of service but of characteristics;
  – The choice of the underlying technology to fulfil them is up to the network;
  – Answer to a request = SLA (depending on the current network status & load);
  – What about the last mile? The non-NREN domains?
• Standardization of the operational interface:
  – Trouble Ticket format (data schema and exchange format);
  – Access method.

Summary & conclusion
• Focus on providing end-to-end services in a multi-domain context:
  – Hiding the network complexity from the Grid (users, middleware, Grid support);
  – Hiding the Grid complexity from the network (single point of contact, operational interface).
• Many building blocks depend on the providers:
  – Resource reservation frameworks, SLA installation, backbone monitoring;
  – Fortunately, EGEE and GÉANT2 built up a strong collaboration!
• Many things remain pending:
  – Mainly on the operational side (homogenization of the network interface);
  – How to cope with domains outside the GÉANT2 cloud?
• The two infrastructures need to collaborate on these aspects.

Thank you for your attention!