The Energy Sciences Network
BESAC, August 2004

Mary Anne Scott
Program Manager, Advanced Scientific Computing Research
Office of Science, Department of Energy

William E. Johnston, ESnet Dept. Head and Senior Scientist
R. P. Singh, Federal Project Manager
Michael S. Collins, Stan Kluz, Joseph Burrescia, and James V. Gagliardi, ESnet Leads
Gizella Kapus, Resource Manager
and the ESnet Team
Lawrence Berkeley National Laboratory

What is ESnet?
• Mission:
  o Provide interoperable, effective, and reliable communications infrastructure and leading-edge network services that support the missions of the Department of Energy, especially the Office of Science
• Vision:
  o Provide seamless and ubiquitous access, via shared collaborative information and computational environments, to the facilities, data, and colleagues needed to accomplish their goals
• Role:
  o A component of the Office of Science infrastructure critical to the success of its research programs (program funded through ASCR/MICS; managed and operated by ESnet staff at LBNL)

Why is ESnet important?
• Enables thousands of DOE, university, and industry scientists and collaborators worldwide to make effective use of unique DOE research facilities and computing resources, independent of time and geographic location
  o Direct connections to all major DOE sites
  o Access to the global Internet (managing 150,000 routes at 10 commercial peering points)
  o User demand has grown by a factor of more than 10,000 since its inception in the mid 1990's: a 100 percent increase every year since 1990
• Capabilities not available through commercial networks
  - Architected to move huge amounts of data between a small number of sites
  - High-bandwidth peering to provide access to US, European, Asia-Pacific, and other research and education networks
• Objective: Support scientific research by providing seamless and ubiquitous access to the facilities, data, and colleagues

How is ESnet Managed?
• A community endeavor
  o Strategic guidance from the OSC programs
    - Energy Sciences Network Steering Committee (ESSC); BES is represented by Nestor Zaluzec, ANL, and Jeff Nichols, ORNL
  o Network operation is a shared activity with the community
    - ESnet Site Coordinators Committee
    - Ensures the right operational "sociology" for success
• Complex and specialized, both in network engineering and in network management, in order to provide its services to the laboratories in an integrated support environment
• Extremely reliable in several dimensions

Taken together, these points make ESnet a unique facility supporting DOE science, quite different from a commercial ISP or university network.

What now?
VISION: A scalable, secure, integrated network environment for ultra-scale distributed science is being developed to make it possible to combine resources and expertise to address complex questions that no single institution could manage alone.
• Network Strategy
  o Production network: base TCP/IP services; 99.9+% reliable
  o High-impact network: increments of 10 Gb/s; switched lambdas (other solutions); 99% reliable
  o Research network: interfaces with production, high-impact, and other research networks; starts with electronic switching and advances toward optical switching; very flexible [UltraScience Net]
• Revisit governance model
  o SC-wide coordination
  o Advisory Committee involvement

Where do you come in?
• Early identification of requirements
  o Evolving programs
  o New facilities
• Participation in management activities
• Interaction with BES representatives on ESSC
• Next ESSC meeting on Oct 13-15 in the DC area

What Does ESnet Provide?
• A network connecting DOE Labs and their collaborators that is critical to the future process of science
• An architecture tailored to accommodate DOE's large-scale science
  o move huge amounts of data between a small number of sites
• High-bandwidth access to DOE's primary science collaborators: research and education institutions in the US, Europe, Asia Pacific, and elsewhere
• Full access to the global Internet for DOE Labs
• Comprehensive user support, including "owning" all trouble tickets involving ESnet users (including problems at the far end of an ESnet connection) until they are resolved, with 24x7 coverage
• Grid middleware and collaboration services supporting collaborative science
  o trust, persistence, and science-oriented policy

What is ESnet Today?
• ESnet builds a comprehensive IP network infrastructure (routing, IPv6, and IP multicast) on commercial circuits
  o ESnet purchases telecommunications services ranging from T1 (1.5 Mb/s) to OC192 SONET (10 Gb/s) and uses these to connect core routers and sites to form the ESnet IP network
  o ESnet purchases peering access to commercial networks to provide full Internet connectivity
• Essentially all of the national data traffic supporting US science is carried by two networks: ESnet and Internet2/Abilene (which plays a similar role for the university community)

How Do Networks Work?
• Accessing a service, Grid or otherwise, such as a Web server or FTP server, from a client computer and client application (e.g. a Web browser) involves
  o Target host names
  o Host addresses
  o Service identification
  o Routing

How Do Networks Work?
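The four elements listed for accessing a service (target host name, host address, service identification, routing) can be illustrated with a toy lookup pipeline. This is a minimal sketch under stated assumptions: the host names, addresses, port table, and routing table are hypothetical (drawn from the reserved example.org name and documentation address ranges), not ESnet data. A real client asks DNS through the operating system's resolver, and each router along the path performs its own route lookup rather than the client choosing the whole path.

```python
import ipaddress

# Hypothetical name table standing in for DNS (host name -> host address).
HOSTS = {
    "www.example.org": "192.0.2.10",
    "ftp.example.org": "198.51.100.20",
}

# Service identification: well-known TCP ports for two services.
SERVICES = {"http": 80, "ftp": 21}

# Hypothetical routing table: (destination prefix, next hop).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "default-gw"),
    (ipaddress.ip_network("192.0.2.0/24"), "peer-A"),
    (ipaddress.ip_network("198.51.100.0/24"), "peer-B"),
    (ipaddress.ip_network("198.51.100.0/25"), "peer-C"),
]


def next_hop(addr: str) -> str:
    """Longest-prefix match: the most specific route containing addr wins.

    The 0.0.0.0/0 default route matches everything, so there is always
    at least one candidate.
    """
    ip = ipaddress.ip_address(addr)
    matches = [(net, hop) for net, hop in ROUTES if ip in net]
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop


def access(host: str, service: str):
    """Walk the four steps: name -> address -> service port -> route."""
    addr = HOSTS[host]                 # host name to host address
    port = SERVICES[service]           # service identification
    return addr, port, next_hop(addr)  # routing decision


print(access("www.example.org", "http"))  # ('192.0.2.10', 80, 'peer-A')
print(access("ftp.example.org", "ftp"))   # ('198.51.100.20', 21, 'peer-C')
```

The second lookup shows why routers keep many routes: 198.51.100.20 matches both the /24 and the more specific /25, and the /25 wins.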
[Figure: path from LBNL through its border/gateway routers, the ESnet core routers, and ESnet peering routers to a big ISP (e.g. SprintLink) and on to Google, Inc.; DNS shown at the site]
• Border/gateway routers
  o implement separate site and network provider policy (including site firewall policy)
• Core routers
  o focus on high-speed packet forwarding
• Peering routers
  o exchange reachability information ("routes")
  o implement/enforce routing policy for each provider
  o provide cyberdefense

ESnet Core is a High-Speed Optical Network
[Figure: site LAN and site IP router connected over 10GE to an ESnet hub IP router on the ESnet core optical fiber ring]
• Site to ESnet network policy demarcation ("DMZ") between the site IP router and the ESnet hub router
• Lambda channels are converted to electrical channels
  o usually SONET data framing or Ethernet data framing
  o can be clear digital channels (no framing, e.g. for digital HDTV)
• Wave division multiplexing
  o today typically 64 x 10 Gb/s optical channels per fiber
  o channels (referred to as "lambdas") are usually used in bi-directional pairs
• A ring topology network is inherently reliable: all single-point failures are mitigated by routing traffic in the other direction around the ring.
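The ring-reliability point can be made concrete: on a ring, every pair of nodes is joined by two link-disjoint paths (clockwise and counter-clockwise), so any single failed link leaves one of them intact. In this minimal sketch the hub names are borrowed from this deck purely as labels; the ring order and the "shortest surviving direction" rule are assumptions for illustration, not ESnet's actual topology or routing protocol.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical ring of hubs (order assumed for illustration); the last hub
# links back to the first to close the ring.
RING = ["SNV", "ELP", "ATL", "DC", "AOA"]


def link(a: str, b: str) -> Tuple[str, str]:
    """Key for an undirected link, the same in either direction."""
    x, y = sorted((a, b))
    return (x, y)


def walk(src: str, dst: str, step: int) -> List[str]:
    """Hubs visited going around the ring in one direction (+1 or -1)."""
    n = len(RING)
    i = RING.index(src)
    path = [src]
    while RING[i] != dst:
        i = (i + step) % n
        path.append(RING[i])
    return path


def route(src: str, dst: str, failed: Set[Tuple[str, str]]) -> Optional[List[str]]:
    """Shortest direction whose links are all up; None if both directions are cut."""
    candidates = [walk(src, dst, +1), walk(src, dst, -1)]
    usable = [p for p in candidates
              if all(link(a, b) not in failed for a, b in zip(p, p[1:]))]
    return min(usable, key=len) if usable else None


print(route("SNV", "ATL", set()))                 # ['SNV', 'ELP', 'ATL']
print(route("SNV", "ATL", {link("ELP", "ATL")}))  # ['SNV', 'AOA', 'DC', 'ATL']
```

With one failed link the traffic simply goes the other way around; it takes a second failure on the remaining direction (for example, both ELP-ATL and DC-AOA) before `route` returns None. That limit is one reason the deck later calls for dual-connected MAN rings and a second core.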
ESnet Provides Full Internet Service to DOE Facilities and Collaborators with High-Speed Access to All Major Science Collaborators
[Figure: map of the ESnet core (Packet over SONET optical ring and hubs) and its connections to DOE sites and to international networks, including CA*net4, GEANT (Germany, France, Italy, UK, etc.), CERN, SInet (Japan), KDDI (Japan), Japan-Russia (BINP), TANet2 and ASCC (Taiwan), Singaren, Australia, MREN, StarTap, Netherlands, Russia, France, and Switzerland]
• 42 end user sites: Office of Science sponsored (22); NNSA sponsored (12); Joint sponsored (3); Other sponsored (NSF LIGO, NOAA); Laboratory sponsored (6)
• Link types: international (high speed); OC192 (10 Gb/s optical); OC48 (2.5 Gb/s optical); Gigabit Ethernet (1 Gb/s); OC12 ATM (622 Mb/s); OC12; OC3 (155 Mb/s); T3 (45 Mb/s); T1-T3; T1 (1.5 Mb/s)
• The map also marks peering points, hubs, and high-speed peering points

ESnet's Peering Infrastructure Connects the DOE Community With Its Collaborators
[Figure: map of ESnet peering connections to university, international, commercial, and Abilene networks at exchange points and hubs including MAE-W, MAE-E, PAIX-W, PAIX-E, FIX-W, EQX-SJ, Distributed 6TAP, MAX GPOP, PNW-GPOP, CENIC, CalREN2, and the SEA, ATL, and NYC hubs, annotated with the number of peers at each]
ESnet provides access to all of the Internet by managing the full complement of global Internet routes (about 150,000) at 10 general/commercial peering points, plus high-speed peerings with Abilene and the international R&E networks.
This is a lot of work, and is very visible, but it provides full access for DOE.

What is Peering?
• Peering points exchange routing information that says "which packets I can get closer to their destination"
• ESnet daily peering report (top 20 of about 100 peers):

    AS      Routes   Peer
    1239    63384    SPRINTLINK
    701     51685    UUNET-ALTERNET
    209     47063    QWEST
    3356    41440    LEVEL3
    3561    35980    CABLEWIRELESS
    7018    28728    ATT-WORLDNET
    2914    19723    VERIO
    3549    17369    GLOBALCENTER
    5511     8190    OPENTRANSIT
    174      5492    COGENTCO
    6461     5032    ABOVENET
    7473     4429    SINGTEL
    3491     3529    CAIS
    11537    3327    ABILENE
    5400     3321    BT
    4323     2774    TWTELECOM
    4200     2475    ALERON
    6395     2408    BROADWING
    2828     2383    XO
    7132     1961    SBC

• This is a lot of work. Peering with a particular outfit is not random: it carries routes that ESnet needs (e.g. to the Russian Backbone Net)

What is Peering?
• Why so many routes? So that when I want to get to someplace out of the ordinary, I can get there. For example:
  http://www-sbras.nsc.ru/eng/sbras/copan/microel_main.html (Technological Design Institute of Applied Microelectronics, Novosibirsk, Russia)
• Path from ESnet to Novosibirsk:

    Start:  134.55.209.5    snv-lbl-oc48.es.net                  ESnet core
            134.55.209.90   snvrt1-ge0-snvcr1.es.net             ESnet peering at Sunnyvale
            63.218.6.65     pos3-0.cr01.sjo01.pccwbtn.net        AS3491 CAIS Internet
            63.218.6.38     pos5-1.cr01.chc01.pccwbtn.net        AS3491 CAIS Internet
            63.216.0.53     pos6-1.cr01.vna01.pccwbtn.net        AS3491 CAIS Internet
            63.216.0.30     pos5-3.cr02.nyc02.pccwbtn.net        AS3491 CAIS Internet
            63.218.12.37    pos6-0.cr01.ldn01.pccwbtn.net        AS3491 CAIS Internet
            63.218.13.134   rbnet.pos4-1.cr01.ldn01.pccwbtn.net  AS3491->AS5568 (Russian Backbone Network) peering point
            195.209.14.29   MSK-M9-RBNet-5.RBNet.ru              Russian Backbone Network
            195.209.14.153  MSK-M9-RBNet-1.RBNet.ru              Russian Backbone Network
            195.209.14.206  NSK-RBNet-2.RBNet.ru                 Russian Backbone Network
    Finish: 194.226.160.10  Novosibirsk-NSC-RBNet.nsc.ru         RBN to AS 5387 (NSCNET-2)

Predictive Drivers for the Evolution of ESnet
Workshop, August 13-15, 2002, organized by the Office of Science: Mary Anne Scott (Chair), Dave Bader, Steve Eckstrand, Marvin Frazier, Dale Koelling, Vicky White
Workshop Panel Chairs: Ray Bair and Deb Agarwal; Bill Johnston and Mike Wilde; Rick Stevens; Ian Foster and Dennis Gannon; Linda Winkler and Brian Tierney; Sandy Merola and Charlie Catlett
• The network is needed for:
  o long-term (final stage) data analysis
  o "control loop" data analysis (influencing an experiment in progress)
  o distributed, multidisciplinary simulation
• The network and middleware requirements to support DOE science were developed by the OSC science community, representing major DOE science disciplines:
  o Climate
  o Spallation Neutron Source
  o Macromolecular Crystallography
  o High Energy Physics
  o Magnetic Fusion Energy Sciences
  o Chemical Sciences
  o Bioinformatics
• Available at www.es.net/#research

The Analysis was Driven by the Evolving Process of Science
(Example discipline: Climate. For each time frame: vision for the future process of science; characteristics that motivate high-speed nets; networking and middleware requirements)
• Climate (near term)
  o Vision: analysis of model data by selected communities that have high-speed networking (e.g. NCAR and NERSC)
  o Characteristics: a few data repositories, many distributed computing sites (NCAR: 20 TBy; NERSC: 40 TBy; ORNL: 40 TBy)
  o Networking: authenticated data streams for easier site access through firewalls
  o Middleware: server-side data processing (computing and cache embedded in the net); information servers for global data catalogues
• Climate (5 yr)
  o Vision: enable the analysis of model data by all of the collaborating community
  o Characteristics: add many simulation elements/components as understanding increases; 100 TBy / 100 yr of generated simulation data, 1-5 PBy/yr (just at NCAR); distribute large chunks of data to major users for post-simulation analysis
  o Networking: robust access to large quantities of data
  o Middleware: reliable data/file transfer (across system/network failures)
• Climate (5-10 yr)
  o Vision: integrated climate simulation that includes all high-impact factors
  o Characteristics: 5-10 PBy/yr (at NCAR); add many diverse simulation elements/components, including from other disciplines (this must be done with distributed, multidisciplinary simulation); virtualized data to reduce storage load
  o Networking: robust networks supporting distributed simulation, with adequate bandwidth and latency for remote analysis and visualization of massive datasets; quality of service guarantees for distributed simulations
  o Middleware: virtual data catalogues and work planners for reconstituting the data on demand

Evolving Quantitative Science Requirements for Networks
(End-to-end throughput today / in 5 years / in 5-10 years; remarks)
• High Energy Physics: 0.5 Gb/s / 100 Gb/s / 1000 Gb/s; high bulk throughput
• Climate (Data & Computation): 0.5 Gb/s / 160-200 Gb/s / N x 1000 Gb/s; high bulk throughput
• SNS NanoScience: not yet started / 1 Gb/s / 1000 Gb/s + QoS for control channel; remote control and time-critical throughput
• Fusion Energy: 0.066 Gb/s (500 MB/s burst) / 0.198 Gb/s (500 MB / 20 sec. burst) / N x 1000 Gb/s; time-critical throughput
• Astrophysics: 0.013 Gb/s (1 TBy/week) / N*N multicast / 1000 Gb/s; computational steering and collaborations
• Genomics Data & Computation: 0.091 Gb/s (1 TBy/day) / 100s of users / 1000 Gb/s + QoS for control channel; high throughput and steering

Observed Drivers for ESnet Evolution
• Are we seeing the predictions of two years ago come true?
• Yes!

OSC Traffic Increases by 1.9-2.0X Annually
• ESnet is currently transporting about 250 terabytes/mo. (250,000,000 MBy/mo.)
[Figure: ESnet Monthly Accepted Traffic, TBytes/Month (0 to 300), 1990-2004]
• Annual growth in the past five years has increased from 1.7x annually to just over 2.0x annually.

ESnet is Engineered to Move a Lot of Data
[Figure: ESnet Top 20 Data Flows, 24 hr. avg., 2004-04-20 (scale: 1 Terabyte/day)]
• A small number of science users account for a significant fraction of all ESnet traffic
• Since BaBar data analysis started, the top 20 ESnet flows have consistently accounted for > 50% of ESnet's monthly total traffic (~130 of 250 TBy/mo)

[Figure: ESnet Top 10 Data Flows, 1 week avg., 2004-07-01]
• The traffic is not transient: daily and weekly averages are about the same.
• SLAC is a prototype for what will happen when Climate, Fusion, SNS, Astrophysics, etc., start to ramp up the next generation of science

ESnet is a Critical Element of Large-Scale Science
• ESnet is a critical part of the large-scale science infrastructure of high energy physics experiments, climate modeling, magnetic fusion experiments, astrophysics data analysis, etc.
• As other large-scale facilities, such as SNS, turn on, this will be true across DOE

Science Mission Critical Infrastructure
• ESnet is a visible and critical piece of general DOE science infrastructure
  o if ESnet fails, tens of thousands of DOE and university users know it within minutes if not seconds
• Requires high reliability and high operational security in
  o network operations, and
  o ESnet infrastructure support: the systems that support the operation and management of the network and services
    - Secure and redundant mail and Web systems are central to the operation and security of ESnet: trouble tickets are by email, engineering communication is by email, and the engineering database interface is via Web
    - Secure network access to hub equipment
    - Backup secure telephony access to all routers
    - 24x7 help desk (joint with NERSC) and 24x7 on-call network engineers

Automated, real-time monitoring of traffic levels and operating state of some 4400 network entities is the primary network operational and diagnosis tool
[Figure: monitoring displays: Performance; Hardware Configuration; Network Configuration; SecureNet; OSPF Metrics (routing and connectivity); IBGP Mesh (routing and connectivity)]

ESnet's Physical Infrastructure
[Figure: picture detail; equipment rack detail at the NYC Hub, 32 Avenue of the Americas (one of ESnet's core optical ring sites)]

Typical Equipment of an ESnet Core Network Hub
ESnet core equipment @ Qwest 32 AofA HUB, NYC, NY (~$1.8M, list)
• Sentry power 48v 30/60 amp panel ($3,900 list)
• Sentry power 48v 10/25 amp panel ($3,350 list)
• DC/AC converter ($2,200 list)
• Lightwave Secure Terminal Server ($4,800 list)
• Juniper M20 AOA-PR1 (peering router) ($353,000 list)
• Qwest DS3 DCX
• AOA Performance Tester ($4,800 list)
• Cisco 7206 AOA-AR1 (low-speed links to MIT & PPPL) ($38,150 list)
• Juniper OC192 optical ring interface (the AOA end of the OC192 to CHI) ($195,000 list)
• Juniper T320 AOA-CR1 (core router) ($1,133,000 list)
• Juniper OC48 optical ring interface (the AOA end of the OC48 to DC-HUB) ($65,000 list)

Disaster Recovery and Stability
[Figure: map of LBNL and the SNV, ALB, CHI, NYC, and DC hubs, with remote engineers and partial or duplicate support infrastructure at several hubs]
• LBNL: engineers, 24x7 Network Operations Center, generator-backed power, and the core support services:
  o Spectrum (network management system)
  o DNS (name to IP address translation)
  o engineering, load, and configuration databases
  o public and private Web
  o e-mail (server and archive)
  o PKI certificate repository and revocation lists
  o collaboratory authorization service
• Remote engineers and partial duplicate infrastructure at several hubs
• Currently deploying full replication of the NOC databases and servers and the Science Services databases in the NYC Qwest carrier hub
• The network must be kept available even if, e.g., the West Coast is disabled by a massive earthquake
• Reliable operation of the network involves
  o remote NOCs
  o replicated support infrastructure
  o generator-backed UPS power at all critical network and infrastructure locations
  o a non-interruptible core: the ESnet core operated without interruption through
    - the N. Calif. power blackout of 2000
    - the 9/11/2001 attacks, and
    - the Aug. 2003 NE States power blackout

ESnet WAN Security and Cyberattack Defense
• Cyber defense is a new dimension of ESnet security
  o Security is now inherently a global problem
  o As the entity with a global view of the network, ESnet has an important role in overall security
• 30 minutes after the Sapphire/Slammer worm was released, 75,000 hosts running Microsoft's SQL Server (UDP port 1434) were infected. ("The Spread of the Sapphire/Slammer Worm," David Moore (CAIDA & UCSD CSE), Vern Paxson (ICIR & LBNL), Stefan Savage (UCSD CSE), Colleen Shannon (CAIDA), Stuart Staniford (Silicon Defense), Nicholas Weaver (Silicon Defense & UC Berkeley EECS), http://www.cs.berkeley.edu/~nweaver/sapphire) Jan. 2003

ESnet and Cyberattack Defense
[Figure: ESnet backbone traffic during the Sapphire/Slammer outbreak]
The Sapphire/Slammer worm infection hits, creating almost a full Gb/s (1000 megabit/sec.) traffic spike on the ESnet backbone.

Cyberattack Defense
[Figure: attack traffic entering through ESnet peering routers, crossing the ESnet core, and reaching a Lab's gateway and border routers, with X marks where each filter is applied]
• Lab first response: filter incoming traffic at their ESnet gateway router
• ESnet first response: filters to assist a site
• ESnet second response: filter traffic from outside of ESnet
• ESnet third response: shut down the main peering paths and provide only limited-bandwidth paths for specific "lifeline" services
• The Sapphire/Slammer worm infection created a Gb/s of traffic on the ESnet core until filters were put in place (both into and out of sites) to damp it out.

Science Services: Support for Shared, Collaborative Science Environments
• X.509 identity certificates and Public Key Infrastructure provide the basis of secure, cross-site authentication of people and systems (www.doegrids.org)
  o ESnet negotiates the cross-site, cross-organization, and international trust relationships to provide policies tailored to collaborative science, in order to permit sharing of computing and data resources and other Grid services
  o The Certification Authority (CA) issues certificates after validating each request against policy
  o This service was the basis of the first routine sharing of HEP computing resources between the US and Europe

Science Services: Public Key Infrastructure
[Figure: DOEGrids PKI statistics; report as of July 15, 2004]

Voice, Video, and Data Tele-Collaboration Service
• Another highly successful ESnet Science Service is the audio, video, and data teleconferencing service to support human collaboration
  o Seamless voice, video, and data teleconferencing is important for geographically dispersed scientific collaborators
  o ESnet currently provides these services to more than a thousand DOE researchers and collaborators worldwide
    - H.323 (IP) videoconferences (4000 port hours per month and rising)
    - audio conferencing (2500 port hours per month, constant)
    - data conferencing (150 port hours per month)
    - Web-based, automated registration and scheduling for all of these services
• Huge cost savings for the Labs

ESnet's Evolution over the Next 10-20 Years
• Upgrading ESnet to accommodate the anticipated increase from the current 100%/yr traffic growth to 300%/yr over the next 5-10 years is priority number 7 out of 20 in DOE's "Facilities for the Future of Science: A Twenty-Year Outlook"
• Based on the requirements of the OSC Network Workshops, ESnet must address
  o Capable, scalable, and reliable production IP networking
    - University and international collaborator connectivity
    - Scalable, reliable, and high-bandwidth site connectivity
  o Network support of high-impact science
    - provisioned circuits with guaranteed quality of service (e.g. dedicated bandwidth)
  o Science Services to support Grids, collaboratories, etc.

New ESnet Architecture to Accommodate OSC
• The future requirements cannot be met with the current, telecom-provided, hub-and-spoke architecture of ESnet
[Figure: current ESnet core ring with hubs at Sunnyvale (SNV), El Paso (ELP), Atlanta (ATL), Washington, DC (DC), and New York (AOA), and point-to-point tail circuits to DOE sites]
• The core ring has good capacity and resiliency against single-point failures, but the point-to-point tail circuits are neither reliable nor scalable to the required bandwidth

Evolving Requirements for DOE Science Network Infrastructure
[Figure: four panels of storage (S), compute (C), instrument (I), and cache & compute elements connected by the network, one per time frame]
• 1-3 yr requirements: in the near term, applications need higher bandwidth; 1-40 Gb/s, end-to-end
• 2-4 yr requirements: high bandwidth; QoS; guaranteed bandwidth paths
• 3-5 yr requirements: 100-200 Gb/s, end-to-end; high bandwidth and QoS; network-resident cache and compute elements
• 4-7 yr requirements: high bandwidth and QoS; network-resident cache and compute elements; robust bandwidth (multiple paths)

A New Architecture
• With the current architecture ESnet cannot address
  o the increasing reliability requirements
  o the long-term bandwidth needs (incrementally increasing tail circuit bandwidth is too expensive; it will not scale to what OSC needs)
    - LHC will need dedicated 10 Gb/s into and out of FNAL and BNL
• ESnet can benefit from
  o engaging the research and education networking community for advanced technology
  o leveraging the R&E community's investment in fiber and networks

A New Architecture
• ESnet new architecture goals: fully redundant connectivity for every site and high-speed access for every site (at least 10 Gb/s)
• Three-part strategy
  1) MAN rings provide dual site connectivity and much higher site-to-core bandwidth
  2) A second core will provide
    - multiply connected MAN rings for protection against hub failure
    - extra core capacity
    - a platform for provisioned, guaranteed-bandwidth circuits
    - an alternate path for production IP traffic
    - carrier-neutral hubs
  3) A high-reliability IP core (like the current ESnet core)

A New ESnet Architecture
[Figure: the existing ESnet core (Sunnyvale, El Paso, Atlanta, Washington, DC, New York) plus a 2nd core (e.g. NLR) and metropolitan area rings serving DOE/OSC Labs; existing hubs, new hubs, and possible new hubs; connections to Asia-Pacific and Europe]

ESnet Beyond FY07
[Figure: map with hubs at SEA, SNV, SDG, ELP, DEN, ALB, CHI, ATL, DC, and NYC; MANs; Qwest-ESnet hubs and NLR-ESnet hubs; high-speed cross connects with Internet2/Abilene; major DOE Office of Science sites; production IP ESnet core and high-impact science core; connections to Asia-Pacific, Japan, Europe, and CERN. Link legend: Lab supplied; major international; 2.5 Gb/s; 10 Gb/s; 30 Gb/s; 40 Gb/s; future phases]

Conclusions
• ESnet is an infrastructure that is critical to DOE's science mission
• Focused on the Office of Science Labs, but serves many other parts of DOE
• ESnet is working hard to meet the current and future networking needs of DOE mission science in several ways:
  o Evolving a new high-speed, high-reliability, leveraged architecture
  o Championing several new initiatives that will keep ESnet's contributions relevant to the needs of our community

Reference: Planning Workshops
• High Performance Network Planning Workshop, August 2002: http://www.doecollaboratory.org/meetings/hpnpw
• DOE Workshop on Ultra High-Speed Transport Protocols and Network Provisioning for Large-Scale Science Applications, April 2003: http://www.csm.ornl.gov/ghpn/wk2003
• Science Case for Large Scale Simulation, June 2003: http://www.pnl.gov/scales/
• DOE Science Networking Roadmap Meeting, June 2003: http://www.es.net/hypertext/welcome/pr/Roadmap/index.html
• Workshop on the Road Map for the Revitalization of High End Computing, June 2003: http://www.cra.org/Activities/workshops/nitrd and http://www.sc.doe.gov/ascr/20040510_hecrtf.pdf (public report)
• ASCR Strategic Planning Workshop, July 2003: http://www.fp-mcs.anl.gov/ascr-july03spw
• Planning Workshops, Office of Science Data-Management Strategy, March & May 2004: http://www-conf.slac.stanford.edu/dmw2004 (report coming soon)