Network Monitoring, WAN Performance Analysis, & Data Circuit Support at Fermilab
Phil DeMar
US-CMS Tier-3 Meeting, Fermilab
October 23, 2008

Active Wide-Area Network Monitoring
• PerfSONAR: distributed network monitoring infrastructure
  • Supported by the US-LHC Tier-1 sites and the Internet2 community
• PerfSONAR-PS: active monitoring package
  • Web services collection built on trusted monitoring tools: ping, BWCTL (iperf), OWAMP, NPAD, NDT toolkit
  • Web service interface for pulling data into other monitoring tools
• Zero configuration; out-of-the-box deployment
  • Based on the Knoppix Live CD bootable disk
  • Optional software-bundle deployment
• Modest hardware requirements for on-site deployment

PerfSONAR Deployment Status
• US-ATLAS is moving ahead with perfSONAR-PS at its Tier-1 and Tier-2s:
  • Two dedicated systems per site, one each for latency and bandwidth testing
  • Systems are spec'ed devices, $628 each (KOI Computing)
  • Utilize Knoppix disks and standard configurations
• We have recommended the same model for US-CMS
• Current PerfSONAR-PS deployment:
  • Both US-LHC Tier-1s (FNAL & BNL)
  • UNL (CMS), U-Mich (ATLAS), U-Delaware, Internet2, ESnet
  • Complete active monitoring matrix among the above sites

Background Information
• PerfSONAR-PS project: http://code.google.com/p/perfsonar-ps/
• A tour of the perfSONAR-PS services is available: http://code.google.com/p/perfsonar-ps/wiki/CodeTour
• Knoppix Live CD bootable disk info: http://code.google.com/p/perfsonar-ps/wiki/NPToolkit
• Appliance PCs:
  • Vendor: KOI Computing, (630) 627-8811
  • Spec: 1U Intel Pentium Dual-Core E2200 2.2 GHz system
  • Cost: $628 each

Performance Analysis Support
• In 1999, Matt Mathis coined the term "Wizard's Gap"
• Users often don't know about:
  • Common OS tuning issues for WAN data movement
  • The wide-area network path, its characteristics, and the available tools
• It's still an end-to-end problem
• Today, it's still an issue, and the world is still short on wizards
• Our structured analysis methodology seeks to put some of the wizardry into a structured process:
  • Find the performance problem area(s)

Network Application Performance Factors
[Diagram: the end-to-end path from the applications, operating system, NIC, and disks of one end system, across the LAN, routers/switches, and WAN, to the other end system. End-system factors: CPU speed, memory size, system load, disk I/O speed, operating system (R/W buffer size, disk cache size), and NIC speed. Network factors: network delay, bandwidth, and packet drop rate.]

Performance Analysis Methodology
• Structured approach to performance analysis:
  • Model the process like a medical diagnosis
  • Collect the physical characteristics
  • Run diagnostic tests
  • Record everything; develop a history of the analysis
• Strategic approach:
  • Sub-divide the problem space: application-related problems, host diagnosis and tuning, network path analysis
  • Then divide and conquer

Network Performance Analysis Architecture
[Diagram: an end-to-end path instrumented with diagnosis servers at each segment: NESDS at the end systems, NPDS at the LAN and WAN routers, and a PTDS at the border router. Legend: NES = Network End System; NESDS = Network End System Diagnosis Server; NPDS = Network Path Diagnosis Server; PTDS = Packet Trace Diagnosis Server; BR = Border Router.]

Performance Analysis Tools
• Host diagnosis:
  • Script that pulls the system configuration (an illustrative sketch follows this slide)
• Network path diagnosis:
  • Network Diagnostic Tool (NDT): faulty network connections & NICs, duplex mismatches
  • OWAMP to collect and diagnose one-way network path statistics: packet loss, latency, jitter
  • Other tools such as ping and traceroute, as needed
• Packet trace diagnosis:
  • Port mirror on the border router(s)
  • tcpdump to collect packet traces
  • tcptrace to analyze packet traces
  • xplot for visual examination
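The deck does not include the configuration-pulling script itself, so the following is a minimal illustrative sketch, not the actual Fermilab tool. It assumes a Linux host, reads the TCP tuning sysctls that most often limit WAN data movement, and compares the socket buffer ceiling against the bandwidth-delay product (BDP) of a hypothetical 1 Gb/s, 50 ms path; the path numbers are placeholders, not measurements.

    #!/usr/bin/env python
    # Hedged sketch of the "pull system configuration" host-diagnosis step.
    # Assumes a Linux host with the sysctl utility; not the actual FNAL script.
    import subprocess

    # sysctls that commonly limit WAN throughput on Linux hosts
    SYSCTLS = [
        "net.core.rmem_max",                # max receive socket buffer (bytes)
        "net.core.wmem_max",                # max send socket buffer (bytes)
        "net.ipv4.tcp_rmem",                # min/default/max TCP receive buffer
        "net.ipv4.tcp_wmem",                # min/default/max TCP send buffer
        "net.ipv4.tcp_congestion_control",  # congestion control algorithm in use
    ]

    def read_sysctl(name):
        """Return a sysctl value as a string, via 'sysctl -n <name>'."""
        return subprocess.check_output(["sysctl", "-n", name]).decode().strip()

    def bdp_bytes(bandwidth_bps, rtt_seconds):
        """Bandwidth-delay product: bytes needed in flight to fill the path."""
        return int(bandwidth_bps * rtt_seconds / 8)

    if __name__ == "__main__":
        for name in SYSCTLS:
            print("%-35s %s" % (name, read_sysctl(name)))

        # Hypothetical path: 1 Gb/s with a 50 ms round-trip time (placeholders).
        bdp = bdp_bytes(1e9, 0.050)
        print("BDP for 1 Gb/s x 50 ms path: %d bytes" % bdp)

        # If the receive buffer ceiling is below the BDP, a single TCP stream
        # cannot fill the path -- a classic Wizard's Gap symptom.
        if int(read_sysctl("net.core.rmem_max")) < bdp:
            print("WARNING: net.core.rmem_max below BDP; raise buffer limits")

A real diagnosis script would also record NIC speed and duplex, MTU, and disk I/O parameters, per the performance-factors slide above.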
Network Path Characteristics Collected
• Round-trip time
• Sequence of routers along the paths
• One-way delay and delay variance
• One-way packet drop rate
• Packet reordering

Network Performance Analysis Methodology
• Step 1: Define the problem space
• Step 2: Collect host information & network path characteristics
• Step 3: Host tuning & diagnosis
• Step 4: Network path performance analysis
  • Route changes frequently?
  • Network congestion: is the delay variance large?
  • Infrastructure failures: examine the counters one by one
  • Packet reordering: load balancing? parallel processing?
• Step 5: Evaluate the packet trace pattern

Tier-2/Tier-3 Sites Worked With
• UERJ (Brazil)
• IHEP (China)
• RAL (UK)
• University of Florida
• IFCA (Spain)
• TTU (Texas)
• CIEMAT (Spain)
• Belgium
• OWEA (Austria)
• CSCS (Switzerland)

Performance Analysis Status & Summary
• An available service for CMS Tier-2/3 sites
• A work in progress at this point
• The focus is on the process as well as the results
• Willing to work with others in this area
• Future areas of effort:
  • Incorporate into a workflow & content management system
  • Make use of the perfSONAR monitoring infrastructure
• Methodology: https://plone3.fnal.gov/P0/WAN/netperf/methodology/
• How to get hold of us:
  • Send email to [email protected]
  • Wide Area Work Group video-conference meetings every other Friday

Strategic Direction Toward Circuits
• The DOE High Performance Network Planning Workshop established a strategic model to follow:
  • High-bandwidth backbones for reliable production IP service
  • Separate high-bandwidth network paths for large-scale science data flows (the ESnet Science Data Network)
  • Metropolitan Area Networks (MANs) for local access
    • Fermi LightPath is a cornerstone of the Chicago-area MAN

ESnet4
• Core networks reaching 50-60 Gbps by 2009-2010 (10 Gb/s circuits)
[Map: ESnet4 topology showing the production IP core (10 Gbps), the Science Data Network core (20-50 Gbps), MANs (20-60 Gbps) or backbone loops for site access, and international connections: CANARIE (Canada), GEANT (Europe), GLORIAD (Russia and China), CERN (30+ Gbps), Asia-Pacific, Australia, and South America (AMPATH). Core hubs include Boston, New York, Washington DC, Jacksonville, Tulsa, Denver, Albuquerque, Boise, LA, and San Diego.]

Topology of Circuit Connections
• Circuits utilize the MAN infrastructure:
  • Circuits are based on end-to-end VLANs
  • 10 GE channel(s) reserved for the routed IP service (purple in the original diagram)
  • LHCOPN circuit to CERN (orange)
  • SDN channels for E2E circuits to CMS Tier-2/3 sites (shades of green)
• Direct BGP peering with the remote site
• Multiple provider domains are the norm:
  • The deployed technology varies by the domains involved
  • Complexity is higher than for the routed IP service

FNAL Alternate Path Circuits
• Supported since 2004
• Serve a wide spectrum of experiments
• Implemented on multiple technologies, but based on end-to-end layer-2 paths
• CMS Tier-2s are heavy users
• Usefulness has varied

E2E Circuit Summary
• FNAL currently supports E2E circuits to:
  • The Tier-0 and Tier-2s
  • A few Tier-3s
• Today, circuits are largely static configurations
• Dynamic circuit services are becoming available, driven largely by Internet2 DCN services
• Alternate-path support services are also emerging:
  • Lambda Station (FNAL)
  • TeraPaths (BNL)
• Contact [email protected] for help or information