Network Monitoring in the BaBar Experiment
S. Luitz, D. Millsom, D. Salomoni
CHEP2000 - Padova, February 2000

Summary
- The BaBar Data Acquisition Network
- A Typical Scenario...
- Traffic Monitoring and Recording
- Traffic Dump Analysis Tools
- Real-Time Analysis of Traffic
- Conclusions and Outlook

The BaBar Data Acquisition Network (1)
- ca. 200 VME single board computers (VxWorks): 100 Mbit/s full duplex Ethernet
- 78 Sun Ultra 5 "farm" workstations for Level-3 trigger and fast monitoring: 2 x 100 Mbit/s full duplex Ethernet each ("dual homed")
- 5 Sun Ultra 60 application servers (e.g. Run control): 100 Mbit/s full duplex Ethernet
- 15 Sun Ultra 5 display console machines: 10 or 100 Mbit/s Ethernet

The BaBar Data Acquisition Network (2)
- 1 Sun E 450 (4 CPU, 780 Gbyte RAID) central boot/NFS/database/data buffer server: 2 x 1 Gbit/s Ethernet
- various development and user workstations
- 3 Cisco Cat 5500 switches
- 2 VLANs / IP subnets:
  - dedicated real-time DAQ network (35-40 MByte/s)
  - general purpose / data transfer network

A Typical Scenario
Problem:
- Shift crew reports: "Run control server problem ca. 45 min ago at 23:50"
- A look at the system logs shows NFS timeouts at 23:08 but no network-related events (like spanning tree reconfigurations)
- Central network monitoring shows "normal" traffic
- What was going on? Did someone/something overload the NFS server? Database access? ...?
- Server based performance monitoring very poor!
- Wouldn't it be nice to be able to have a close look at the network traffic around 23:05?

Traffic Monitoring and Recording (1)
- We can! Even with free software tools!
- Configure switch to forward all traffic in the BaBar general-purpose VLAN/subnet to a monitoring port (SPAN)
- Standard protocol analyzers no good: small buffers, what to trigger on?
- Sun E 250 with 72 Gbyte disk and Gigabit Ethernet as traffic recorder and protocol analyzer
- Record packet headers into "circular" disk buffer

Traffic Monitoring and Recording (2)
- Use tcpdump (ftp://ftp.ee.lbl.gov) to capture packet headers and write them to files
- In our environment:
  - We can't monitor the real-time network, switch backplane capacity could be exceeded at peak
  - We have 3 switches, however presently we only monitor the switch where the file server is connected
- Typical captured data rates during normal operation: 4 Gbytes / hour

Analysis Tools (1)
How to look at Gigabytes of recorded network data?
- Use tcpdump to filter dump file (e.g. "host bbr-srv02 and host bbr-srv03 and port nfs") into a smaller file
- Use tcpslice (ftp://ftp.ee.lbl.gov) to isolate time intervals from the dump files
- Use tcptrace to automatically analyze TCP connections and plot throughput graphs: http://jarok.cs.ohiou.edu/software/tcptrace/tcptrace.html
- Look at low rate events directly with tcpdump

Analysis Tools (2)
Sample tcptrace output for a connection (NFS)

    TCP connection 4:
        host g:        BBR-SRV03.SLAC.Stanford.EDU:32769
        host h:        BBR-SRV02.SLAC.Stanford.EDU:2049
        complete conn: yes
        first packet:  Fri Jan 28 23:24:35.019938 2000
        last packet:   Fri Jan 28 23:24:35.027876 2000
        elapsed time:  0:00:00.007938
        total packets: 11
        filename:      srv02srv03.dump
      g->h:                                h->g:
        total packets:          6            total packets:          5
        ack pkts sent:          5            ack pkts sent:          5
        pure acks sent:         3            pure acks sent:         2
        unique bytes sent:     44            unique bytes sent:     28
        actual data pkts:       1            actual data pkts:       1
        actual data bytes:     44            actual data bytes:     28
        data xmit time:     0.000 secs       data xmit time:     0.000 secs
        idletime max:         4.4 ms         idletime max:         4.1 ms
        throughput:          5543 Bps        throughput:          3527 Bps

- Port 2049 on host h: NFS port on server
- Not much happened!
- Much more info available, edited to fit ...
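The throughput figures in the sample output can be cross-checked by hand. The following is a minimal sketch, assuming that tcptrace reports throughput as unique bytes sent divided by the elapsed time of the connection; that assumption is not stated on the slide, but it reproduces both quoted values. The function name throughput_bps is hypothetical.

```python
from datetime import datetime

def throughput_bps(unique_bytes: int, first: str, last: str) -> float:
    """Bytes per second over the connection lifetime (assumed definition)."""
    fmt = "%H:%M:%S.%f"
    elapsed = (datetime.strptime(last, fmt)
               - datetime.strptime(first, fmt)).total_seconds()
    return unique_bytes / elapsed

# Timestamps and byte counts copied from the sample tcptrace output.
FIRST = "23:24:35.019938"
LAST = "23:24:35.027876"

g_to_h = throughput_bps(44, FIRST, LAST)  # BBR-SRV03 -> BBR-SRV02
h_to_g = throughput_bps(28, FIRST, LAST)  # BBR-SRV02 -> BBR-SRV03

print(round(g_to_h), round(h_to_g))  # 5543 3527
```

With only 44 and 28 bytes exchanged in about 8 ms, the rates confirm the slide's remark that not much happened on this connection.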
Analysis Tools (3)
Throughput between two hosts (plot)
- Yellow dots: instantaneous rate, quantization due to time resolution of packet time (GBit!)
- Red line: averaged rates

Analysis Tools (4)
The network dump can e.g. answer the following questions (and many more):
- Who (UID, GID) has read the 25 Gbyte data file over NFS?
- Were NFS timeouts correlated to a high NFS transaction volume/rate?
- Which hosts were accessing the file server?
- Do we have hosts/software with configuration problems? (Wrong subnet masks, applications using incorrect subnet broadcast addresses)
However, the analysis of the files is complicated, we'd like to have better tools!

Real-Time Analysis of Traffic
- A very interesting and promising free tool is NTOP (www.ntop.org)
- Captures packets, analyzes the protocol headers in real-time and dynamically generates web pages, e.g.:
  - Protocols and their distribution
  - Hosts, host info, data sources and destinations
  - Throughput graphs
  - Traffic matrix
- Still in development, not perfectly stable yet

Real-Time Monitoring
NTOP example (screenshot)

Conclusions and Outlook
Network traffic recording and analysis
- is feasible (with some restrictions) even in high performance switched network environments
  - looking forward to the next generation of switches and workstations capable of monitoring at gigabit speeds
- has been shown to be very helpful in understanding host and network performance problems and in troubleshooting the computing infrastructure
Powerful free software tools are available
- but multiple programs, command line based, make analysis of network traffic log files quite a complicated procedure
- The ultimate tool would be a PAW-like program for networks which allows filtering and plotting with a simple command language
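The question "Which hosts were accessing the file server?" and the traffic matrix listed among the NTOP pages both come down to summing bytes per (source, destination) pair. A minimal sketch of that aggregation follows, assuming the packet headers have already been reduced to (source, destination, length) tuples. The function names are hypothetical, the host farm-node-a is invented, and the records are made up for illustration; they are not measurements from the BaBar network.

```python
from collections import defaultdict

def traffic_matrix(records):
    """Sum bytes for every (source, destination) host pair."""
    matrix = defaultdict(int)
    for src, dst, nbytes in records:
        matrix[(src, dst)] += nbytes
    return dict(matrix)

def clients_of(matrix, server):
    """Hosts that sent data to `server`, largest sender first."""
    totals = defaultdict(int)
    for (src, dst), nbytes in matrix.items():
        if dst == server:
            totals[src] += nbytes
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Illustrative records only (hypothetical values, not real captures).
records = [
    ("bbr-srv03", "bbr-srv02", 44),
    ("bbr-srv02", "bbr-srv03", 28),
    ("farm-node-a", "bbr-srv02", 1500),
    ("farm-node-a", "bbr-srv02", 1500),
]

matrix = traffic_matrix(records)
top = clients_of(matrix, "bbr-srv02")
print(top)  # [('farm-node-a', 3000), ('bbr-srv03', 44)]
```

Keeping the matrix as a plain dictionary keyed by host pair makes it easy to filter by server, by client, or by time window before plotting, which is the kind of workflow the conclusions ask of a future tool.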