Brno University of Technology
CESNET z.s.p.o

University Campus Network Monitoring in Everyday Life
Tomáš Podermański, [email protected]

Brno University of Technology
• http://www.vutbr.cz
• One of the largest universities in the Czech Republic
• Founded in 1899; the 110th anniversary will be celebrated this year
• 20,000 students and 2,000 employees
• 9 faculties
• 6 other organisational units
• Student dormitories for 6,000 students

[Map of connected sites: VUT FP, FEKT, Kolejní 4; VUT Koleje, Kolejní 2; VUT FCH, FEKT, Purkyňova 118; VUT Koleje, Mánesova 12; VUT FEKT, Technická 8; VUT FIT, Božetechova 2; VUT FSI, Technická 2; AV VFU, Palackého 1/3; VUT TI, Technická 4; VUT Koleje, Purk.; MU CESNET, Botanická 68a; AV ČR UPT; MZLU, Tauferova; VUT, Kounicova 67a; VUT Koleje, Kounicova 46/48; AV ČR UFM; VUT Rektorát, Antonínská 1; VUT FAST, Veveří 95; VUT FaVU, Údolní 19; VUT, Gorkého 13; VUT FEKT, Údolní 53; VUT FA, Poříčí 5; MU, Vinařská 5; VUT FaVU, Rybářská 13; AV ČR, Rybářská 13]

Physical Layer
• 24 sites are connected to each other
• Each site is connected from at least two directions (by separate cables)
• Over 100 km of optical cables
• Most of the cables are the property of the university

IPv4 layer
• The network cores are based on Hewlett Packard devices
• OSPF-based routing
• PIM SM and DM are used for multicast
• Most of the traffic is transported through this network

IPv6 layer
• IPv6 functionality on HP devices is available only as a beta release
• Temporary solution based on 3Com devices or PC routers with XORP
• A dedicated IPv6 switch/router sits alongside the main IPv4 switch/router
• VLANs are used for connections between IPv6 routers
• A temporary low-cost solution until the main devices have full IPv6 support

Basic monitoring, active vs. passive
• Active monitoring
  – We send probe data and get a response
  – A probe of the device, network, etc.
• Passive monitoring
  – An observer of the device, network, etc.
[Diagram labels: Active, Tomography, Passive, Statistical, NetFlow, Service availability, Environment, Network condition, Service condition (response, …), Power status, Cooling systems, Interface status, Log processing]

Components in a Monitoring System: Agent and protocol
• SNMP agent
  – Get, Set, Walk, Traps
• NetFlow, sFlow, IPFIX probe
  – Accumulated statistics
• For many systems, a specialised protocol based on the main system
• Role of a cache on the agent
• Active monitoring
  – We use an appropriate protocol or data depending on the monitored service
  – Proxy service (view from another point)

Components in a Monitoring System: Manager & Frontend
• The manager collects and processes data from agents
• Data are stored and archived in a datastore
  – SQL, RRD, …
• User interface
  – Web, application
  – Reports, SLA, …
  – Configuration
  – Historical view
• System of alerts
  – Email, SMS, phone call
• The most popular systems
  – Zabbix, Nagios, OpenView, nfsen/nfdump, flowtools, rrdtool, mrtg, cacti, munin, …

Quiz
What causes the most trouble in IT?
– Power supply of systems
  • Overloaded circuits
  • Unmanaged UPS
  • Messy electrical installations
  • An improper power supply can be a booby trap
– Cooling systems
  • Absence of preventive monitoring
  • Frozen units
  • Units jammed by foliage
  • …

LAYER 0, 1: Physical infrastructure

Power Supply with 1 + 1 Redundancy
[Diagram labels: 2x 16A, UPS I, UPS II, ATS, PDU I, PDU II]
Monitored values shown in the diagram:
• Load, voltage
• Load, voltage on source 1, voltage on source 2, selected source
• Load, input voltage, output voltage, battery status

Power system with 1 + 1 redundancy
[Diagram labels: 2x 16A, UPS, ATS]
Monitored values shown in the diagram:
• Load, current, input voltage, output voltage, battery status
• Load, current, voltage on source 1, voltage on source 2, selected source

Power system with 1 + 1 redundancy: failure scenario
• Overloaded circuit: tripped circuit breaker
• When the power goes up again... in a few minutes the UPS is low
• The second circuit is overloaded: tripped circuit breaker

Cooling Systems
[Diagram labels: LonWorks, Unit status/SNMP, Temperature/SNMP, Monitoring system]
• In many cases the cooling system is part of the building.
• The majority of cooling systems are difficult to monitor.
• Some devices support monitoring, but it costs a lot of money.
  – In many cases the monitoring is more expensive than the cooling device.
  – There is no standard interface (RS485 with a closed protocol).
  – Some devices have a binary output (via relay) that indicates both the error and the running state
    • Possible conversion to SNMP
• Another, and the easiest, solution: monitoring the temperature in the communication room.
• A thermometer with an SNMP output.

Monitoring in Data Center Rooms
• More complex electrical installation
• Having a UPS and an ATS in every rack is inefficient
• Devices with 3-phase power
• Circuits are divided into 3 groups (direct, genset, UPS)
• More detailed information about the electricity distribution is very useful.
• It is necessary to monitor whether the phases are balanced
  – Otherwise the genset could break down

Power in Data Center Rooms
[Diagram labels: Main power, Genset, ATS, Bypass, UPS, HVAC, Devices in racks, measurement points marked V and A, temperature in datacenter]

Server Monitoring
[Diagram labels: SNMP, IPMI, Other, Monitoring system]
• Hardware
  – Manufacturers' software support is required (Dell OpenManage, HP InsightControl, …)
  – Chassis temperature
  – Fan condition
  – Power status
• Operating system
  – CPU, load, memory, utilization, processes
• Disk subsystem
  – External disk array with its own management port
  – RAID status
  – Disk condition (S.M.A.R.T.)

Network Device Monitoring
[Figure: front panel of a ProCurve Switch 4208vl-72GS (J9030A) with ProCurve 24p Gig-T vl (J8768A) and Gig-T/SFP vl (J9033A) modules, polled via SNMP by the monitoring system]
• Hardware
  – Chassis temperature
  – Fan condition
  – Power status
• State of the operating system
  – CPU
  – Load
  – Memory

Network Connection – L1 Monitoring
[Figure: front panels of two linked ProCurve Switch 4208vl-72GS (J9030A) chassis]
• Port status
  – Link UP/DOWN
  – Speed
  – Errors on interfaces
  – Traffic on interfaces
• Remote device status
  – LLDP + data from the MIB
  – Remote interface, remote device, …

LAYER 2

Network Connection – L2 Monitoring
• L2 monitoring
  – An L2 ping could be very useful
  – We have to use information obtained from other layers (L1, L3)
  – Unfortunately, there is no simple way to check connectivity on a single VLAN
  – One option is to obtain some information from the MIB, but it is not sufficient
    • STP/MSTP information, root bridge
    • VLANs on interfaces

Network Connection – L3 monitoring
[Diagram labels: 147.229.6.1, 147.229.6.2, Data]
• L3 monitoring
  – ICMP and ping are still the most important tools
  – The problem is how to monitor broken paths (the routing protocol usually masks any problem)
    • Check the state of the routing protocol
    • ICMP using source routing
  – Flow-based monitoring
  – Multicast monitoring

Network Connection – L3 monitoring
[Diagram labels: Master, Backup, DR, BDR]
• L3 monitoring
  – Checking that a router has the proper neighbor
  – OSPF-MIB (RFC 4750)
    • ospfNbrRtrId
  – VRRP-MIB (RFC 2787)
    • vrrpOperAdminState, vrrpOperState, vrrpOperMasterIpAddr

Multicast Monitoring
• Quite a demanding task
  – For each stream the <S,G> path has to be created
  – A continuously received and transmitted stream does not necessarily reveal a problem on the RP
  – It is almost impossible to monitor the local infrastructure
• The only known tool: Multicast Beacon
  – Written in Perl
  – Dead project
    • Last release in 2006
    • No VLAN support and no support for multiple interfaces on a single host
    • Homepage unavailable
• Own solution: mcwatch

Multicast Agents
[Diagram: Multicast Beacon compared with mcwatch; each multicast agent is shown with the labels VLAN, POSIX SOCKET, APPLICATION; data is periodically sent to a server]

NetFlow Monitoring
[Diagram labels: CESNET PoP, CRS-1/16, University network, 10G Ethernet]
• Two NetFlow probes watch both external connectivity lines
• The NetFlow probes are connected directly to the optical fiber via a TAP
• Wire-speed accelerated probes (FlowMon)

Flow Processing
[Diagram labels: Nfcapd, Datastore, aggregated SQL, All administrators, Backbone administrator]

Flow Processing
• Data are stored on a storage server
  – Data are kept for 30 days
  – Analysis of security incidents, statistical purposes
  – The big issue: how to select the useful data and provide them to the people who need them.
  – Security matter
  – Full data are accessible only to a small, trusted group of administrators
  – For other IT staff (faculty administrators, IT managers), summarised data are accessible via a web interface.
• Data are processed by common open source tools:
  – nfdump
  – A lot of trouble, but we don't have any better solution
  – We are trying to optimise the current implementations
  – Several theses on this topic are in progress
• Commercial tools: the situation is no better
  – Usually plenty of nice charts and statistics
  – But performance is often terrible (sampling is required)

LAYER 4-7: Transport, application and the others

Layer 7
• Many in-house plugins
  – Eduroam/RADIUS monitoring
  – DNS
  – Database status
  – Backup server status
  – …
• Collected data are available to administrators at different levels
  – Eduroam/RADIUS logs
  – Mail logs (DNSBL, spam classification, statistics)
  – WiFi/VPN connections
  – …

Components in the Monitoring System
[Diagram labels: Zabbix, Spinel, xwho, xhis, NetIs, xmon, nfdump, SNMP, ICMP, RADIUS, MySQL, NetFlow, wifilogs, radiuslogs, maillogs, honeypots, incidents, …]

Monitoring: Layers & Technology
• Application: RADIUS, DNS, other services
• Internet: ICMP tests using the source routing option; OSPF, VRRP peers; multicast traffic monitoring
• Link: port statistics, link status, number of errors; LLDP neighbour
• Physical: power, cooling systems, temperature; servers and disk arrays; network devices
• Tools: zabbix, xwho, xhis, NetIs, nfdump, SNMP, NetFlow, radius, ICMP, ICMPv6, Spinel, …

Actual problems
• SNMP protocol
  – No alternative
  – Many bugs in various implementations
• Absence of an L2 testing tool
• NetFlow
  – We have plenty of data, but nobody knows how to process it effectively
  – In some cases more detailed information is required than flow data provide
• IPv6 brings new problems and challenges
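
The failure scenario on the "Power Supply with 1 + 1 Redundancy" slides (one overloaded circuit trips its breaker, then the second circuit is overloaded as well) suggests a simple check that a monitoring system could run on the measured loads. The following is a minimal Python sketch, assuming the two feeds are the 16 A circuits from the slides and that a failover moves the whole load onto the surviving feed. The function name, the 80 % headroom factor, and the sample readings are hypothetical and are not taken from the talk.

```python
BREAKER_RATING_A = 16.0  # each feed is a 16 A circuit ("2x 16A" on the slides)
HEADROOM = 0.8           # hypothetical safety factor, not from the talk


def survives_single_feed_loss(load_a, load_b,
                              rating=BREAKER_RATING_A, headroom=HEADROOM):
    """Return True if one feed alone could carry the combined load.

    load_a and load_b are the currents (in amperes) measured on the two feeds.
    If one feed fails, the surviving feed has to carry their sum.
    """
    if load_a < 0 or load_b < 0:
        raise ValueError("loads must be non-negative")
    return (load_a + load_b) <= rating * headroom


if __name__ == "__main__":
    # Hypothetical readings in amperes, e.g. polled from the PDUs over SNMP.
    for a, b in [(5.0, 6.0), (9.0, 9.0)]:
        status = "OK" if survives_single_feed_loss(a, b) else "ALERT"
        print(f"feed A={a} A, feed B={b} A -> {status}")
```

In the second sample each feed looks healthy on its own at 9 A, which is why the check uses the sum of both readings rather than the per-feed values.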