MoMon: Single Ended, Plug-and-Play, and Autonomous Grid Network
Monitoring Tool
Abstract:
Grid network monitoring is a critical factor in the optimization of Grid networks. Several existing network monitoring tools may not scale well to the needs of high-performance Grid network monitoring, while others depend on SNMP access to routers and switches on a global scale. These limitations can be overcome with recent advances in active probing tools. In this paper, we present the architecture, implementation, and evaluation of MoMon, a Mobile Monitoring tool. It targets Grid network monitoring, which includes end-host monitoring, end-to-end performance measurement, and Grid network topology analysis and visualization. We use existing single-ended, open-source active probing tools to gather monitoring information: ping for latency and packet loss, pathload for end-to-end available bandwidth, and pathneck for discovering bottlenecks, choke points, the hops along the path, and hop-to-hop available bandwidth.
I. INTRODUCTION
In the last decade, Grid computing has made huge strides and has virtually revolutionized high-performance and high-throughput computing for scientific applications. In Grids, sites share their resources in order to solve problems collaboratively. Shared resources can be geographically dispersed and located in different administrative domains, and their state changes dynamically. The basic task of Grid computing is to provide shared resources to an entity on demand, transparently with respect to their location.
Grid monitoring is a distinct domain within Grid computing: it deals with monitoring the state of shared resources and the entities providing them. One critical factor in Grid monitoring is Grid network monitoring. The drawbacks of a Grid infrastructure deprived of Grid network monitoring can be illustrated with an example. Consider Figure 1, with four collaborating sites, and suppose Peer A wants to replicate a large amount of data across the Grid. In this scenario, A has three possible destinations: B, C, and D. Peer A knows the CPU and storage resources of all the destinations, but the critical factor in transferring a large amount of data is the state of the network between A and its destination. In order to transfer the data, Peer A should know the latency, packet loss, TCP throughput, and the bottlenecks and choke points [1] along the path to each destination. Based on these parameters, Peer A can transfer its data while utilizing the network efficiently.
Fig. 1. Grid Infrastructure
A Grid site and its Local Area Network (LAN) are also critical resources for the Grid and need to be monitored periodically. The Grid middleware is interested in the CPU, memory, storage, and I/O state of a Grid site, while the Grid site itself is interested in knowing its LAN's current performance and predicting its future performance. This LAN performance information is needed in order to place the Grid site at an appropriate physical location.
In this paper, we target Grid network monitoring, which includes end-host monitoring, end-to-end performance measurement between Grid sites, and Grid network topology analysis and visualization. We use existing single-ended, open-source active probing tools to gather monitoring information: ping for latency and packet loss, and pathneck [1] for discovering bottlenecks, choke points, the hops along the path, and hop-to-hop available bandwidth. The gathered monitoring information is stored in a relational database in order to provide persistence; such information can be used for network debugging and engineering. A standard interface is provided for Grid middleware and applications that use this information for job scheduling and data replication.
The rest of the paper is organized as follows. Section II discusses the most closely related work, Section III describes the MoMon system architecture, Section IV evaluates the performance and accuracy of MoMon, and Section V describes ideas for future extensions of the system.
II. RELATED WORK
Different techniques and implementations are available for Grid network monitoring. The systems most closely related to MoMon are discussed in this section.
MonaLISA [8] is a JINI-based dynamic distributed service architecture. It is able to provide monitoring, control, and global optimization services for complex systems such as Grids. In order to monitor WAN links such as Ultralight [9], it requires SNMP access to every single router along the path. It uses tools like pathload [2] for end-to-end available bandwidth, and SNMP access on each router to obtain hop-by-hop topological information about the network. Since a router can only be queried over SNMP from within its own administrative domain, MonaLISA fails when it has to monitor WAN connectivity that is unmanaged or for which no SNMP access to the routers has been granted. GridICE [12] provides a distributed monitoring tool designed for Grid systems. Its network monitoring is based on [13], which provides only end-to-end performance metrics such as latency, loss rate, and available bandwidth; it provides neither hop-to-hop WAN topological information nor visualization and simulation of a Grid site's network connectivity.
PingER [6] uses ICMP echo messages to measure latency and loss rates among over 850 hosts around the world. It provides end-to-end network performance between hosts and regions, as well as long-term trends for many geographical areas of the Internet. ANEMOS [7] is an end-to-end active measurement framework for autonomous network monitoring; it is used to find network anomalies based on user-defined rules and patterns. PerfSONAR [16] is an infrastructure for network performance monitoring that provides the capacity and utilization of each hop along the path. Like MonaLISA, it requires SNMP access on each router along the path. Currently, perfSONAR is deployed at ESnet, Abilene, and GEANT router interfaces. It fails on unmanaged links and on autonomous systems for which no SNMP access is available.
The similarity between the above systems and MoMon is that they all provide end-to-end performance measurements, and some also provide topological information about a Grid site's connectivity, but with the limitation of requiring SNMP access to routers. To overcome this restriction to managed WAN links, MoMon relies on pathneck [1], which is able to provide the hop-to-hop analysis without router access. Another advantage of MoMon is that it performs the whole analysis using single-ended active probing tools; consequently, nothing has to be installed or configured at remote sites.
III. ARCHITECTURE
Grid network monitoring is a critical aspect of Grid computing: no matter how efficient networks are, they cannot be operated well without periodic and real-time monitoring. The goal of our project is to help Grid middleware and applications, as well as network engineers, to debug the network for errors. The architecture of MoMon is shown in Figure 2.
Fig. 2. MoMon: Mobile Monitoring Architecture
A. End-Host Monitoring:
This module enables users to monitor an end-host by specifying the monitoring parameters of their own interest, such as CPU, I/O, memory, and storage. The term end-host here means any device connected to the network, i.e. PCs, routers, and switches. The issue is how to make host monitoring fully customizable; we achieve this using SNMP and XML. An XML configuration file is required, containing information such as the name, description, type, object ID (OID), and unit of each monitoring parameter. After the configuration file is specified, the tool starts monitoring according to the parameters defined: the value of each monitoring parameter is fetched from the host MIB at an interval specified by the user. To provide persistence, the XML hierarchy is dynamically converted into a relational schema and the values are saved in a relational database.
Fig. 3. End-Host Monitoring process
In order to provide fault tolerance and scalability with respect to the number of monitored parameters, we use a multi-threaded architecture in which each parameter is monitored in a separate thread. A central thread, called the Thread Master, controls the execution of all threads. This module provides a single interface to all other modules: it requires the IP address of a host and an XML configuration file containing the parameters to be monitored, and then starts an autonomous thread to monitor the newly received host.
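The sketch below illustrates this thread-per-parameter design in Java. It is only a minimal outline under stated assumptions: the class and method names, and the snmpGet and store helpers, are placeholders for the real implementation, which reads the parameter list from the XML configuration file and queries the host MIB over SNMP.

    import java.util.ArrayList;
    import java.util.List;

    public class ThreadMaster {
        private final List<Thread> workers = new ArrayList<>();

        // Start one autonomous monitoring thread per parameter (OID) of the given host.
        public void monitorHost(String hostIp, List<String> oids, long intervalMs) {
            for (String oid : oids) {
                Thread worker = new Thread(() -> {
                    while (!Thread.currentThread().isInterrupted()) {
                        String value = snmpGet(hostIp, oid);   // assumed wrapper around an SNMP GET
                        store(hostIp, oid, value);             // assumed: save the value with a timestamp
                        try {
                            Thread.sleep(intervalMs);          // user-specified polling interval
                        } catch (InterruptedException e) {
                            return;                            // the Thread Master asked us to stop
                        }
                    }
                });
                workers.add(worker);
                worker.start();
            }
        }

        // The Thread Master controls the execution of all worker threads.
        public void stopAll() {
            workers.forEach(Thread::interrupt);
        }

        private String snmpGet(String host, String oid) { return "0"; }   // placeholder
        private void store(String host, String oid, String value) { }     // placeholder
    }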
B. End-to-End Performance Measurement:
For end-to-end performance monitoring we use existing active probing tools: ping for latency and packet loss, pathload [2] for available bandwidth, and pathneck [1] for bottleneck discovery along the path. In our implementation, a single measurement task consists of multiple measurements separated by a specific interval, in order to make the measurement less intrusive on the network. At start-up, this module receives the input destination addresses and performs an independent measurement for each destination at an interval of 10 minutes. To provide persistence, each measurement task for a given destination is stored in a relational database for network forecasting and debugging. One of the primary features of this module is its Plug-and-Play support: only minor changes are required to incorporate new tools into, or replace tools in, a measurement task.
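A hedged sketch of one measurement task is shown below. The tool invocations, command-line flags, and the example destination are illustrative only; the real module parses each tool's output and writes the results to the relational database.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class MeasurementTask {
        // Run an external single-ended probing tool and capture its output.
        static String run(String... command) throws Exception {
            Process p = new ProcessBuilder(command).redirectErrorStream(true).start();
            StringBuilder out = new StringBuilder();
            try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = r.readLine()) != null) {
                    out.append(line).append('\n');
                }
            }
            p.waitFor();
            return out.toString();
        }

        public static void main(String[] args) throws Exception {
            String destination = "iepm-bw.slac.stanford.edu";              // example destination from Section IV
            while (true) {
                String pingOutput = run("ping", "-c", "10", destination);   // latency and packet loss
                String pathneckOutput = run("pathneck", destination);       // hop-by-hop analysis (illustrative invocation)
                // ... parse the outputs and store them in the relational database ...
                Thread.sleep(10 * 60 * 1000L);                               // 10-minute interval between measurements
            }
        }
    }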
C. LAN Monitoring:
A Grid site's LAN (Local Area Network) is itself a Grid computing resource and needs to be monitored periodically in order to evaluate its performance at different times. LAN monitoring is simplified if the physical layout, i.e. the topology, of the network is known, yet discovering and evaluating the performance of a LAN manually is very difficult. The aims of this module are the autonomous discovery of the LAN's physical layout, the evaluation of performance between the multiple subnets within the LAN, and the simulation of the whole private network in graphical form.
i. LAN Physical Topology Discovery: We assume that the end machine running this module is managed and that all routers in the LAN are managed as well. The topology discovery algorithm is given below:
1. topology_discovery ( router )
2.   if ( (routers = SNMP.getNextHops(router)) == NULL ) then return
3.   else
4.     foreach nextHop in routers
5.       linkList.add ( router, nextHop )
6.       nodeHashTable.add ( nextHop, SNMP.getArpCache(nextHop) )
7.       topology_discovery ( nextHop )
8.   routerList.add ( routers )
The above recursive algorithm is started by passing the address of the local machine on which the module is running, so the first next hop is the default gateway of that machine. The algorithm continues until it crosses the boundary of the LAN's administrative domain: when nextHop reaches the last hop, the SNMP.getNextHops() call (line 2) returns NULL, because SNMP access is only allowed to users within the administrative domain. If an autonomous system (AS) is connected to another AS with weak security constraints, such that users in one administrative domain can still query the other domain's routers over SNMP, the user is required to provide the maximum number of hops to discover. If the local host itself is not managed, the user can provide the default gateway manually to overcome this limitation.
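A Java rendering of the discovery algorithm is sketched below under the same assumptions; snmpGetNextHops and snmpGetArpCache stand for SNMP queries against a managed router, and a visited set (not present in the pseudocode above) is added only as a guard against routing loops.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    public class TopologyDiscovery {
        final List<String[]> linkList = new ArrayList<>();               // discovered (router, nextHop) links
        final Map<String, List<String>> nodeHashTable = new HashMap<>(); // nextHop -> its ARP cache entries
        final Set<String> visited = new HashSet<>();                     // guard against routing loops

        void topologyDiscovery(String router) {
            List<String> routers = snmpGetNextHops(router);
            if (routers == null || !visited.add(router)) {
                return;                                                  // no SNMP access: domain boundary reached
            }
            for (String nextHop : routers) {
                linkList.add(new String[]{router, nextHop});
                nodeHashTable.put(nextHop, snmpGetArpCache(nextHop));
                topologyDiscovery(nextHop);
            }
        }

        List<String> snmpGetNextHops(String router) { return null; }                    // assumed SNMP wrapper
        List<String> snmpGetArpCache(String router) { return Collections.emptyList(); } // assumed SNMP wrapper
    }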
ii. Evaluation of LAN: In order to evaluate the performance of the network, we measure latency, packet loss, and available bandwidth from a source to each subnet. We do not measure performance from the source to every single host in a subnet; instead, the user either selects at least five nodes in each subnet, or five nodes per subnet are selected at random for evaluation. Latency and packet loss are measured with ping. Available bandwidth cannot be measured with existing bandwidth measurement tools [2,3,4] without a specific service listening on each selected node in the subnet; since nodes may be selected at random, a bandwidth measurement service would have to be installed on every host in the administrative domain, which would cause heavy network intrusion and be difficult to manage. To overcome this, the upper bound of the available bandwidth along the path is calculated using the equation given in [5], which is based on three parameters: latency, packet loss, and maximum segment size.
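As an illustration, a hedged sketch of this upper bound, using the TCP throughput model of [5] (rate at most (MSS/RTT) * C/sqrt(p), with the constant C taken here as approximately 1.22), could look as follows; the conversion to bits per second and the example values are our own assumptions.

    public class BandwidthEstimate {
        // Upper bound on TCP throughput (bits per second) from the model in [5].
        // mssBytes: maximum segment size, rttSeconds: round-trip time, lossRate: packet loss probability (0..1).
        static double upperBoundBps(double mssBytes, double rttSeconds, double lossRate) {
            final double C = 1.22;                                    // model constant, roughly sqrt(3/2)
            return (mssBytes * 8.0 / rttSeconds) * (C / Math.sqrt(lossRate));
        }

        public static void main(String[] args) {
            // Example: MSS 1460 bytes, RTT 20 ms, 0.1% packet loss.
            System.out.printf("%.1f Mbps%n", upperBoundBps(1460, 0.020, 0.001) / 1e6);
        }
    }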
All discovered routers are passed to the End-Host Monitoring module together with an XML configuration file specifying the parameters to be monitored for each router. End-Host Monitoring then monitors each router autonomously and stores the results, with timestamps, in a relational database.
iii. Network Simulation: For debugging and analysis, we simulate the whole network topology using TouchGraph [14], an open-source Java-based graphing tool. Historic performance information is plotted in graphical form using JFreeChart [15].
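For the historic plots, a minimal JFreeChart sketch might look like the following; the series name, sample value, and output file are illustrative, and the exact chart configuration used by MoMon is not described in the paper.

    import java.io.File;
    import java.util.Date;
    import org.jfree.chart.ChartFactory;
    import org.jfree.chart.ChartUtilities;
    import org.jfree.chart.JFreeChart;
    import org.jfree.data.time.Minute;
    import org.jfree.data.time.TimeSeries;
    import org.jfree.data.time.TimeSeriesCollection;

    public class LatencyChart {
        public static void main(String[] args) throws Exception {
            TimeSeries latency = new TimeSeries("Latency to subnet 192.168.7.1 (ms)"); // illustrative series
            latency.add(new Minute(new Date()), 42.0);                                  // one sample point
            TimeSeriesCollection dataset = new TimeSeriesCollection(latency);
            JFreeChart chart = ChartFactory.createTimeSeriesChart(
                    "Historic latency", "Time", "Latency (ms)", dataset, true, true, false);
            ChartUtilities.saveChartAsPNG(new File("latency.png"), chart, 800, 400);    // write the plot to disk
        }
    }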
D. WAN Monitoring:
The most important aspect of Grid network monitoring is monitoring the physical topology of the WAN between two or more Grid sites. Different tools were analyzed, including pathchar [10], BFind [11], traceroute, and pathneck [1]. All of these are single-ended active probing tools: they do not require any specific service to listen at the destination. We analyzed these tools in terms of network intrusiveness and failure modes, and found pathneck to be the least intrusive; in its default mode it sends only 33.6 KB of data for a single probe. The failure modes of pathneck are discussed among the features and limitations of our implementation (Section IV.C).
In the current implementation, this module requires a list of destinations to be entered manually. Each destination is probed with pathneck at a user-specified time interval, and the retrieved results are parsed for the parameters listed below (a sketch of the resulting hop record follows the list).
1. Total number of hops along the path.
2. Canonical name and IP address of each hop.
3. Latency from the source to each hop.
4. Latency difference between consecutive hops (hopRTT).
5. Bottlenecks and choke points along the path.
6. Upper bound of the available bandwidth of bottleneck links and lower bound of the available bandwidth of normal links.
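For illustration, a parsed hop could be held in a simple value object such as the hypothetical class below; the field names are ours and are not part of the pathneck output format.

    // Hypothetical container for one hop parsed from pathneck output.
    public class HopRecord {
        String canonicalName;    // hop canonical name
        String ipAddress;        // hop IP address
        double latencyMs;        // latency from the source to this hop
        double hopRttMs;         // latency difference to the previous hop (hopRTT)
        boolean bottleneck;      // true if pathneck marks this link as a bottleneck or choke point
        double bandwidthMbps;    // upper bound if a bottleneck, otherwise a lower bound
    }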
In order to simulate the WAN topology, TouchGraph is used for physical topology visualization. The visualization steps are given below, followed by a small sketch of the coloring rule:
1. Hosts = getSelectedDestination()
2. List = getAll(hopRTT)
3. Standard deviation (S.D) = getStandardDeviation(List)
4. foreach destination in Hosts
   a) links = getRecursiveAllHoplinks(source, destination, time_to, time_from)
   b) foreach link in links
      I.   if ( link.hopRTT > S.D ) set the link color to RED
      II.  else set the link color to BLUE
      III. if ( link has a bottleneck ) plot the bottleneck as RED
      IV.  plot ( link )
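The coloring rule in step 4 can be summarized by the small sketch below; the Link fields and the color constants are placeholders for the TouchGraph plotting code, which is not reproduced here.

    public class LinkColoring {
        static class Link {
            double hopRttMs;          // hopRTT of this link
            boolean hasBottleneck;    // marked as a bottleneck by pathneck
        }

        // Step 4(b) I-II: a link is drawn red when its hopRTT exceeds the standard
        // deviation of all hopRTTs, and blue otherwise.
        static String linkColor(Link link, double hopRttStdDev) {
            return link.hopRttMs > hopRttStdDev ? "RED" : "BLUE";
        }

        // Step 4(b) III: bottleneck points are always plotted in red.
        static boolean plotBottleneckInRed(Link link) {
            return link.hasBottleneck;
        }
    }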
E. Peer Information Service
It is inconvenient for a user to have to go through multiple interfaces to obtain a single piece of information. To make the tool easy to access and query, a single interface is provided to all of the information gathered and monitored by MoMon. This interface is exposed as an XML-RPC service.
F. Triangulation and Discovery Service
Grid middleware may be interested in the network performance between two remote sites. This service enables a peer, say A, to query the Peer Information Service of a remote peer B, asking B to measure the bandwidth to another remote host, say C. This is achieved using XML-RPC queries. The JXTA [17] discovery service is used only for discovering peers.
G. Monitoring Tools
In the current implementation, MoMon uses two single-ended active probing tools: ping for latency and loss rates, and pathneck to evaluate hop-by-hop performance along the path. An interface is provided through which new tools and techniques can be plugged into the system.
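The paper does not show this interface, so the sketch below is only one plausible shape for it: each tool implements a common contract, and the measurement task iterates over the registered implementations.

    import java.util.List;

    // Hypothetical plug-in contract for single-ended probing tools.
    interface MeasurementTool {
        String name();                                          // e.g. "ping" or "pathneck"
        String measure(String destination) throws Exception;    // run one probe and return its raw output
    }

    class ToolRunner {
        // Run every registered tool against a destination; a new tool is added by
        // implementing MeasurementTool and appending it to this list.
        static void probeAll(List<MeasurementTool> tools, String destination) throws Exception {
            for (MeasurementTool tool : tools) {
                String output = tool.measure(destination);
                System.out.println(tool.name() + ": " + output);
            }
        }
    }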
Pathneck is an open-source tool used to locate bottlenecks along a path without requiring cooperation from the destination. It measures the available bandwidth at each hop along the path and marks as the bottleneck the link with the lowest available bandwidth. Its algorithm is based on a recursive packet train containing two types of packets: load packets and measurement packets. Load packets emulate the behavior of regular data traffic, while measurement packets trigger router responses that are used to obtain the times at which the load packets are processed.
IV. EVALUATION
We evaluated the performance of MoMon on both Internet WAN links and a private autonomous system. We analyzed the WAN links between momon.niit.edu.pk (momon-server) as the source and various destinations; the physical topology of the autonomous system for the niit.edu.pk domain was also discovered, evaluated, and simulated using the momon-server. The momon-server has a 2.4 GHz Intel processor, 1 GB of RAM, 2 GB of virtual memory, and an 80 GB hard disk, and runs Red Hat Linux 9.
A. LAN Topology Testing
In this section we evaluate the performance of the LAN Monitoring service in the niit.edu.pk administrative domain (niit-domain), which has about 500 hosts and four routers. We evaluate this module with respect to speed (the time taken to discover the topology), overhead (network intrusion, memory usage, and CPU usage during discovery), and thoroughness of the analysis. To make the discovery fast, each ping is performed in a separate thread. Finally, we describe the accuracy of the discovery and the analysis of source-to-subnet performance. The memory and CPU measurements were taken from the Red Hat System Monitor and are shown in Table 1.
Factor                          Analysis
Time                            9.8 seconds
Overhead (network intrusion)    124 pings
Overhead (memory usage)         ~40 Mbytes
Overhead (CPU usage)            40% during discovery, 2% during single probing
Completeness                    440 hosts, 4 routers, 4 subnets
Accuracy                        88%
Table 1: Performance evaluation of autonomous system monitoring
The source-to-subnet performance was validated using iperf [4], which is considered an efficient available-bandwidth measurement tool. We installed iperf on four machines, one selected from each subnet, and compared the bandwidth reported by iperf with the bandwidth calculated by MoMon using [5] between the source and each subnet. The results are shown in Table 2.
Subnet            MoMon available bandwidth    iperf available bandwidth    Accuracy
10.10.0.1         83.3 Mbps                    91.4 Mbps                    91%
202.83.166.166    86.6 Mbps                    93.5 Mbps                    92%
192.168.100.1     38.9 Mbps                    66.6 Mbps                    58%
192.168.7.1       0.90 Mbps                    1.4 Mbps                     64%
Table 2: momon-server to each subnet performance
The link from the source to 192.168.7.1 is very slow because its traffic has to pass through a wireless bottleneck link of 1.5 Mbps. The overall accuracy of MoMon's autonomous system topology discovery is 88%, and that of its analysis is 76%. A simulation snapshot of the niit.edu.pk domain is shown in Figure 4.
Fig. 4. Autonomous system topology Snapshot
B. WAN Topology Monitoring
In this section we evaluate the performance of WAN topology monitoring, which includes discovery, end-to-end analysis, and hop-to-hop analysis. Thirty different destinations were probed for 24 hours at 10-minute intervals. The time taken to discover the WAN links is dominated by the pathneck [1] tool, which takes almost 5 seconds for a single probe and sends 33.6 KB of data for the hop-to-hop analysis of a single path. A snapshot with momon.niit.edu.pk (momon-server) as the source and iepm-bw.slac.stanford.edu, babar.fzk.de, iepm.desy.de, and iepm-bw.cern.ch as the destinations is shown in Figure 5.
Fig. 5. WAN Topological analysis
C. Pathneck Failure Modes
We evaluated pathneck on different types of networks, from slow (5 Mbps) to fast (10 Gbps). The failure modes of pathneck, which are also failure modes of the MoMon WAN monitoring module, are given below:
i. It fails on fast networks such as 10 Gbps links: it shows a null measurement for the first probe, but if pathneck is run again one second after the previous probe, some measurements are obtained.
ii. It fails on paths that have fewer than 4 hops.
iii. It fails on servers that are behind a firewall.
iv. It fails on highly utilized links, such as the NTC connection at niit.edu.pk, which has a capacity of 4 Mbps and is used by almost 480 machines at a time.
V. FUTURE WORK
Grid computing is becoming more popular day by day and is being adopted by many organizations and countries. Current Grid networks are based on either centralized or decentralized architectures. The network architectures of Globus, EGEE, and JClarens are efficient for local Grid site management, i.e. within the same country, but they are not efficient for a global-scale Grid. There is a need for an efficient Grid overlay network in which Grid sites organize themselves autonomously based on nearest neighborhood.
In the future, we want to extend our work by defining, implementing, and testing such a Grid overlay network. The idea is to self-organize the whole Grid into multiple small decentralized Grids, each formed from sites that are nearest neighbors of one another. By nearest neighbor we mean the nodes that are the fewest hops away and have the highest available bandwidth compared with other nodes.
REFERENCES
[1] N. Hu, L. E. Li, Z. M. Mao, P. Steenkiste, and J. Wang, "Locating Internet Bottlenecks: Algorithms, Measurements, and Implications", in Proc. of ACM SIGCOMM 2004, Portland, Oregon, USA, 2004.
[2] M. Jain and C. Dovrolis, "Pathload: A measurement tool for end-to-end available bandwidth", in Proc. of PAM, March 2002.
[3] C. Dovrolis, P. Ramanathan, and D. Moore, "What do packet dispersion techniques measure?", in Proc. of IEEE INFOCOM 2001, 2001.
[4] A. Tirumala, M. Gates, F. Qin, J. Dugan, and J. Ferguson, "Iperf - The TCP/UDP bandwidth measurement tool".
[5] M. Mathis, J. Semke, J. Mahdavi, and T. Ott, "The macroscopic behavior of the TCP congestion avoidance algorithm", SIGCOMM Comput. Commun. Rev., vol. 27, no. 3, July 1997.
[6] W. Matthews and L. Cottrell, "The PingER project: Active Internet performance monitoring for the HENP community", IEEE Communications Magazine, vol. 38, no. 5, pp. 130-136, May 2000.
[7] A. Danalis and C. Dovrolis, "ANEMOS: An Autonomous Network Monitoring System", in Proc. of PAM, 2003.
[8] I. C. Legrand, H. B. Newman, R. Voicu, C. Cirstoiu, C. Grigoras, M. Toarta, and C. Dobre, "MonaLISA: An Agent Based, Dynamic Service System to Monitor, Control and Optimize Grid Based Applications", in Proc. of CHEP, 2004.
[9] S. McKee, H. Newman, F. Van Lingen, L. Kramer, D. Bourilkov, and R. Cavanaugh, "Ultralight: A Managed Network Infrastructure for HEP", in Proc. of CHEP, 2006.
[10] V. Jacobson, "pathchar - a tool to infer characteristics of Internet paths", MSRI talk, April 1997.
[11] A. Akella, S. Seshan, and A. Shaikh, "An empirical evaluation of wide-area Internet bottlenecks", in Proc. of ACM IMC, October 2003.
[12] S. Andreozzi, N. De Bortoli, S. Fantinel, A. Ghiselli, G. L. Rubini, G. Tortone, and M. C. Vistoli, "GridICE: A monitoring service for the Grid", Future Generation Computer Systems, Elsevier, 2005.
[13] S. Andreozzi, A. Ciuffoletti, A. Ghiselli, and C. Vistoli, "Monitoring the Connectivity of a Grid", in Proc. of MGC, Toronto, Ontario, Canada, October 2004.
[14] TouchGraph home page: http://www.touchgraph.com
[15] JFreeChart home page: http://www.jfree.com/jfreechart/
[16] A. Hanemann, J. W. Boote, E. L. Boyd, J. Durand, L. Kudarimoti, R. Lapacz, D. M. Swany, J. Zurawski, and S. Trocha, "PerfSONAR: A Service Oriented Architecture for Multi-Domain Network Monitoring", in Proc. of the Third International Conference on Service Oriented Computing, Springer, LNCS 3826, pp. 241-254, Amsterdam, The Netherlands, December 2005.
VI. APPENDIX
XML Schema definition for the monitoring parameters configuration file:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://www.w3schools.com">
  <xs:element name="Host">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Entity" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Property" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:attribute name="name" type="xs:string" use="required" />
                  <xs:attribute name="type" type="xs:string" use="required" />
                  <xs:attribute name="description" type="xs:string" use="optional" />
                  <xs:attribute name="oid" type="xs:string" use="optional" />
                  <xs:attribute name="unit" type="xs:string" use="optional" />
                </xs:complexType>
              </xs:element>
            </xs:sequence>
            <xs:attribute name="name" type="xs:string" use="required" />
            <xs:attribute name="updateAfter" type="xs:integer" use="required" />
            <xs:attribute name="description" type="xs:string" use="optional" />
            <xs:attribute name="dependency" type="xs:string" use="optional" />
            <xs:attribute name="class_url" type="xs:string" use="optional" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>