MoMon: Single Ended, Plug-and-Play, and Autonomous Grid Network
Monitoring Tool
Abstract:
Grid network monitoring is a critical factor in the optimization of Grid networks. Several existing network monitoring tools may not scale well to the needs of high-performance Grid network monitoring, while others depend on SNMP access to routers and switches on a global scale. These limitations can be overcome with recent advances in active probing tools. In this paper, we present the architecture, implementation, and evaluation of MoMon, a Mobile Monitoring tool. It targets Grid network monitoring, which includes end-host monitoring, end-to-end performance measurement, and Grid network topology analysis and visualization. We use existing single-ended, open-source active probing tools to gather monitoring information: ping for latency and packet loss, pathload for end-to-end available bandwidth, and pathneck for discovering bottlenecks, choke points, the hops along the path, and hop-to-hop available bandwidth.
I. INTRODUCTION
In the last decade, Grid computing has made huge strides and has virtually revolutionized high-performance and high-throughput computing for scientific applications. In Grids, sites share their resources in order to solve problems collaboratively. Shared resources can be geographically dispersed and located in different administrative domains, and their state changes dynamically. The basic task of Grid computing is to provide shared resources to an entity on demand, transparently with respect to their location.
Grid monitoring is a distinct domain within Grid computing: it deals with monitoring the state of shared resources and the entities providing them. One critical factor in Grid monitoring is Grid network monitoring. The drawbacks of a Grid infrastructure deprived of Grid network monitoring can be illustrated with an example. Consider Figure 1, with four collaborating sites, and suppose Peer A wants to replicate a large amount of data across the Grid. In this scenario, A has three possible destinations: B, C, and D. Peer A knows the CPU and storage resources of all the destinations, but the critical factor in transferring a large amount of data is the state of the network between A and its destination. In order to transfer the data, Peer A should know the latency, packet loss, TCP throughput, and the bottlenecks and choke points [1] along the path to each destination. Based on these parameters, Peer A can transfer its data while utilizing the network efficiently.
Fig. 1. Grid Infrastructure
A Grid site and its Local Area Network (LAN) are also critical resources for the Grid and need to be monitored periodically. The Grid middleware is interested in the CPU, memory, storage, and I/O state of a Grid site, while the Grid site itself is interested in knowing its LAN's current performance and predicting its future performance. This LAN performance information is needed in order to place the Grid site at an appropriate physical location.
In this paper, we target Grid network monitoring, which includes end-host monitoring, end-to-end performance measurement between Grid sites, and Grid network topology analysis and visualization. We use existing single-ended, open-source active probing tools to gather monitoring information: ping for latency and packet loss, and pathneck [1] for discovering bottlenecks, choke points, the hops along the path, and hop-to-hop available bandwidth. The gathered monitoring information is stored in a relational database in order to provide persistence; such information can be used for network debugging and engineering. A standard interface is provided for Grid middleware and applications that use this information for job scheduling and data replication.
The rest of the paper is organized as follows. Section II discusses the most closely related work, Section III describes the MoMon system architecture, Section IV evaluates the performance and accuracy of MoMon, and Section V describes ideas for future extensions of the system.
II. RELATED WORK
Different techniques and implementations are available for Grid network monitoring. The systems most closely related to MoMon are discussed in this section.
MonaLISA [8] is a JINI-based dynamic distributed service architecture. It is able to provide monitoring, control, and global optimization services for complex systems such as Grids. In order to monitor WAN links such as Ultralight [9], it requires SNMP access to every single router along the path. It uses tools like pathload [2] for end-to-end available bandwidth, and SNMP access on each router to obtain hop-by-hop topological information about the network. Since a router can only be queried over SNMP from within its own administrative domain, MonaLISA fails when it has to monitor WAN connectivity that is unmanaged or for which no SNMP access to the routers has been granted. GridICE [12] provides a distributed monitoring tool designed for Grid systems. Its network monitoring is based on [13], which provides only end-to-end performance metrics such as latency, loss rate, and available bandwidth; it provides neither hop-to-hop WAN topological information nor visualization and simulation of a Grid site's network connectivity.
PingER [6] uses ICMP echo messages to measure latency and loss rates among over 850 hosts around the world. It provides end-to-end network performance between hosts and regions, as well as long-term trends for many geographical areas of the Internet. ANEMOS [7] is an end-to-end active measurement framework for autonomous network monitoring; it is used to find network anomalies based on user-defined rules and patterns. PerfSONAR [16] is an infrastructure for network performance monitoring that provides the capacity and utilization of each hop along the path. Like MonaLISA, it requires SNMP access on each router along the path. Currently, perfSONAR is deployed at ESnet, Abilene, and GEANT router interfaces. It fails on unmanaged links and on autonomous systems for which no SNMP access is available.
The similarity between the above systems and MoMon is that they all provide end-to-end performance measurements, and some also provide topological information about a Grid site's connectivity, but with the limitation of requiring SNMP access to routers. To overcome this restriction to managed WAN links, MoMon relies on pathneck [1], which is able to provide the hop-to-hop analysis without router access. Another advantage of MoMon is that it performs the whole analysis using single-ended active probing tools; consequently, nothing has to be installed or configured at remote sites.
III. ARCHITECTURE
Grid network monitoring is a critical aspect of Grid computing: no matter how efficient networks are, they cannot be operated well without periodic and real-time monitoring. The goal of our project is to help Grid middleware and applications, as well as network engineers, to debug the network for errors. The architecture of MoMon is shown in Figure 2.
Fig. 2. MoMon: Mobile Monitoring Architecture
A. End-Host Monitoring:
This module enables users to monitor an end-host by specifying the monitoring parameters of their own interest, such as CPU, I/O, memory, and storage. The term end-host here means any device connected to the network, i.e. PCs, routers, and switches. The issue is how to make host monitoring fully customizable; we achieve this using SNMP and XML. An XML configuration file is required, containing information such as the name, description, type, object ID (OID), and unit of each monitoring parameter. After the configuration file is specified, the tool starts monitoring according to the parameters defined: the value of each monitoring parameter is fetched from the host MIB at an interval specified by the user. To provide persistence, the XML hierarchy is dynamically converted into a relational schema and the values are saved in a relational database.
Fig. 3. End-Host Monitoring process
In order to provide fault tolerance and scalability with respect to the number of monitored parameters, we use a multi-threaded architecture in which each parameter is monitored in a separate thread. A central thread, called the Thread Master, controls the execution of all threads. This module provides a single interface to all other modules: it requires the IP address of a host and an XML configuration file containing the parameters to be monitored, and then starts an autonomous thread to monitor the newly received host.
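The sketch below illustrates this thread-per-parameter design in Java. It is only a minimal outline under stated assumptions: the class and method names, and the snmpGet and store helpers, are placeholders for the real implementation, which reads the parameter list from the XML configuration file and queries the host MIB over SNMP.

    import java.util.ArrayList;
    import java.util.List;

    public class ThreadMaster {
        private final List<Thread> workers = new ArrayList<>();

        // Start one autonomous monitoring thread per parameter (OID) of the given host.
        public void monitorHost(String hostIp, List<String> oids, long intervalMs) {
            for (String oid : oids) {
                Thread worker = new Thread(() -> {
                    while (!Thread.currentThread().isInterrupted()) {
                        String value = snmpGet(hostIp, oid);   // assumed wrapper around an SNMP GET
                        store(hostIp, oid, value);             // assumed: save the value with a timestamp
                        try {
                            Thread.sleep(intervalMs);          // user-specified polling interval
                        } catch (InterruptedException e) {
                            return;                            // the Thread Master asked us to stop
                        }
                    }
                });
                workers.add(worker);
                worker.start();
            }
        }

        // The Thread Master controls the execution of all worker threads.
        public void stopAll() {
            workers.forEach(Thread::interrupt);
        }

        private String snmpGet(String host, String oid) { return "0"; }   // placeholder
        private void store(String host, String oid, String value) { }     // placeholder
    }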
B. End-to-End Performance Measurement:
For end-to-end performance monitoring we use existing active probing tools: ping for latency and packet loss, pathload [2] for available bandwidth, and pathneck [1] for bottleneck discovery along the path. In our implementation, a single measurement task consists of multiple measurements separated by a specific interval, in order to make the measurement less intrusive on the network. At start-up, this module receives the input destination addresses and performs an independent measurement for each destination at an interval of 10 minutes. To provide persistence, each measurement task for a given destination is stored in a relational database for network forecasting and debugging. One of the primary features of this module is its Plug-and-Play support: only minor changes are required to incorporate new tools into, or replace tools in, a measurement task.
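A hedged sketch of one measurement task is shown below. The tool invocations, command-line flags, and the example destination are illustrative only; the real module parses each tool's output and writes the results to the relational database.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class MeasurementTask {
        // Run an external single-ended probing tool and capture its output.
        static String run(String... command) throws Exception {
            Process p = new ProcessBuilder(command).redirectErrorStream(true).start();
            StringBuilder out = new StringBuilder();
            try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = r.readLine()) != null) {
                    out.append(line).append('\n');
                }
            }
            p.waitFor();
            return out.toString();
        }

        public static void main(String[] args) throws Exception {
            String destination = "iepm-bw.slac.stanford.edu";              // example destination from Section IV
            while (true) {
                String pingOutput = run("ping", "-c", "10", destination);   // latency and packet loss
                String pathneckOutput = run("pathneck", destination);       // hop-by-hop analysis (illustrative invocation)
                // ... parse the outputs and store them in the relational database ...
                Thread.sleep(10 * 60 * 1000L);                               // 10-minute interval between measurements
            }
        }
    }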
C. LAN Monitoring:
A Grid site's LAN (Local Area Network) is itself a Grid computing resource and needs to be monitored periodically in order to evaluate its performance at different times. LAN monitoring is simplified if the physical layout, i.e. the topology, of the network is known, yet discovering and evaluating the performance of a LAN manually is very difficult. The aims of this module are the autonomous discovery of the LAN's physical layout, the evaluation of performance between the multiple subnets within the LAN, and the simulation of the whole private network in graphical form.
i. LAN Physical Topology Discovery: We assume that the end machine running this module is managed and that all routers in the LAN are managed as well. The topology discovery algorithm is given below:
1. topology_discovery ( router )
2.   if ( (routers = SNMP.getNextHops(router)) == NULL ) then return
3.   else
4.     foreach nextHop in routers
5.       linkList.add ( router, nextHop )
6.       nodeHashTable.add ( nextHop, SNMP.getArpCache(nextHop) )
7.       topology_discovery ( nextHop )
8.   routerList.add ( routers )
The above recursive algorithm is started by passing the address of the local machine on which the module is running, so the first next hop is the default gateway of that machine. The algorithm continues until it crosses the boundary of the LAN's administrative domain: when nextHop reaches the last hop, the SNMP.getNextHops() call (line 2) returns NULL, because SNMP access is only allowed to users within the administrative domain. If an autonomous system (AS) is connected to another AS with weak security constraints, such that users in one administrative domain can still query the other domain's routers over SNMP, the user is required to provide the maximum number of hops to discover. If the local host itself is not managed, the user can provide the default gateway manually to overcome this limitation.
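A Java rendering of the discovery algorithm is sketched below under the same assumptions; snmpGetNextHops and snmpGetArpCache stand for SNMP queries against a managed router, and a visited set (not present in the pseudocode above) is added only as a guard against routing loops.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    public class TopologyDiscovery {
        final List<String[]> linkList = new ArrayList<>();               // discovered (router, nextHop) links
        final Map<String, List<String>> nodeHashTable = new HashMap<>(); // nextHop -> its ARP cache entries
        final Set<String> visited = new HashSet<>();                     // guard against routing loops

        void topologyDiscovery(String router) {
            List<String> routers = snmpGetNextHops(router);
            if (routers == null || !visited.add(router)) {
                return;                                                  // no SNMP access: domain boundary reached
            }
            for (String nextHop : routers) {
                linkList.add(new String[]{router, nextHop});
                nodeHashTable.put(nextHop, snmpGetArpCache(nextHop));
                topologyDiscovery(nextHop);
            }
        }

        List<String> snmpGetNextHops(String router) { return null; }                    // assumed SNMP wrapper
        List<String> snmpGetArpCache(String router) { return Collections.emptyList(); } // assumed SNMP wrapper
    }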
ii. Evaluation of LAN: In order to evaluate the performance of the network, we measure latency, packet loss, and available bandwidth from a source to each subnet. We do not measure performance from the source to every single host in a subnet; instead, the user either selects at least five nodes in each subnet, or five nodes per subnet are selected at random for evaluation. Latency and packet loss are measured with ping. Available bandwidth cannot be measured with existing bandwidth measurement tools [2,3,4] without a specific service listening on each selected node in the subnet; since nodes may be selected at random, a bandwidth measurement service would have to be installed on every host in the administrative domain, which would cause heavy network intrusion and be difficult to manage. To overcome this, the upper bound of the available bandwidth along the path is calculated using the equation given in [5], which is based on three parameters: latency, packet loss, and maximum segment size.
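As an illustration, a hedged sketch of this upper bound, using the TCP throughput model of [5] (rate at most (MSS/RTT) * C/sqrt(p), with the constant C taken here as approximately 1.22), could look as follows; the conversion to bits per second and the example values are our own assumptions.

    public class BandwidthEstimate {
        // Upper bound on TCP throughput (bits per second) from the model in [5].
        // mssBytes: maximum segment size, rttSeconds: round-trip time, lossRate: packet loss probability (0..1).
        static double upperBoundBps(double mssBytes, double rttSeconds, double lossRate) {
            final double C = 1.22;                                    // model constant, roughly sqrt(3/2)
            return (mssBytes * 8.0 / rttSeconds) * (C / Math.sqrt(lossRate));
        }

        public static void main(String[] args) {
            // Example: MSS 1460 bytes, RTT 20 ms, 0.1% packet loss.
            System.out.printf("%.1f Mbps%n", upperBoundBps(1460, 0.020, 0.001) / 1e6);
        }
    }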
All discovered routers are passed to the End-Host Monitoring module together with an XML configuration file specifying the parameters to be monitored for each router. End-Host Monitoring then monitors each router autonomously and stores the results, with timestamps, in a relational database.
iii. Network Simulation: For debugging and analysis, we simulate the whole network topology using TouchGraph [14], an open-source Java-based graphing tool. Historic performance information is plotted in graphical form using JFreeChart [15].
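For the historic plots, a minimal JFreeChart sketch might look like the following; the series name, sample value, and output file are illustrative, and the exact chart configuration used by MoMon is not described in the paper.

    import java.io.File;
    import java.util.Date;
    import org.jfree.chart.ChartFactory;
    import org.jfree.chart.ChartUtilities;
    import org.jfree.chart.JFreeChart;
    import org.jfree.data.time.Minute;
    import org.jfree.data.time.TimeSeries;
    import org.jfree.data.time.TimeSeriesCollection;

    public class LatencyChart {
        public static void main(String[] args) throws Exception {
            TimeSeries latency = new TimeSeries("Latency to subnet 192.168.7.1 (ms)"); // illustrative series
            latency.add(new Minute(new Date()), 42.0);                                  // one sample point
            TimeSeriesCollection dataset = new TimeSeriesCollection(latency);
            JFreeChart chart = ChartFactory.createTimeSeriesChart(
                    "Historic latency", "Time", "Latency (ms)", dataset, true, true, false);
            ChartUtilities.saveChartAsPNG(new File("latency.png"), chart, 800, 400);    // write the plot to disk
        }
    }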
D. WAN Monitoring:
The most important aspect of Grid network monitoring is monitoring the physical topology of the WAN between two or more Grid sites. Different tools were analyzed, including pathchar [10], BFind [11], traceroute, and pathneck [1]. All of these are single-ended active probing tools: they do not require any specific service to listen at the destination. We analyzed these tools in terms of network intrusiveness and failure modes, and found pathneck to be the least intrusive; in its default mode it sends only 33.6 KB of data for a single probe. The failure modes of pathneck are discussed among the features and limitations of our implementation (Section IV.C).
In the current implementation, this module requires a list of destinations to be entered manually. Each destination is probed with pathneck at a user-specified time interval, and the retrieved results are parsed for the parameters listed below (a sketch of the resulting hop record follows the list).
1. Total number of hops along the path.
2. Canonical name and IP address of each hop.
3. Latency from the source to each hop.
4. Latency difference between consecutive hops (hopRTT).
5. Bottlenecks and choke points along the path.
6. Upper bound of the available bandwidth of bottleneck links and lower bound of the available bandwidth of normal links.
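For illustration, a parsed hop could be held in a simple value object such as the hypothetical class below; the field names are ours and are not part of the pathneck output format.

    // Hypothetical container for one hop parsed from pathneck output.
    public class HopRecord {
        String canonicalName;    // hop canonical name
        String ipAddress;        // hop IP address
        double latencyMs;        // latency from the source to this hop
        double hopRttMs;         // latency difference to the previous hop (hopRTT)
        boolean bottleneck;      // true if pathneck marks this link as a bottleneck or choke point
        double bandwidthMbps;    // upper bound if a bottleneck, otherwise a lower bound
    }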
In order to simulate the WAN topology, TouchGraph is used for physical topology visualization. The visualization steps are given below, followed by a small sketch of the coloring rule:
1. Hosts = getSelectedDestination()
2. List = getAll(hopRTT)
3. Standard deviation (S.D) = getStandardDeviation(List)
4. foreach destination in Hosts
   a) links = getRecursiveAllHoplinks(source, destination, time_to, time_from)
   b) foreach link in links
      I.   if ( link.hopRTT > S.D ) set the link color to RED
      II.  else set the link color to BLUE
      III. if ( link has a bottleneck ) plot the bottleneck as RED
      IV.  plot ( link )
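The coloring rule in step 4 can be summarized by the small sketch below; the Link fields and the color constants are placeholders for the TouchGraph plotting code, which is not reproduced here.

    public class LinkColoring {
        static class Link {
            double hopRttMs;          // hopRTT of this link
            boolean hasBottleneck;    // marked as a bottleneck by pathneck
        }

        // Step 4(b) I-II: a link is drawn red when its hopRTT exceeds the standard
        // deviation of all hopRTTs, and blue otherwise.
        static String linkColor(Link link, double hopRttStdDev) {
            return link.hopRttMs > hopRttStdDev ? "RED" : "BLUE";
        }

        // Step 4(b) III: bottleneck points are always plotted in red.
        static boolean plotBottleneckInRed(Link link) {
            return link.hasBottleneck;
        }
    }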
E. Peer Information Service
It is inconvenient for a user to have to go through multiple interfaces to obtain a single piece of information. To make the tool easy to access and query, a single interface is provided to all of the information gathered and monitored by MoMon. This interface is exposed as an XML-RPC service.
F. Triangulation and Discovery Service
Grid middleware may be interested in the network performance between two remote sites. This service enables a peer, say A, to query the Peer Information Service of a remote peer B, asking B to measure the bandwidth to another remote host, say C. This is achieved using XML-RPC queries. The JXTA [17] discovery service is used only for discovering peers.
G. Monitoring Tools
In the current implementation, MoMon uses two single-ended active probing tools: ping for latency and loss rates, and pathneck to evaluate hop-by-hop performance along the path. An interface is provided through which new tools and techniques can be plugged into the system.
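The paper does not show this interface, so the sketch below is only one plausible shape for it: each tool implements a common contract, and the measurement task iterates over the registered implementations.

    import java.util.List;

    // Hypothetical plug-in contract for single-ended probing tools.
    interface MeasurementTool {
        String name();                                          // e.g. "ping" or "pathneck"
        String measure(String destination) throws Exception;    // run one probe and return its raw output
    }

    class ToolRunner {
        // Run every registered tool against a destination; a new tool is added by
        // implementing MeasurementTool and appending it to this list.
        static void probeAll(List<MeasurementTool> tools, String destination) throws Exception {
            for (MeasurementTool tool : tools) {
                String output = tool.measure(destination);
                System.out.println(tool.name() + ": " + output);
            }
        }
    }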
Pathneck is an open-source tool used to locate bottlenecks along a path without requiring cooperation from the destination. It measures the available bandwidth at each hop along the path and marks as the bottleneck the link with the lowest available bandwidth. Its algorithm is based on a recursive packet train containing two types of packets: load packets and measurement packets. Load packets emulate the behavior of regular data traffic, while measurement packets trigger router responses that are used to obtain the times at which the load packets are processed.
IV. EVALUATION
We evaluated the performance of MoMon on both Internet WAN links and a private autonomous system. We analyzed the WAN links between momon.niit.edu.pk (momon-server) as the source and various destinations; the physical topology of the autonomous system for the niit.edu.pk domain was also discovered, evaluated, and simulated using the momon-server. The momon-server has a 2.4 GHz Intel processor, 1 GB of RAM, 2 GB of virtual memory, and an 80 GB hard disk, and runs Red Hat Linux 9.
A. LAN Topology Testing
In this section we evaluate the performance of the LAN Monitoring service in the niit.edu.pk administrative domain (niit-domain), which has about 500 hosts and four routers. We evaluate this module with respect to speed (the time taken to discover the topology), overhead (network intrusion, memory usage, and CPU usage during discovery), and thoroughness of the analysis. To make the discovery fast, each ping is performed in a separate thread. Finally, we describe the accuracy of the discovery and the analysis of source-to-subnet performance. The memory and CPU measurements were taken from the Red Hat System Monitor and are shown in Table 1.
Factor                          Analysis
Time                            9.8 seconds
Overhead (network intrusion)    124 pings
Overhead (memory usage)         ~40 Mbytes
Overhead (CPU usage)            40% during discovery, 2% during single probing
Completeness                    440 hosts, 4 routers, 4 subnets
Accuracy                        88%
Table 1: Performance evaluation of autonomous system monitoring
The source-to-subnet performance was validated using iperf [4], which is considered an efficient available-bandwidth measurement tool. We installed iperf on four machines, one selected from each subnet, and compared the bandwidth reported by iperf with the bandwidth calculated by MoMon using [5] between the source and each subnet. The results are shown in Table 2.
Subnet            MoMon available bandwidth    iperf available bandwidth    Accuracy
10.10.0.1         83.3 Mbps                    91.4 Mbps                    91%
202.83.166.166    86.6 Mbps                    93.5 Mbps                    92%
192.168.100.1     38.9 Mbps                    66.6 Mbps                    58%
192.168.7.1       0.90 Mbps                    1.4 Mbps                     64%
Table 2: momon-server to each subnet performance
The link from the source to 192.168.7.1 is very slow because its traffic has to pass through a wireless bottleneck link of 1.5 Mbps. The overall accuracy of MoMon's autonomous system topology discovery is 88%, and that of its analysis is 76%. A simulation snapshot of the niit.edu.pk domain is shown in Figure 4.
Fig. 4. Autonomous system topology Snapshot
B. WAN Topology Monitoring
In this section we evaluate the performance of WAN topology monitoring, which includes discovery, end-to-end analysis, and hop-to-hop analysis. Thirty different destinations were probed for 24 hours at 10-minute intervals. The time taken to discover the WAN links is dominated by the pathneck [1] tool, which takes almost 5 seconds for a single probe and sends 33.6 KB of data for the hop-to-hop analysis of a single path. A snapshot with momon.niit.edu.pk (momon-server) as the source and iepm-bw.slac.stanford.edu, babar.fzk.de, iepm.desy.de, and iepm-bw.cern.ch as the destinations is shown in Figure 5.
Fig. 5. WAN Topological analysis
C. Pathneck Failure Modes
We evaluated pathneck on different types of networks, from slow (5 Mbps) to fast (10 Gbps). The failure modes of pathneck, which are also failure modes of the MoMon WAN monitoring module, are given below:
i. It fails on fast networks such as 10 Gbps links: it shows a null measurement for the first probe, but if pathneck is run again one second after the previous probe, some measurements are obtained.
ii. It fails on paths that have fewer than 4 hops.
iii. It fails on servers that are behind a firewall.
iv. It fails on highly utilized links, such as the NTC connection at niit.edu.pk, which has a capacity of 4 Mbps and is used by almost 480 machines at a time.
V. FUTURE WORK
Grid computing is becoming more popular day by day and is being adopted by many organizations and countries. Current Grid networks are based on either centralized or decentralized architectures. The network architectures of Globus, EGEE, and JClarens are efficient for local Grid site management, i.e. within the same country, but they are not efficient for a global-scale Grid. There is a need for an efficient Grid overlay network in which Grid sites organize themselves autonomously based on nearest neighborhood.
In the future, we want to extend our work by defining, implementing, and testing such a Grid overlay network. The idea is to self-organize the whole Grid into multiple small decentralized Grids, each formed from sites that are nearest neighbors of one another. By nearest neighbor we mean the nodes that are the fewest hops away and have the highest available bandwidth compared with other nodes.
REFERENCES
[1] N. Hu, L. E. Li, Z. M. Mao, P. Steenkiste, and J. Wang, "Locating Internet Bottlenecks: Algorithms, Measurements, and Implications", in Proc. of ACM SIGCOMM 2004, Portland, Oregon, USA, 2004.
[2] M. Jain and C. Dovrolis, "Pathload: A measurement tool for end-to-end available bandwidth", in Proc. of PAM, March 2002.
[3] C. Dovrolis, P. Ramanathan, and D. Moore, "What do packet dispersion techniques measure?", in Proc. of IEEE INFOCOM 2001, 2001.
[4] A. Tirumala, M. Gates, F. Qin, J. Dugan, and J. Ferguson, "Iperf - The TCP/UDP bandwidth measurement tool".
[5] M. Mathis, J. Semke, J. Mahdavi, and T. Ott, "The macroscopic behavior of the TCP congestion avoidance algorithm", SIGCOMM Comput. Commun. Rev., vol. 27, no. 3, July 1997.
[6] W. Matthews and L. Cottrell, "The PingER project: Active Internet performance monitoring for the HENP community", IEEE Communications Magazine, vol. 38, no. 5, pp. 130-136, May 2000.
[7] A. Danalis and C. Dovrolis, "ANEMOS: An Autonomous Network Monitoring System", in Proc. of PAM, 2003.
[8] I. C. Legrand, H. B. Newman, R. Voicu, C. Cirstoiu, C. Grigoras, M. Toarta, and C. Dobre, "MonaLISA: An Agent Based, Dynamic Service System to Monitor, Control and Optimize Grid Based Applications", in Proc. of CHEP, 2004.
[9] S. McKee, H. Newman, F. Van Lingen, L. Kramer, D. Bourilkov, and R. Cavanaugh, "Ultralight: A Managed Network Infrastructure for HEP", in Proc. of CHEP, 2006.
[10] V. Jacobson, "pathchar - a tool to infer characteristics of Internet paths", MSRI talk, April 1997.
[11] A. Akella, S. Seshan, and A. Shaikh, "An empirical evaluation of wide-area Internet bottlenecks", in Proc. of ACM IMC, October 2003.
[12] S. Andreozzi, N. De Bortoli, S. Fantinel, A. Ghiselli, G. L. Rubini, G. Tortone, and M. C. Vistoli, "GridICE: A monitoring service for the Grid", Future Generation Computer Systems, Elsevier, 2005.
[13] S. Andreozzi, A. Ciuffoletti, A. Ghiselli, and C. Vistoli, "Monitoring the Connectivity of a Grid", in Proc. of MGC, Toronto, Ontario, Canada, October 2004.
[14] TouchGraph home page: http://www.touchgraph.com
[15] JFreeChart home page: http://www.jfree.com/jfreechart/
[16] A. Hanemann, J. W. Boote, E. L. Boyd, J. Durand, L. Kudarimoti, R. Lapacz, D. M. Swany, J. Zurawski, and S. Trocha, "PerfSONAR: A Service Oriented Architecture for Multi-Domain Network Monitoring", in Proc. of the Third International Conference on Service Oriented Computing, Springer, LNCS 3826, pp. 241-254, Amsterdam, The Netherlands, December 2005.
VI. APPENDIX
XML Schema definition for the monitoring parameters configuration file:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://www.w3schools.com">
  <xs:element name="Host">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Entity" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Property" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:attribute name="name" type="xs:string" use="required" />
                  <xs:attribute name="type" type="xs:string" use="required" />
                  <xs:attribute name="description" type="xs:string" use="optional" />
                  <xs:attribute name="oid" type="xs:string" use="optional" />
                  <xs:attribute name="unit" type="xs:string" use="optional" />
                </xs:complexType>
              </xs:element>
            </xs:sequence>
            <xs:attribute name="name" type="xs:string" use="required" />
            <xs:attribute name="updateAfter" type="xs:integer" use="required" />
            <xs:attribute name="description" type="xs:string" use="optional" />
            <xs:attribute name="dependency" type="xs:string" use="optional" />
            <xs:attribute name="class_url" type="xs:string" use="optional" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>