Review of National Collaboratory Middleware and Network Research Projects. Aug 2003.
New generation of tools for E2E measurement and diagnostics on UltraNet
Les Cottrell and Jiri Navratil
Stanford Linear Accelerator Center (SLAC)
1. Introduction
The scientific community is increasingly dependent on networking as its need for
international cooperation grows. The High Energy and Nuclear Physics (HENP)
community is one of the strongest examples of collaboration over the Internet.
HENP physicists need to transfer huge amounts of data to home institutes all over the
world from the major sites that host particle accelerators, such as SLAC, FNAL and
ORNL in the US, or CERN in Switzerland.
Today, accelerator labs no longer have sufficient computing resources to meet the
analysis needs of experiments, so typically multiple computer centers (tier 1 sites; the
accelerator lab is the tier 0 site) in multiple countries share the load. To support this,
experimental data must be copied between the accelerator site and its tier 1 sites.
Further, to improve access to analyzed data, it is replicated and saved at multiple data
centers. These databases are huge and growing rapidly, exceeding a PetaByte and
including the largest known databases in the world today. For example, the SLAC-based
BaBar experiment had 895 TBytes as of Friday Dec 12, 2003. The networking
requirements to copy the data are commensurately large, requiring the copying of
multi-TeraByte files that today take over a day to transfer. Besides the data copy requirements,
rapidly changing teams of tens to hundreds of physicists in dozens of sites around the
world need access to the analyzed data via the network in order to further analyze and
extract the physics results. Meeting these demands requires constant upgrades and the
ability to effectively use the high-speed networks.
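As a rough illustration of the transfer times involved (a sketch only; the sustained throughput used below is an assumed example value, not a figure from this report), the following Python snippet estimates how long a multi-TeraByte copy takes at a given rate:

    # Illustrative estimate of wide-area file-copy time.
    # The sustained throughput is an assumed example value, not a measurement.
    def transfer_time_hours(size_tbytes, throughput_mbps):
        """Hours needed to move size_tbytes at a sustained throughput_mbps."""
        bits = size_tbytes * 1e12 * 8             # TBytes -> bits
        seconds = bits / (throughput_mbps * 1e6)  # Mbit/s -> bit/s
        return seconds / 3600.0

    # A 2 TByte file at a sustained 150 Mbit/s takes about 30 hours,
    # i.e. "over a day", as noted above for multi-TeraByte transfers.
    print(f"{transfer_time_hours(2, 150):.1f} hours")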
In the next few years, when the next generation of particle accelerators starts taking data
(LHC is scheduled to turn on in 2007), the demands for data transfer will increase by an
order of magnitude or more. The Grid community is also growing rapidly, so grid-clustered
computers with immense computation and data capability will soon become one of the
main network consumers, enabling the processing of scientific data distributed worldwide.
The Internet is not one compact, centrally managed network. It is composed of many
large network backbones operated by different Internet Service Providers (ISPs), each
serving hundreds or thousands of smaller networks or sites, each with its own network
infrastructure. The UltraNet, even though it will be a private network, will be an indirect
partner of this community: first, because some data arriving via the Internet will have to
enter UltraNet and vice versa, and second, because it will use the same technology as
other networks. The capacity of UltraNet will be very high, but that also means its
dynamic range (the range of rates at which data can flow through the network) will be
very wide, so conditions on the network can differ greatly depending on its utilization.
Knowing the current state of the network, and how stable that state is likely to be, is
therefore important for everybody.
Each big ISP or service organization creates its own monitoring system that provides
basic information for its users. Such monitoring is fundamental for the management of these systems
and for providing basic information to end users. However, each such system passively
measures aggregated information in the core or at the exchange points and does not cover
all of the end-user levels and questions. There will almost always be a Local Area
Network (LAN) at both ends of the UltraNet, so user packets will almost always
pass from one LAN via UltraNet to another LAN. End-to-end (E2E) measurements
will therefore continue to be necessary, since only such measurements can give correct
answers about user requirements, whether for individual users, grid brokers or
database-redirecting schedulers.
Access to high-speed networks and testbeds such as UltraNet is critical for
developers of new E2E tools. Current measurement tools work well for OC-3 links and
reasonably well for networks with OC-12 links. As more modern technologies come into
play, carrying more heavily aggregated traffic (OC-48, OC-192 or STM-64) over parallel
optical channels and the like, the current tools will fail, give incorrect results, or work
very inefficiently. The new technology uses more sophisticated data transfer than the
environment for which these tools were originally developed.
We are starting to feel this problem even in the current situation, because the first such
devices of the new generation have appeared in the network backbones. The capacities of
most networks have increased dramatically during the last 5 years, and even this brings
new challenges for current E2E methods. At high speeds, using iperf is becoming
increasingly problematic. For example, on a 1 Gbit/s link it takes about 6 seconds for
iperf to reach a stable TCP mode of operation (i.e. to have emerged from the initial slow
start and be in congestion avoidance), so if one needs the measurement to be in the stable
congestion avoidance state for 90% of the time, then a measurement takes 60 seconds and
means transferring over 6 GBytes of data. Even today, Abilene traffic statistics indicate
that iperf measurements create more than 5% of all traffic in the whole network.
Development of more effective tools that estimate the available bandwidth or an
achievable throughput has become a hot subject of networking research in several
projects, but all current tools are limited to giving results below 1 Gbit/s.
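The iperf overhead argument can be made concrete with a small calculation (a sketch using the 6-second slow-start figure quoted above and an assumed 1 Gbit/s link rate):

    # Sketch of the iperf measurement-overhead argument above.
    slow_start_s = 6.0        # time quoted above for TCP to leave slow start
    stable_fraction = 0.90    # desired fraction of the test in congestion avoidance
    link_rate_bps = 1e9       # assumed 1 Gbit/s path

    # Slow start may occupy at most 10% of the test, so the test must last:
    test_duration_s = slow_start_s / (1.0 - stable_fraction)    # 60 s

    # Upper bound on the data injected if the path runs at line rate:
    data_gbytes = link_rate_bps * test_duration_s / 8 / 1e9     # 7.5 GBytes

    # The achieved throughput is somewhat below line rate, which is why the
    # text quotes "over 6 GBytes" for such a 60-second measurement.
    print(f"{test_duration_s:.0f} s test, up to {data_gbytes:.1f} GBytes sent")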
A new generation of tools for E2E monitoring and prediction must therefore be developed.
SONET/SDH will continue to play a key role in the next generation of networks for
many carriers. In the core network, the carriers offer services such as telephone,
dedicated leased lines, and Internet protocol (IP) data, which are continuously
transmitted. The individual data is not transmitted on separate lines; instead it is
multiplexed into higher speeds and transmitted on SONET/SDH networks at up to 10
gigabits per second (Gbps). Synchronous transmission system. (SONET) and
synchronous digital hierarchy (SDH) transmission technologies. specification outlines the
frame format, multiplexing method, and synchronization method between the equipment,
as well as the specifying optical interface. SONET that could directly extract low-speed
signals from multiplexed high-speed traffic based on the ANSI standard. The adoption
and acceptance of SONET allowed to be able to choose equipment from different
vendors instead of using only a single vendor with a proprietary optical format. The base
rate for SONET is 51 Mbps. Synchronous transport signal (STS-n) refers to the SONET
signal in the electrical domain, and optical carrier (OC-n) refers to the SONET signal in
the optical domain. The base rate for SDH is 155 Mbps. Synchronous transport module
(STM-n) refers to the SDH signal level in both the electrical and optical domains, as
follows:

STS-1     OC-1     N/A         51.84 Mbps
STS-3     OC-3     STM-1      155.52 Mbps
STS-12    OC-12    STM-4      622.08 Mbps
STS-48    OC-48    STM-16   2,488.32 Mbps
STS-192   OC-192   STM-64   9,953.28 Mbps
STS-768   OC-768   STM-256 39,813.12 Mbps
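The rates in this table all follow from the 51.84 Mbps STS-1 base rate: STS-n and OC-n run at n times that rate, and STM-m corresponds to OC-3m. A short Python sketch (our own illustration, not part of the standard text) reproduces the table:

    # SONET/SDH line rates derived from the 51.84 Mbps STS-1 base rate.
    STS_BASE_MBPS = 51.84

    for n in (1, 3, 12, 48, 192, 768):
        rate_mbps = n * STS_BASE_MBPS                 # STS-n / OC-n line rate
        stm = f"STM-{n // 3}" if n >= 3 else "N/A"    # STM-m maps to OC-3m
        print(f"STS-{n:<4} OC-{n:<4} {stm:<8} {rate_mbps:,.2f} Mbps")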
Multi-service capability with a high number of channels supports the mapping of
different types of traffic such as Packet over SONET (POS), Ethernet over SONET, ATM
and Fibre Channel; FICON, ESCON and Digital Video are available on most devices
using this technology. Various SONET/SDH interfaces support optical line rates from
155 Mbps to 10 Gbps. This opens new possibilities for combining these features, but on
the other hand it complicates measurement from outside. The signals are aggregated,
scrambled and converted several times between serial and parallel forms, so that finally
all the timing effects on which time-dispersion measurement methods rely are lost. Also,
"brute-force" methods are becoming totally inefficient, because filling the existing
capacity from one testing machine is practically impossible. These points are only part of
the reasons why new tools for E2E measurement should be developed, why new methods
are needed for tracking user packets in such a heavily aggregated environment, and more
generally why the behavior of typical Internet applications passing via UltraNet should
be studied.
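To see why filling such a link from a single testing machine is impractical, a back-of-the-envelope sketch (assuming standard 1500-byte Ethernet frames, an assumption on our part) gives the packet rate a single host would have to sustain on an OC-192 path:

    # Back-of-the-envelope look at filling an OC-192 / STM-64 path from one host,
    # assuming 1500-byte (MTU-sized) frames.
    link_rate_bps = 9.953e9   # approximate OC-192 line rate
    frame_bytes = 1500        # assumed Ethernet MTU

    packets_per_s = link_rate_bps / (frame_bytes * 8)
    bytes_per_s = link_rate_bps / 8

    # Roughly 830,000 packets and about 1.2 GBytes of data per second, sustained,
    # which is more than a single commodity test host can generate.
    print(f"{packets_per_s/1e6:.2f} Mpps, {bytes_per_s/1e9:.2f} GBytes/s")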
We gained our first experience with this technology during tests performed in
collaboration with CERN and Caltech in the framework of the DataTAG project, and also
while setting up experimental 10 Gbps links for the "Bandwidth Challenge" at SC2003.
We discovered the first differences compared to the existing technology, which should be
studied and tested in more detail in a new experimental network.