CANTIS: Center for Application-Network Total-Integration for SciDAC
Scientific Discovery through Advanced Computing - Enabling Computational Technologies
DOE Office of Science Program Announcement LAB 06-04
Oak Ridge National Laboratory
Principal Investigator (coordinator and point of contact):
Nageswara S. Rao
Oak Ridge National Laboratory
P.O. Box 2008
Oak Ridge, TN 37831
Phone: 865 574-7517; Fax: 865 574-0405
E-Mail: [email protected]
Co-Principal Investigators:
Steven M. Carter, William R. Wing, Qishi Wu, Oak Ridge National Laboratory
Dantong Yu, Brookhaven National Laboratory
Matt Crawford, Fermi National Accelerator Laboratory
Karsten Schwan, Georgia Institute of Technology
Tom McKenna, Pacific Northwest National Laboratory
Les Cottrell, Stanford Linear Accelerator Center
Biswanath Mukherjee, Dipak Ghosal, University of California, Davis
Official Signing for Oak Ridge National Laboratory:
Thomas Zacharia
Associate Laboratory Director, Computing and Computational Sciences
Oak Ridge National Laboratory
P. O. Box 2008
Oak Ridge, TN 37831-6163
Phone: 865 574-4897; Fax: 865 574-4839
E-Mail: [email protected]
Requested Funding ($K):
Institution                              Year 1   Year 2   Year 3   Year 4   Year 5    Total
Brookhaven National Laboratory              400      400      400      400      400    2,000
Fermi National Accelerator Laboratory       400      400      400      400      400    2,000
Georgia Institute of Technology             300      300      300      300      300    1,500
Pacific Northwest National Laboratory       400      400      400      400      400    2,000
Oak Ridge National Laboratory               600      600      600      600      600    3,000
Stanford Linear Accelerator Center          400      400      400      400      400    2,000
University of California, Davis             300      300      300      300      300    1,500
Total                                     2,800    2,800    2,800    2,800    2,800   14,000
Use of human subjects in proposed project: No
Use of vertebrate animals in proposed project: No
______________________________________
Principal Investigator Signature/Date
___________________________________
Laboratory Official Signature/Date
Table of Contents
Abstract
1. Background and Significance – Rao and Wing
1.1. Networking Needs of Large-Scale SciDAC Applications
1.2. Limitations of Current Methods
1.3. Application-Middleware-Network Integration
1.4. Center Organization and Management
2. Preliminary Studies
2.1. TSI and SciDAC Applications – ORNL and others
2.2. Network Provisioning – UCD, ORNL
2.3. Host Performance Issues – FNAL
2.4. Network Measurement - SLAC
2.5. Data Transport Methods – ORNL, BNL
2.6. Connections to Supercomputers - ORNL
2.7. Optimal Network Realizations of Visualization Pipelines - UCD
2.8. Microscope Control over Wide-Area Connections - PNNL
2.9. Application-Middleware Filtering – GaTech
3. Research Design and Methods
3.1. Liaison and Application Integration Activities
3.2. CANTIS Toolkit – UCD, ORNL
3.3. Component Technical Areas
3.3.1. Channel Profiling and Optimization – UCD, ORNL
3.3.2 IPv4/IPv6 Networking – FNAL
3.3.3 Host and Subnet Optimizations - FNAL
3.3.4 Network Measurements - SLAC
3.3.5. High Performance Data Transport – ORNL, BNL
3.3.6. Connections to Supercomputers - ORNL
3.3.7. Optimal Networked Visualizations - UCD
3.3.8. Computational Monitoring and Steering -ORNL
3.3.9. Remote Instrument Control -PNNL
3.3.10. Application-Middleware Optimizations – GaTech
3.4. Research Plan
3.5 Software Lifecycle
4. Consortium Arrangements
Literature Cited
Budget and Budget Explanation
Other Support of Investigators
Biographical Sketches
Description of Facilities
Appendix 1: Institutional Cover and Budget Pages
Appendix 2: Two-page Institutional Milestones and Deliverables
Appendix 3: Letters of Support
Abstract: (250 words)
Large-scale SciDAC computations and experiments require unprecedented capabilities in the form of
large data throughputs and/or dynamically stable (jitter-free) connections to support large data transfers,
network-based visualizations, computational monitoring and steering, and remote instrument control.
Current network-related limitations have proven to be a serious impediment to a number of large-scale
DOE SciDAC applications. Realizing these wide-area capabilities requires the vertical integration and
optimization of the entire Application-Middleware-Networking stack so that the provisioned capacities
and capabilities are available to the applications in a transparent, optimal manner. The technologies
required to accomplish such tasks transcend the solution space addressed by traditional networking or
middleware areas.
We propose to create the Center for Applications-Network Total-Integration for SciDAC (CANTIS)
to address a broad spectrum of networking-related capabilities for these applications by leveraging
existing technologies and tools, and developing the missing ones. We propose to provide components of
end-to-end solutions and in-situ optimization modules for tasks including the selection of an optimal number of transport
streams for application-to-application transfers, the decomposition and mapping of visualization pipelines,
and transparent, agile operation over hybrid circuit/packet-switched or IPv4/v6 networks. The team's
liaison members will proactively engage SciDAC scientists and work closely with them to accomplish the needed
optimization, customization and tuning of solutions. The technical focus areas of CANTIS are: (a) high
performance data transport for file and memory transfers, (b) effective support of visualization streams
over wide-area connections, (c) computational monitoring and steering over network connections, (d)
remote monitoring and control of instruments including microscopes, and (e) higher-level data filtering
for optimal application-network performance.
1. Background and Significance (4 pages) (20 pages total for narrative)
Large-scale SciDAC computations and experiments require unprecedented network capabilities in the
form of large network throughput and/or dynamically stable (jitter-free) connections to support large data
transfers, network-based visualizations, computational monitoring and steering, and remote instrument
control. Indeed, current network-related limitations have proven to be a serious impediment to a number
of large-scale DOE SciDAC applications. For example, climate data produced at ORNL's leadership-class computational facility is shipped on physical media via air/ground transport to NERSC for archival
storage due to inadequate network throughput; Terascale Supernova Initiative (TSI) computational runs
waste as much as 60% of time allocations due to large network transfer times or unmonitored runaway
computations. The needed network functionalities are significantly beyond the evolutionary trajectory of
commercial networks as well as the technology paths of typical shared IP networks. In addressing the
needs of large-scale SciDAC applications, these networks are limited both in capacity, such as
connection bandwidth, and in capability, such as the ability to achieve application-level throughputs
that match the provisioned capacities. These challenges are particularly acute for SciDAC computations
that generate massive datasets on supercomputers whose primary emphasis is on computational speeds
rather than wide-area throughputs.
We propose to create the Center for Applications-Network Total-Integration for SciDAC (CANTIS).
It will proactively engage SciDAC PIs, their developers, and science users to integrate the latest
networking and related techniques with SciDAC applications and thus remove current network-related
performance bottlenecks. The main goal of CANTIS is to serve as a comprehensive resource to equip
SciDAC applications with high-performance network capabilities in an optimal and transparent manner.
This project addresses a broad spectrum of networking-related capabilities needed in various SciDAC
applications by leveraging existing technologies and tools, and developing the missing ones.
1.1 Networking Needs of Large-Scale SciDAC Applications
Supercomputers such as the new National Leadership Computing Facility (NLCF) and others being
constructed for large-scale scientific computing are rapidly approaching 100 teraflops speeds, and are
expected to play a critical role in a number of SciDAC science projects. They are critical to several
SciDAC fields including high energy and nuclear physics, astrophysics, climate modeling, molecular
dynamics, nanoscale materials science, and genomics. These applications are expected to generate
petabytes of data at the computing facilities, which must be transferred, visualized, and analyzed by
geographically distributed teams of scientists. The computations themselves may have to be interactively
monitored and actively steered by the scientist teams. In the area of experimental science, there are
several extremely valuable experimental facilities, such as the Spallation Neutron Source, the Advanced
Photon Source, and the Relativistic Heavy Ion Collider. At these facilities, the ability to conduct
experiments remotely and then transfer the large measurement data sets for remote distributed analysis is
critical to ensuring the productivity of both the facilities and the scientific teams utilizing them. Indeed,
high-performance network capabilities add a whole new dimension to the usefulness of these computing
and experimental facilities by eliminating the “single location, single time zone” bottlenecks that
currently plague these valuable resources.
Three DOE workshops were organized in the years 2002-2003 to define the networking
requirements (shown in Table 1) of large-scale science applications, discuss possible solutions, and
describe a path forward. The high-performance networking requirements of several SciDAC applications
are reflected in Table 1. They fall into two broad categories: (a) high bandwidths, typically multiples of
10 Gbps, to support bulk data transfers, and (b) stable bandwidths, typically much lower, such as
hundreds of Mbps, to support interactive steering and control operations.
1.2. Limitations of Current Methods
It has been recognized for some time within DOE and NSF that current networks and networking
technology are inadequate for supporting large scale science applications. Currently, the Internet
technologies are severely limited in meeting these demands. First, such bulk bandwidths are available
only in the backbone, typically shared among a number of connections that are unaware of the demands
of others. As can be seen from Table 1, the requirements for network throughput are expected to grow by
a factor of 20-25 every two years. These requirements will overwhelm production IP networks that
typically see bandwidth improvements of only a factor of 6-7 over the same period. Second, due to the shared nature of packet
switched networks, typical Internet connections often exhibit complicated dynamics, thereby lacking the
stability needed for steering and control operations [9]. These requirements are quite different from those
of a typical Internet user, who needs smaller bandwidths and tolerates much higher delay and jitter levels (typically
for email, web browsing, etc.). As a result, the industry is not expected to develop the required end-to-end
network solutions of the type and scale needed for large-scale science applications. Furthermore, the
operating environments of these applications consisting of supercomputers, high-performance storage
systems and high-precision instruments present a problem space that is not traditionally addressed by the
mainstream networking community. Indeed, focused efforts from multi-disciplinary teams are necessary
to completely develop the required network capabilities.
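To make the scale of this gap concrete, the following back-of-the-envelope calculation is a sketch using the growth factors quoted above, with the compounding interval assumed to be two years for both curves; it is an illustration, not measured data.

```python
# Back-of-the-envelope comparison of demand growth vs. production IP capacity
# growth, using the low ends of the factors quoted above and assuming both
# compound every two years (an illustrative assumption, not measured data).
demand_factor = 20   # requirements grow by 20-25x every two years
supply_factor = 6    # production IP bandwidth improves by only 6-7x

demand = supply = 1.0   # normalized to today's levels
for year in (2, 4, 6, 8, 10):
    demand *= demand_factor
    supply *= supply_factor
    print(f"year {year:2d}: demand {demand:12,.0f}x   capacity {supply:8,.0f}x   "
          f"shortfall {demand / supply:8,.1f}x")
```

Even with the conservative factors, the shortfall grows by more than a factor of three every two years, which is the core argument for dedicated, application-aware provisioning rather than waiting for commodity capacity to catch up.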
Science Areas                | Today End2End Throughput     | 5 years End2End                  | 5-10 Years End2End           | Remarks
High Energy Physics          | 0.5 Gb/s                     | 100 Gb/s                         | 1000 Gb/s                    | high bulk throughput
Climate (Data & Computation) | 0.5 Gb/s                     | 160-200 Gb/s                     | N x 1000 Gb/s                | high bulk throughput
SNS NanoScience              | Not yet started              | 1 Gb/s                           | 1000 Gb/s + QoS for control  | remote control and time critical throughput
Fusion Energy                | 0.066 Gb/s (500 MB/s burst)  | 0.198 Gb/s (500MB/20 sec. burst) | N x 1000 Gb/s                | time critical throughput
Astrophysics                 | 0.013 Gb/s (1 TBy/week)      | N*N multicast                    | 1000 Gb/s                    | computational steering and collaborations
Genomics Data & Computation  | 0.091 Gb/s (1 TBy/day)       | 100s of users                    | 1000 Gb/s + QoS for control  | high throughput and steering
Table 1 - Current and near-term network bandwidth requirements summarized in the DOE Roadmap
workshops.
1.3. Application-Middleware-Network Integration
One of the important realizations in a number of large-scale science applications is that increases in
connection data rates or raw bandwidth alone are insufficient to address these application-to-application
performance issues. Typically, when connection bandwidths are increased, the performance bottlenecks
move to a different part of the so-called Application-Middleware-Networking (AMN) stack, typically
from network core to end hosts or subnets. Realizing these capabilities in SciDAC applications requires
the vertical integration and optimization of the entire AMN stack so that the provisioned capacities and
capabilities are available to the applications in a transparent, optimal manner. The phrase Application-Network Total-Integration collectively refers to the spectrum of AMN technologies needed for
accomplishing these tasks. It transcends the solution space addressed by traditional networking or
middleware areas.
We will briefly describe our experience with the TSI application, which underscores the need for
addressing AMN Total Integration (AMNTI). Present efforts to address these AMNTI challenges
have been scattered across a number of science and networking projects. This has resulted in application scientists
being forced to become experts in networking, which is a distraction from their main science mission. On the
other hand, isolated technology solutions only resulted in very limited improvements at the application
level. Instead, an integrated effort between AMN components is crucial to achieving these capabilities.
We now describe this example in more detail to highlight the need for such an integrated effort. The TSI
hydrodynamics code is executed on the ORNL Cray X1, and the data is transferred to North Carolina State
University (NCSU) for further analysis. The network throughput was initially limited to about 200-300
Mbps (after tuning bbcp, a multiple-stream TCP transport) over a 1 Gbps connection to the Cray X1 over the
production network. The shared nature of the connection was initially thought to be the main source of this
throughput limitation. The engineering solution was to provide a dedicated 1 Gbps connection over the NSF
CHEETAH network from the Cray X1 OS nodes to the NCSU cluster, and to deploy the Hurricane protocol, which
could achieve 99% utilization on a dedicated 1 Gbps connection [R06]. Incidentally, at about the same time the Cray
X1 was upgraded to the Cray X1E, which involved replacing the processors on the OS nodes with more
powerful ones. The dedicated connection and Hurricane software were handed off to the TSI scientists, who
were expecting network throughputs on the order of 1 Gbps. Instead, throughputs were on the order of 20
Mbps. The problem was finally traced by a joint team to a bottleneck in the protocol stack
implementation on the Cray, which was later addressed. The process involved carefully and systematically
eliminating various components of the end-to-end data and execution path as sources of the bottleneck. This
effort was possible mainly because of the NSF CHEETAH and LDRD projects at ORNL that established
a close collaboration between TSI scientists and computer scientists. The goal of CANTIS is to provide
such a collaborative capability to SciDAC projects for a wider spectrum of AMN tasks.
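The diagnosis process described above can be framed, in simplified form, as segment-by-segment elimination. The sketch below is hypothetical: the probe callables stand in for per-segment tests such as host protocol-stack loopback, LAN memory-to-memory, and WAN memory-to-memory transfers, and it is meant only to illustrate the systematic elimination of path components as bottleneck candidates.

```python
# Hypothetical sketch of systematic bottleneck elimination: measure each stage
# of the end-to-end data/execution path in isolation and report the slowest.
# The probe callables are placeholders for per-segment tests (e.g., host
# protocol-stack loopback, LAN memory-to-memory, WAN memory-to-memory).
from typing import Callable, Dict

def isolate_bottleneck(probes: Dict[str, Callable[[], float]]) -> str:
    """Each probe returns a measured throughput in Mb/s for one path segment."""
    rates = {segment: probe() for segment, probe in probes.items()}
    for segment, rate in sorted(rates.items(), key=lambda kv: kv[1]):
        print(f"{segment:35s} {rate:10.1f} Mb/s")
    # The slowest isolated segment bounds the achievable end-to-end rate.
    return min(rates, key=rates.get)
```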
1.4 CANTIS Concept and Organization
The center will focus on the high-performance networking capabilities needed by SciDAC applications
including the ones using DOE experimental and computing facilities. The center is based on two
concepts: first, Technology Experts to address specific technical areas, and second, Science Liaisons with
assigned science areas to work directly with SciDAC scientists. The center will leverage existing tools
and techniques while simultaneously developing nascent technologies that will enable the latest
developments in high performance networking (such as UltraScience Net layer-1&2 circuits) to enhance
SciDAC applications. The science liaisons will actively engage scientists in their assigned areas, through
regular meetings and interactions, to help the center stay abreast of SciDAC requirements, anticipate
SciDAC needs and guide the development, transfer and optimization of appropriate performance-enhancing and "gap-filling" shims.
The center will serve as a one-stop resource for any SciDAC PI with a high-performance or unique
network requirement. In addition to actively pursuing SciDAC network-related needs, it will also respond
to requests initiated by application scientists. In either case, a single science liaison will be assigned who
would interface with the appropriate technology experts to: (i) identify the technology components, (ii)
develop comprehensive network enabling solutions, and (iii) install, test and optimize the solution. Each
participant institution of this center is assigned science areas to act as a liaison based on their prior and
on-going relationships with science projects. Also each institution is assigned to lead specific technical
AMN areas.
We propose to develop tools specifically tailored to SciDAC to generate application-level connection
profiles, and identify potential components of an end-to-end AMN solution. We will develop in-situ
optimization tools that can be dropped in-place along with the application to identify the optimal AMN
configurations such as an optimal number of transport streams for application-to-application transfers,
decomposition and mapping of a visualization pipeline, and transparent and agile operation over hybrid
circuit/packet-switched or IPv4/v6 networks. We will also develop APIs and libraries customized to
SciDAC for various AMN technologies. These tools cut across a wide spectrum of SciDAC applications,
but may not be necessarily optimal (or optimally configured) for a specific application environment.
Team member liaisons will work closely with SciDAC scientists to accomplish any needed finer
optimization, customization and tuning.
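As an illustration of the kind of in-situ optimization module envisioned here, the sketch below picks a transport-stream count by direct measurement; measure_throughput is a hypothetical callback supplied by the application-level transfer tool, not an existing CANTIS API.

```python
# Hedged sketch of an in-situ optimizer that picks the number of parallel
# transport streams by direct measurement; measure_throughput is a hypothetical
# callback that runs a short application-level probe transfer with the given
# stream count and returns the achieved rate in Mb/s.
from typing import Callable, Iterable

def pick_stream_count(measure_throughput: Callable[[int], float],
                      candidates: Iterable[int] = (1, 2, 4, 8, 16)) -> int:
    best_n, best_rate = None, float("-inf")
    for n in candidates:
        rate = measure_throughput(n)          # short probe transfer
        if rate > best_rate:
            best_n, best_rate = n, rate
    return best_n
```

A production module would additionally bound the probe durations and revisit the choice as path conditions change.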
These SciDAC application liaison assignments of team members are as follows: Accelerator Science
and Simulation - SLAC, FNAL; Astrophysics - ORNL, SLAC; Climate Modeling and Simulation -
ORNL; Computational Biology - PNNL, GaTech; Fusion Science – GaTech, ORNL; Groundwater
Modeling and Simulation - PNNL; High-Energy Physics - FNAL, BNL; High-Energy and Nuclear
Physics - BNL, FNAL; Nuclear Physics - BNL; Combustion Science and Simulation - UCDavis, ORNL;
Quantum Chromodynamics - FNAL, BNL.
The specific technical AMN tasks and their institutional primary assignments are as follows:
BNL is the primary repository for storage and dissemination of tools and work products of the
project. It will also provide Terapaths software for automatic end-to-end, interdomain QoS.
FNAL will provide Lambda Station and its IPv4/v6 expertise for massive file transport tasks. Also, it
will address host performance issues including Linux operating system under load.
GaTech will address the "impedance matching" of network-specific middleware to different transport
technologies. It will provide expertise in middleware-based data filters to the project.
ORNL will provide overall coordination of the project, and will also provide the technologies for
dedicated-channel reservation and provisioning, and optimized transport methods.
PNNL will generalize and extend the remote instrument control software it is currently developing
for remote control of its confocal microscope facility.
SLAC will bring its expertise in network monitoring to the project, with a specific emphasis on end-to-end monitoring and dynamic matching of applications to network characteristics.
UC Davis will focus on optimizing remote applications by distributing component functions, and
they will particularly concentrate on remote visualization tasks.
This center consists of subject-area experts from national laboratories and universities with extensive
research and practical expertise in networking as well as in enabling applications and middleware to make
optimal use of provisioned network capabilities. In particular, several PIs are currently active participants
in SciDAC, NSF and other enabling technology projects.
The technical focus areas of CANTIS are: (a) high performance data transport for file and memory
transfers, (b) effective support of visualization streams over wide-area connections, (c) computational
monitoring and steering over network connections with an emphasis on leadership-class applications, (d)
remote monitoring and control of instruments including microscopes, and (e) higher-level data filtering
for optimal application-network performance.
2. Preliminary Studies (5 pages – half page each institution)
We present a brief (albeit ad hoc and partial) account of our work in various AMNTI technologies
that are at various stages of development and deployment. These technologies contribute to varying
degrees and can potentially be integrated to realize the needed network capabilities. While several of
these works have been motivated primarily by our existing collaborations with TSI, the underlying
technologies are applicable to a much wider class of large-scale SciDAC science applications [R05.2].
These methods together represent the building blocks that may be integrated to effectively carry out a
SciDAC computation on a supercomputer while being monitored, visualized and steered by a group of
geographically dispersed domain experts.
2.1. TSI and SciDAC Applications – ORNL and others
2.2. Network Provisioning – UCD, ORNL
It is generally believed that networking demands of large-scale science can be effectively addressed by
providing on-demand dedicated channels of the required bandwidths directly to end users or applications.
The UltraScience Net (USN) [4] was commissioned by the U.S. Department of Energy (DOE) to facilitate
the development of these constituent technologies specifically targeting the large-scale science
applications carried out at national laboratories and collaborating institutions. Its main objective is to
provide developmental and testing environments for a wide spectrum of the technologies that can lead to
production-level deployments within the next few years. Compared to other testbeds, USN has a
much larger backbone bandwidth (20-40 Gbps), a larger footprint (several thousand miles), and
close proximity to several DOE facilities. USN provides on-demand dedicated channels: (a) 10Gbps
channels for large data transfers, and (b) high-precision channels for fine control operations. User sites
can be connected to USN through its edge switches, and can utilize the provisioned dedicated channels
during the allocated time slots. In the initial deployment, its data-plane consists of dual 10 Gbps lambdas,
both OC192 SONET and 10GigE WAN PHY, of several thousand miles in length from Atlanta to
Chicago to Seattle to Sunnyvale.
The channels are provisioned on-demand using layer-1 and layer-2 switches in the backbone and
Multiple Service Provisioning Platforms (MSPP) at the edges in a flexible configuration using a secure
control-plane. In addition, there are dedicated hosts at the USN edges that can be used for testing
middleware, protocols, and other software modules that are not site specific. The control-plane employs a
centralized scheduler to compute the channel allocations and a signaling daemon to generate
configuration signals to switches. This control plane is implemented using a hardware based VPN that
encrypts the control signals and provides authenticated and authorized access.
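A centralized scheduler of this kind must, at a minimum, decide whether a requested bandwidth reservation fits on a link alongside existing reservations. The following is a minimal admission-check sketch under simplifying assumptions (a single link, advance reservations only); the class and field names are illustrative and are not the USN control-plane interface.

```python
# Minimal sketch of the admission check a centralized circuit scheduler performs:
# accept a reservation only if committed plus requested bandwidth never exceeds
# the link capacity during the requested window. Illustrative only.
from dataclasses import dataclass
from typing import List

@dataclass
class Reservation:
    start: float        # seconds since some epoch
    end: float
    rate_gbps: float

def admit(request: Reservation, existing: List[Reservation],
          capacity_gbps: float = 10.0) -> bool:
    # Committed load is piecewise constant and can only rise at a reservation's
    # start, so it suffices to check the window start and each start inside it.
    check_points = {request.start} | {r.start for r in existing
                                      if request.start < r.start < request.end}
    for t in check_points:
        load = sum(r.rate_gbps for r in existing if r.start <= t < r.end)
        if load + request.rate_gbps > capacity_gbps:
            return False
    return True
```

On acceptance, the scheduler would record the reservation, and the signaling daemon would push the corresponding switch configurations at the start of the allocated slot.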
Circuit-switched High-speed End-to-End Transport ArcHitecture (CHEETAH) [5] is an
NSF project to develop and demonstrate a network infrastructure for provisioning dedicated bandwidth
channels and the associated transport, middleware and application technologies to support large data
transfers and interactive visualizations needed for eScience applications, particularly TSI. Its footprint
spans North Carolina State University (NCSU) and Oak Ridge National Laboratory (ORNL) with a
possible extension to University of Virginia and City University of New York. For long haul connections,
CHEETAH provides Synchronous Optical Network (SONET) channels through MSPPs that are
interconnected by OC192 links. The channels are set up using off-the-shelf equipment, such as MSPPs
and SONET switches, using Generalized Multi-Protocol Label Switching (GMPLS) for dynamic circuit
setup/release.
2.3. Host Performance Issues – FNAL
2.4. Network Measurement – SLAC
2.5. Data Transport Methods – ORNL, BNL
While USN was being rolled out, we conducted preliminary experiments to understand the
properties of dedicated channels using a smaller scale connection from Oak Ridge to Atlanta. Despite the
limited nature of this connection, several important performance considerations have been revealed by
these experiments. We describe these results here due to their relevance in utilizing the channels that will
be provided by USN and CHEETAH. We tested several UDP-based protocols over a 1 Gbps dedicated
channel implemented as a layer-3 loop from ORNL to Atlanta and back to ORNL. In all these cases, finer
tuning of parameters was needed to achieve throughputs on the order of 900 Mbps. This tuning required
intricate knowledge of their implementation details, which became a very time-consuming and somewhat
unstructured process. Furthermore, it was unclear how to increase the throughput beyond 900 Mbps,
which required fine-tuning of host-related parameters. In response, we implemented the Hurricane
protocol with tunable parameters for rate optimization (in approximately the same time needed to tune
these protocols). Hurricane’s overall structure is quite similar to other UDP-based protocols, in particular
UDT. However, certain unique ad hoc optimizations are incorporated into the protocol. Hurricane is
mainly designed to support high-speed data transfers over channels with dedicated capacities, and hence
does not incorporate any congestion control. We first collected measurements to build the throughput and
loss profiles of the channel. The low loss rate at which high throughput is achieved motivated a NACK-based scheme, since a large number of ACKs would otherwise consume significant CPU time, thereby
affecting the sending and receiving processes. The initial source sending rate in Hurricane is derived from
the throughput profile of the channel, and is further tuned manually to achieve about 950 Mbps.
Additional throughput increase is achieved by grouping the NACKs and manually determining the optimal
group size, which resulted in 991 Mbps throughput.
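The two mechanisms that matter most here, fixed-rate pacing derived from the channel profile and loss reporting via grouped NACKs, are sketched below in simplified form. This is an illustration of the mechanisms, not the Hurricane implementation; the 4-byte sequence header and default group size are arbitrary choices.

```python
# Simplified sketch of rate-paced UDP sending and grouped NACK generation,
# illustrating the mechanisms described above (not the Hurricane code itself).
import time

def paced_send(sock, addr, payloads, rate_mbps: float):
    """Send datagrams at approximately rate_mbps by spacing them in time."""
    for seq, data in enumerate(payloads):
        interval = len(data) * 8 / (rate_mbps * 1e6)        # seconds per datagram
        t0 = time.perf_counter()
        sock.sendto(seq.to_bytes(4, "big") + data, addr)    # 4-byte sequence header
        while time.perf_counter() - t0 < interval:          # simple busy-wait pacing
            pass

def grouped_nacks(received_seqs: set, highest_seq: int, group_size: int = 64):
    """Report missing sequence numbers in batches of group_size per NACK."""
    missing = [s for s in range(highest_seq + 1) if s not in received_seqs]
    return [missing[i:i + group_size] for i in range(0, len(missing), group_size)]
```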
2.6. Connections to Supercomputers – ORNL
Supercomputers present networking challenges that are not addressed by the networking community
mainly due to the complexity of the data and execution paths inside these machines and their interconnections. The
ORNL-NCSU channel from the Cray X1 supercomputer at ORNL to a node at NCSU is provisioned by
policy to allow 400 Mbps flows from a single application during certain periods. This connection has a
peak rate of 1 Gbps due to Ethernet and FiberChannel segments. But the data path is complicated. Data
from a Cray node traverses a System Port Channel (SPC) and then transits to a FiberChannel (FC)
connection to the Cray Network Subsystem (CNS). The CNS converts FC frames to Ethernet LAN
segments and sends them onto a GigE NIC. These Ethernet frames are then mapped at the ORNL router onto a
SONET long-haul connection to NCSU; they then transit to an Ethernet LAN and arrive at the cluster node
via a GigE NIC. Thus the data path consists of a sequence of different segments: SPC, FC, Ethernet LAN,
SONET long-haul, and Ethernet LAN. The default TCP over this connection achieved throughputs on the
order of 50 Mbps. The bbcp protocol adapted for the Cray X1 achieved throughputs in the range of 200-300 Mbps,
depending on the traffic conditions. The Hurricane protocol tuned for this connection consistently achieved
throughputs on the order of 400 Mbps [6]. Since the capacity of the Cray X1's NIC is limited to 1 Gbps, we
developed an interconnection configuration to yield network throughputs higher than 1 Gbps [8]. In place
of its native CNS, we utilized the USN-CNS (UCNS), which is a dual-Opteron host containing two pairs of
PCI-X slots; it is equipped with two FC (Emulex 9802DC) cards, each with two 2 Gbps FC ports, and a
Chelsio 10GigE NIC. A similar host with a 10GigE NIC is used as a data sink. Channel bonding was then
disabled on UCNS, and parallel streams were sent through the individual channels. In this configuration,
memory-to-memory transfers reached 3Gbps for writes and 4.8Gbps for reads between Cray X1 and
UCNS. In summary, these are among the highest throughputs reported from Cray X1 to external hosts
over local and wide-area connections.
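The parallel-stream configuration can be pictured with the following sketch, which stripes an in-memory buffer across several independently routed TCP connections, one per channel. The destinations and block size are hypothetical; the actual experiments used the UCNS hardware path described above.

```python
# Illustrative striping of a memory-to-memory transfer across several
# independent streams, one per channel; endpoints and block size are
# hypothetical, not the actual UCNS test configuration.
import socket
import threading

def send_stripe(dest, blocks):
    """Send this stripe's blocks over a dedicated TCP connection."""
    with socket.create_connection(dest) as s:
        for block in blocks:
            s.sendall(block)

def parallel_transfer(destinations, data: bytes, block_size: int = 4 << 20):
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    stripes = [blocks[i::len(destinations)] for i in range(len(destinations))]
    threads = [threading.Thread(target=send_stripe, args=(dest, stripe))
               for dest, stripe in zip(destinations, stripes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```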
2.7. Optimal Network Realizations of Visualization Pipelines – UCD
Remote visualization is considered a critical enabling technology for a number of large-scale scientific
computations that involve visualizing large datasets on storage systems using remote high-end
visualization clusters. Such systems of different types and scales have been a topic of focused research for
many years by the visualization community. In general, a remote visualization system forms a pipeline
consisting of a server at one end holding the data set, and a client at the other end providing rendering and
display. In between, zero or more hosts perform a variety of intermediate processing and/or caching and
prefetching operations. A wide area network typically connects all the participating nodes. The goal is to
achieve interactive visualization on the client-end without transferring the whole dataset to it. For large
datasets, such transfers may not be practical for a variety of reasons: (i) the client may not have adequate
processing capability and/or storage, and (ii) the connection bandwidth may not be adequate to ensure
reasonable transfer times. In general, such remote visualization tasks require expertise in different areas
including high performance computing, networking, storage and visualization. We developed an approach
to dynamically decompose and map a visualization pipeline onto wide-area network nodes for achieving
fast interactions between users and applications in a distributed remote visualization environment [7].
This scheme is realized using modules that implement various visualization and networking subtasks
to enable the selection and aggregation of nodes with disparate capabilities as well as connections with
varying bandwidths.
We estimated the transport and processing times of various subtasks, and presented a polynomial-time
algorithm to compute a decomposition and mapping that achieves minimum end-to-end delay. Our
experimental results based on an implementation deployed at several geographically distributed nodes
illustrated the efficiency of the system in terms of utilizing the various nodes.
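A minimal dynamic-programming formulation of this mapping problem is sketched below under simplifying assumptions: the pipeline is linear, the candidate nodes form an ordered chain, and each node receives a contiguous (possibly empty) group of modules. The cost functions are placeholders; this is an illustration of the approach, not the exact algorithm published in [7].

```python
# Dynamic-programming sketch: assign a linear pipeline of modules to an ordered
# chain of nodes (monotonically, contiguous groups per node) so that summed
# processing and transport delays are minimized. Costs are placeholder callables.
import functools

def map_pipeline(n_modules: int, n_nodes: int, proc_time, xfer_time) -> float:
    """proc_time(m, v): processing time of module m on node v;
       xfer_time(m, u, v): time to move module m's output from node u to node v."""
    @functools.lru_cache(maxsize=None)
    def best(m, v):
        # Minimum delay to complete modules 0..m with module m placed on node v.
        if m == 0:
            return proc_time(0, v)
        return proc_time(m, v) + min(
            best(m - 1, u) + (0.0 if u == v else xfer_time(m - 1, u, v))
            for u in range(v + 1))
    return min(best(n_modules - 1, v) for v in range(n_nodes))
```

With per-module costs measured on each candidate node and per-hop transfer times estimated from connection profiles, the call returns the minimum achievable end-to-end delay; keeping back-pointers recovers the actual decomposition and mapping.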
2.8. Microscope Control over Wide-Area Connections – PNNL
2.9. Application-Middleware Filtering – GaTech
3. Research Design and Methods (10 pages – 1 page each institution)
3.1. Liaison and Application Integration Activities
3.2. CANTIS Toolkit – UCD, ORNL
3.3. Component Technical Areas
3.3.1. Channel Profiling and Optimization – UCD, ORNL
3.3.2 IPv4/IPv6 Networking – FNAL
3.3.3 Host and Subnet Optimizations – FNAL
3.3.4 Network Measurements – SLAC
3.3.5. High Performance Data Transport – ORNL, BNL
3.3.6. Connections to Supercomputers – ORNL
3.3.7. Optimal Networked Visualizations – UCD
3.3.8. Computational Monitoring and Steering –ORNL
3.3.9. Remote Instrument Control –PNNL
3.3.10. Application-Middleware Optimizations – GaTech
3.4. Research Plan
The center members will participate in weekly teleconferences and biannual face-to-face meetings. A website
will facilitate the dissemination of software and tools, as well as provide an alternate mechanism for a
scientist to initiate communication with the center on specific AMN tasks.
3.5 Software Lifecycle
4. Consortium Arrangements (one page)
Literature Cited
[C05] S. M. Carter. Networking the national leadership computing facility. In Proceedings of Cray Users Group
Meeting. 2005.
[N01] NSF Grand Challenges in eScience Workshop, 2001. Final Report:
http://www.evl.uic.edu/activity/NSF/final.html.
[T] Terascale supernova initiative. http://www.phy.ornl.gov/tsi.
[R06] N. S. V. Rao, W. R. Wing, S. M. Carter, Q. Wu, High-speed dedicated channels and experimental results with
hurricane protocol, Annals of Telecommunications, 2006, in press.
[R05.1] N. S. V. Rao, W. R. Wing, S. M. Carter, Q. Wu, UltraScience Net: Network testbed for large-scale science
applications, IEEE Communications Magazine, November 2005, vol. 3, no. 4, pp. S12-S17. Expanded version
available at www.csm.ornl.gov/ultranet.
[R05.2] N. S. Rao, S. M. Carter, Q.Wu, W. R. Wing, M. Zhu, A. Mezzacappa, M. Veeraraghavan, J. M. Blondin,
Networking for large-scale science: Infrastructure, provisioning, transport and application mapping, Proceedings of
SciDAC Meeting, 2005.
[Z05] X. Zheng, M. Veeraraghavan, N. S. V. Rao, Q. Wu, and M. Zhu. CHEETAH: Circuit-switched high-speed
end-to-end transport architecture testbed. IEEE Communications Magazine, 2005.
[9] N. S. V. Rao, J. Gao, and L. O. Chua. On dynamics of transport protocols in wide-area internet connections. In
L. Kocarev and G. Vattay, editors, Complex Dynamics in Communication Networks. 2005.
[w02] High-performance networks for high-impact science, 2002. Report of the High-Performance Network
Planning Workshop, August 13-15, 2002, http://DOECollaboratory.pnl.gov/meetings/hpnpw.
[w03.1] DOE Workshop on Ultra High-Speed Transport Protocols and Network Provisioning for Large-Scale
Science Applications, April 2003, Argonne National Laboratory.
[W03.2] DOE Science Networking: Roadmap to 2008 Workshop, June 3-5, 2003, Jefferson Laboratory
[W05] Q. Wu and N.S.V. Rao. Protocols for high-speed data transport over dedicated channels. In Third
International Workshop on Protocols for Fast Long-Distance Networks. 2005.
[Z04] M. Zhu, Q. Wu, N. S. V. Rao, and S. S. Iyengar. Adaptive visualization pipeline decomposition and mapping
onto computer networks. In Third International Conference on Image and Graphics. 2004.
Budget and Budget Explanation
Budget Summary:
($K)

Institution                              Year 1   Year 2   Year 3   Year 4   Year 5    Total
Brookhaven National Laboratory              400      400      400      400      400    2,000
Fermi National Accelerator Laboratory       400      400      400      400      400    2,000
Georgia Institute of Technology             300      300      300      300      300    1,500
Pacific Northwest National Laboratory       400      400      400      400      400    2,000
Oak Ridge National Laboratory               600      600      600      600      600    3,000
Stanford Linear Accelerator Center          400      400      400      400      400    2,000
University of California, Davis             300      300      300      300      300    1,500
Total                                     2,800    2,800    2,800    2,800    2,800   14,000
Other Support of Investigators
Nageswara S. Rao
Support: Current
Project/Proposal Title: Enabling supernova computations by integrated transport and provisioning
Source of Support: Department of Energy
Total Award Amount: $550,000 Total Award Period Covered: 01/08/04 - 09/30/06
Location of Project: ORNL
Person-Months Per Year Committed to the Project. Cal: 2.0
Support: Current
Project/Proposal Title: Collaborative Research: End-to-end Provisioned Network
Testbed for Large-Scale eScience Applications
Source of Support: National Science Foundation
Total Award Amount: $ 850,000 Total Award Period Covered: 01/01/04 - 09/30/06
Location of Project: ORNL
Person-Months Per Year Committed to the Project. Cal:2.0
Support: Current Project/Proposal Title: ORNL Ultranet Testbed
Source of Support: Department of Energy
Total Award Amount: $ 4,500,000 Total Award Period Covered: 07/01/03 - 09/30/06
Location of Project: ORNL
Person-Months Per Year Committed to the Project. Cal: 2.0
Support: Current
Project/Proposal Title: Advanced network capabilities for terascale computations on leadership-class computers
Source of Support: ORNL LDRD
Total Award Amount: $360,000 Total Award Period Covered: 01/10/04 - 09/30/06
Location of Project: Laboratory Directed R&D Fund
Person-Months Per Year Committed to the Project. Cal: 2.0
Biographical Sketches
Steven M. Carter, Oak Ridge National Laboratory
Les Cottrell, Stanford Linear Accelerator Center
Matt Crawford, Fermi National Accelerator Laboratory
Dipak Ghosal, University of California, Davis
Tom McKenna, Pacific Northwest National Laboratory
Biswanath Mukherjee, University of California, Davis
Nageswara S. Rao, Oak Ridge National Laboratory
Karsten Schwan, Georgia Institute of Technology
William R. Wing, Oak Ridge National Laboratory
Qishi Wu, Oak Ridge National Laboratory
Dantong Yu, Brookhaven National Laboratory
Nageswara S. Rao
Computer Science and Mathematics Division
Oak Ridge National Laboratory
Oak Ridge, TN 37831-6016
[email protected]
EDUCATION
Ph.D. Computer Science, Louisiana State University, 1988
M.S. Computer Science, Indian Institute of Science, Bangalore, India, 1984
B.S. Electronics Engineering, Regional Engineering College, Warangal, India, 1982
PROFESSIONAL EXPERIENCE
1. Distinguished Research Staff (2001-present), Senior Research Staff Member (1997-2001),
Research Staff Member (1993-1997), Intelligent and Emerging Computational Systems Section,
Computer Science and Mathematics Division, Oak Ridge National Laboratory.
2. Assistant Professor, Department of Computer Science, Old Dominion University, Norfolk, VA
23529- 0162, 1988 – 1993; Adjunct Associate Professor, 1993 - present.
3. Research and Teaching Assistant, Department of Computer Science, Louisiana State University,
Baton Rouge, LA, 1985 - 1988.
4. Research Assistant, School of Automation, Indian Institute of Science, India, 1984 - 1985.
HONORS AND AWARDS
Best paper award, Innovations and Commercial Applications of Distributed Sensor Networks
Symposium, 2005, Washington.
Special Commendation for Significant Contributions to Network Modeling and Simulation Program,
Defense Advanced Research Projects Agency.
UT-Battelle Technical Accomplishment Award, 2000.
Research Initiation Award of National Science Foundation, 1991-1993.
CURRENT RESEARCH INTERESTS
He has published more than 200 research papers in a number of technical areas, including the following.
1. Measurement-based methods for end-to-end performance assurance over wide-area networks: He
rigorously showed that measurement-based methods can be used to provide probabilistic end-to-end delay guarantees in wide-area networks. This work is funded by NSF, DARPA and DOE.
2. Wireless networks: He is a co-inventor of the connectivity-through-time protocol that enables
communication in ad hoc wireless dynamic networks with no infrastructure.
3. Sensor fusion: He developed novel fusion methods for combining information from multiple
sources using measurements without any knowledge of error distributions. He proposed fusers
that are guaranteed to perform at least as well as the best subset of sources. This work is funded by
DOE, NSF and ONR.
SELECTED RECENT PUBLICATIONS
N. S. V. Rao, W. R. Wing, S. M. Carter, Q. Wu, High-speed dedicated channels and experimental results
with hurricane protocol, Annals of Telecommunications, 2006, in press.
N. S. V. Rao, W. R. Wing, S. M. Carter, Q. Wu, UltraScience Net: Network testbed for large-scale
science applications, IEEE Communications Magazine, November 2005, vol. 3, no. 4, pp. S12-S17.
N. S. V. Rao, J. Gao, L. O. Chua, On dynamics of transport protocols in wide-area Internet connections,
in Complex Dynamics in Communication Networks, L. Kocarev and G. Vattay (editors), 2005.
X. Zheng, M. Veeraraghavan, N. S. V. Rao, Q. Wu, and M. Zhu. CHEETAH: Circuit-switched high-speed end-to-end transport architecture testbed, IEEE Communications Magazine, 2005.
J. Gao, N. S. V. Rao, J. Hu, J. Ai, Quasi-periodic route to chaos in the dynamics of Internet transport
protocols, Physical Review Letters, 2005.
N. S. V. Rao, D. B. Reister, J. Barhen, Information fusion methods based on physical laws, IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 1, 2005, pp. 66-77.
J. Gao, N. S. V. Rao, TCP AIMD dynamics over Internet connections, IEEE Communications Letters,
vol. 9, no. 1, 2005, pp. 4-6.
N. S. V. Rao, W. C. Grimmell, Y.C. Bang, S. Radhakrishnan, On algorithms for quickest paths under
different routing modes, IEICE Transactions on Communications, vol. E87-B, no. 4, 2004, pp. 1002-1006.
N. S. V. Rao, Q. Wu, S. S. Iyengar, On throughput stabilization of network transport, IEEE
Communications Letters, vol. 8, no. 1, 2004, pp. 66-68.
N. S. V. Rao, Probabilistic quickest path algorithm, Theoretical Computer Science, vol. 312, no. 2-3, pp.
189-201, 2004.
N. S. V. Rao, Overlay Networks of In-Situ Instruments for Probabilistic Guarantees on Message Delays
in Wide-Area Networks, IEEE Journal on Selected Areas of Communications, vol. 22, no. 1, 2004.
N. S. V. Rao, Y. C. Bang, S. Radhakrishnan, Q. Wu, S. S. Iyengar, and H. Cho, NetLets: Measurement-based routing daemons for low end-to-end delays over networks, Computer Communications, vol. 26, no.
8, 2003, pp. 834-844.
W. C. Grimmell and N. S. V. Rao, On source-based route computation for quickest paths under dynamic
bandwidth constraints, International Journal of Foundations of Computer Science, vol. 14, no. 3, pp. 503-523, 2003.
S. Radhakrishnan, G. Racherla, N. Sekharan, N. S. V. Rao, S. G.Batsell, Protocol for dynamic ad-hoc
networks using distributed spanning trees, Wireless Networks, vol. 9, pp. 673-686, 2003.
J. B. Gao, W. W.Tung, N. S. V. Rao, Noise induced Hopf bifurcation-like sequence and transition to
chaos in Lorenz equations, Physical Review Letters, Dec 2002.
N. S. V. Rao, NetLets for end-to-end delay minimization in distributed computing over Internet using
two paths, International Journal of High Performance Computing Applications, vol. 16, no. 3, 2002.
COLLABORATIONS AND OTHER AFFILIATIONS
Senior Member, Institute of Electrical and Electronics Engineers (IEEE)
Member, Association for Computing Machinery (ACM)
Postgraduate Advisor: S. Ding (Louisiana State University); M. Zhu (Louisiana State University);
Q. Wu (Louisiana State University);
Postdoctoral Advisor: Q. Wu (Computer Science and Mathematics Division)
Description of Facilities
A. ORNL NLCF
B. UltraScience Net and CHEETAH
C. PNNL Microscopes
D. Lambda Station
Appendix 1: Institutional Cover (DOE Order 5700.7C) and Budget (DOE F 4620.1) Pages
1. Brookhaven National Laboratory
2. Fermi National Accelerator Laboratory
3. Georgia Institute of Technology
4. Oak Ridge National Laboratory
5. Pacific Northwest National Laboratory
6. Stanford Linear Accelerator Center
7. University of California, Davis
Appendix 2: Institutional Tasks and Milestones and Deliverables (2 pages each)
1. Brookhaven National Laboratory
2. Fermi National Accelerator Laboratory
3. Georgia Institute of Technology
4. Oak Ridge National Laboratory
5. Pacific Northwest National Laboratory
6. Stanford Linear Accelerator Center
7. University of California, Davis
4. Oak Ridge National Laboratory: Tasks and Milestones (2 pages)
ORNL tasks are grouped in three separate categories:
1. Overall Project Coordination:
2. Liaison to Astrophysics, Climate and Combustion Areas:
3. Technical Areas: The following technical tasks will be carried out by ORNL: (a) high
performance data transport protocols for dedicated channels, (b) effective support of visualization
streams over wide-area connections, (c) computational monitoring and steering over network
connections with an emphasis on leadership-class applications.
Milestones:
Appendix 3: Letters of Support
Petascale/Terascale Supernova Initiative – Anthony Mezzacappa, Oak Ridge National Laboratory
Combustion – Jackie Chen, Sandia National Laboratory
Organizational Approach:
This center consists of experts from national laboratories and universities with extensive
research and practical expertise in networking areas as well as in enabling the applications and
middleware to optimally utilize the provisioned networking capabilities. In particular, several PIs
are currently active participants in network enabling tasks for a number of SciDAC and NSF
science application projects.
The center will focus on networking capabilities needed exclusively by SciDAC science
applications with an emphasis on high-performance environments of supercomputers and
experimental facilities. The center is based on the concepts of Technology Experts who will
address specific technical areas, and Science Liaisons who will interface and coordinate with
SciDAC scientists. It will leverage existing network enabling tools and also develop the missing
technologies to continue to provide state-of-the-art cross-cutting tools applicable across SciDAC
applications. In addition, the science liaisons actively engage scientists in their assigned areas,
through regular meetings and interactions, to help the center keep pace with SciDAC
requirements and to seek out current network enabling solutions. Technology experts
continuously monitor and leverage the available networking technologies, and help the center
identify and develop the “gap” technologies. They will test and integrate all these technologies
into cross-cutting profiling and optimizing tools that can be utilized across science projects, and
also maintain them at a repository.
The center will serve as a one-stop resource for any SciDAC scientist with a network enabling
task. Each such request will be assigned a single science liaison who would interface with the
appropriate experts to work closely to: (i) identify the technology components, (ii) develop a
comprehensive network enabling solution, and (iii) install, test and optimize the solution in place.
Each participant institution of this center is assigned 1-2 science areas to act as a liaison based
on their prior and on-going relationships with science projects. Also each institution is assigned
specific technical network enabling areas.
The center members will participate in weekly teleconferences and biannual meetings. A website will
facilitate the dissemination of software and tools, as well as provide a mechanism for a
scientist to initiate communication with the center on a specific network enabling task.
Summary.
SciDAC large-scale computations and experiments require unprecedented network capabilities
in the form of large bandwidth and dynamically stable connections to support large data
transfers, network-based visualizations, computational monitoring and steering, and remote
instrument control. The applications must be network enabled which requires: (i) provisioning of
network connectivity of appropriate capacity and identifying and tuning of the protocols, and (ii)
customizing and optimizing the applications and middleware so that the provisioned capacities
are made available to the applications in a transparent manner without performance
degradations. The phrase Network Enabling (NE) collectively refers to the technologies needed
for accomplishing both tasks (i) and (ii), which is significantly beyond the solution space
addressed by traditional networking or middleware areas.
Current limitations on network capabilities have proven to be a serious drawback in a number of
SciDAC applications in accessing datasets, visualization systems, supercomputers and
experimental facilities. In particular, current network enabling technologies are inadequate in
meeting the high-performance capabilities needed for the transfer of terabyte-to-petabyte datasets,
stable and responsive steering of super-computations, and controlling experiments on remote
user facilities. These network capabilities represent a multi-faceted challenge with the needed
component technologies ranging from provisioning to transport to network-application
mappings, which must be developed, tested, integrated and/or optimized within the SciDAC
science environments. Currently, such efforts have been scattered and/or duplicated across a
number of SciDAC projects, often placing an undue burden on the application scientists to
become experts in network enabling technologies.
The main goal of this center is to serve as a comprehensive resource providing high-performance network capabilities to SciDAC applications in an optimal and transparent manner.
This project addresses a broad spectrum of networking-related capabilities all needed in various
SciDAC applications by leveraging existing technologies and tools, and developing the missing
ones. We propose to develop NE profiling tools specifically tailored to SciDAC that will generate
the connection profiles at application-level, and identify potential components of an end-to-end
NE solution. We will develop in-situ optimization tools that can be dropped in place along with
the application to identify the optimal NE configurations such as optimal number of TCP streams
for application-to-application transfers, and decomposition and mapping of modules of a
visualization pipeline. We will also develop APIs and libraries customized to science
applications to provide them the NE technologies in a transparent manner. These tools are
cross-cutting in nature in that they could be used across a wide spectrum of applications.
However, they may not be necessarily optimal or optimally configured within a specific science
application environment. To accomplish such finer optimization, appropriate team members will
work closely with SciDAC scientists to design, customize and optimize NE technologies within
the specific application environments.
In terms of technical areas, this center will address NE technologies for: (a) high performance
data transport both for file and memory transfers, (b) effective support of visualization streams
over wide-area connections, (c) computational monitoring and steering over network
connections with a particular emphasis on supercomputers, (d) remote monitoring and control of
instruments including microscopes, and (e) higher-level data filtering to optimize network
performance at the application level.