CANTIS: Center for Application-Network Total-Integration for SciDAC
Scientific Discovery through Advanced Computing - Enabling Computational Technologies
DOE Office of Science Program Announcement LAB 06-04
Oak Ridge National Laboratory

Principal Investigator (coordinator and point of contact):
Nageswara S. Rao, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831
Phone: 865 574-7517; Fax: 865 574-0405; E-Mail: [email protected]

Co-Principal Investigators:
Steven M. Carter, William R. Wing, Qishi Wu, Oak Ridge National Laboratory
Dantong Yu, Brookhaven National Laboratory
Matt Crawford, Fermi National Accelerator Laboratory
Karsten Schwan, Georgia Institute of Technology
Tom McKenna, Pacific Northwest National Laboratory
Les Cottrell, Stanford Linear Accelerator Center
Biswanath Mukherjee, Dipak Ghosal, University of California, Davis

Official Signing for Oak Ridge National Laboratory:
Thomas Zacharia, Associate Laboratory Director, Computing and Computational Sciences
Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831-6163
Phone: 865 574-4897; Fax: 865 574-4839; E-Mail: [email protected]

Requested Funding ($K):
Institution                              Year 1  Year 2  Year 3  Year 4  Year 5   Total
Brookhaven National Laboratory              400     400     400     400     400   2,000
Fermi National Accelerator Laboratory       400     400     400     400     400   2,000
Georgia Institute of Technology             300     300     300     300     300   1,500
Pacific Northwest National Laboratory       400     400     400     400     400   2,000
Oak Ridge National Laboratory               600     600     600     600     600   3,000
Stanford Linear Accelerator Center          400     400     400     400     400   2,000
University of California, Davis             300     300     300     300     300   1,500
Total                                     2,800   2,800   2,800   2,800   2,800  14,000

Use of human subjects in proposed project: No
Use of vertebrate animals in proposed project: No

______________________________________          ___________________________________
Principal Investigator Signature/Date            Laboratory Official Signature/Date

Table of Contents
Abstract
1. Background and Significance – Rao and Wing
1.1. Networking Needs of Large-Scale SciDAC Applications
1.2. Limitations of Current Methods
1.3. Application-Middleware-Network Integration
1.4. CANTIS Concept and Organization
2. Preliminary Studies
2.1. TSI and SciDAC Applications – ORNL and others
2.2. Network Provisioning – UCD, ORNL
2.3. Host Performance Issues – FNAL
2.4. Network Measurement – SLAC
2.5. Data Transport Methods – ORNL, BNL
2.6. Connections to Supercomputers – ORNL
2.7. Optimal Network Realizations of Visualization Pipelines – UCD
2.8. Microscope Control over Wide-Area Connections – PNNL
2.9. Application-Middleware Filtering – GaTech
3. Research Design and Methods
3.1. Liaison and Application Integration Activities
3.2. CANTIS Toolkit – UCD, ORNL
3.3. Component Technical Areas
3.3.1. Channel Profiling and Optimization – UCD, ORNL
3.3.2. IPv4/IPv6 Networking – FNAL
3.3.3. Host and Subnet Optimizations – FNAL
3.3.4. Network Measurements – SLAC
3.3.5. High Performance Data Transport – ORNL, BNL
3.3.6. Connections to Supercomputers – ORNL
3.3.7. Optimal Networked Visualizations – UCD
3.3.8. Computational Monitoring and Steering – ORNL
3.3.9. Remote Instrument Control – PNNL
3.3.10. Application-Middleware Optimizations – GaTech
3.4. Research Plan
3.5. Software Lifecycle
4. Consortium Arrangements
Literature Cited
Budget and Budget Explanation
Other Support of Investigators
Biographical Sketches
Description of Facilities
Appendix 1: Institutional Cover and Budget Pages
Appendix 2: Two-page Institutional Milestones and Deliverables
Appendix 3: Letters of Support

Abstract: (250 words)
Large-scale SciDAC computations and experiments require unprecedented capabilities in the form of large data throughputs and/or dynamically stable (jitter-free) connections to support large data transfers, network-based visualizations, computational monitoring and steering, and remote instrument control. Current network-related limitations have proven to be a serious impediment to a number of large-scale DOE SciDAC applications. Realizing these wide-area capabilities requires the vertical integration and optimization of the entire Application-Middleware-Networking stack so that the provisioned capacities and capabilities are available to the applications in a transparent, optimal manner. The technologies required to accomplish such tasks transcend the solution space addressed by traditional networking or middleware areas. We propose to create the Center for Applications-Network Total-Integration for SciDAC (CANTIS) to address a broad spectrum of networking-related capabilities for these applications by leveraging existing technologies and tools, and developing the missing ones. We propose to provide components of end-to-end solutions and in-situ optimization modules for tasks such as selecting the optimal number of transport streams for application-to-application transfers, decomposing and mapping visualization pipelines, and operating transparently and agilely over hybrid circuit/packet-switched or IPv4/v6 networks. Team liaison members will proactively engage SciDAC scientists and work closely with them to accomplish the needed optimization, customization and tuning of solutions. The technical focus areas of CANTIS are: (a) high performance data transport for file and memory transfers, (b) effective support of visualization streams over wide-area connections, (c) computational monitoring and steering over network connections, (d) remote monitoring and control of instruments including microscopes, and (e) higher-level data filtering for optimal application-network performance.

1. Background and Significance (4 pages) (20 pages total for narrative)
Large-scale SciDAC computations and experiments require unprecedented network capabilities in the form of large network throughput and/or dynamically stable (jitter-free) connections to support large data transfers, network-based visualizations, computational monitoring and steering, and remote instrument control. Indeed, current network-related limitations have proven to be a serious impediment to a number of large-scale DOE SciDAC applications. For example, climate data produced at ORNL's leadership-class computational facility is shipped on physical media via air/ground transport to NERSC for archival storage due to inadequate network throughput, and Terascale Supernova Initiative (TSI) computational runs waste as much as 60% of their time allocations due to large network transfer times or unmonitored runaway computations. The needed network functionalities are significantly beyond the evolutionary trajectory of commercial networks as well as the technology paths of typical shared IP networks.
In addressing the needs of large-scale SciDAC applications, these networks are limited both in capacity, such as connection bandwidth, and in capability, such as the ability to achieve application-level throughputs that match the provisioned capacities. These challenges are particularly acute for SciDAC computations that generate massive datasets on supercomputers whose primary emphasis is on computational speed rather than wide-area throughput. We propose to create the Center for Applications-Network Total-Integration for SciDAC (CANTIS). It will proactively engage SciDAC PIs, their developers, and science users to integrate the latest networking and related techniques with SciDAC applications and thus remove current network-related performance bottlenecks. The main goal of CANTIS is to serve as a comprehensive resource to equip SciDAC applications with high-performance network capabilities in an optimal and transparent manner. This project addresses a broad spectrum of networking-related capabilities needed in various SciDAC applications by leveraging existing technologies and tools, and developing the missing ones.

1.1. Networking Needs of Large-Scale SciDAC Applications
Supercomputers such as the new National Leadership Computing Facility (NLCF) and others being constructed for large-scale scientific computing are rapidly approaching 100 teraflops speeds and are expected to play a critical role in a number of SciDAC science projects. They are critical to several SciDAC fields including high energy and nuclear physics, astrophysics, climate modeling, molecular dynamics, nanoscale materials science, and genomics. These applications are expected to generate petabytes of data at the computing facilities, which must be transferred, visualized, and analyzed by geographically distributed teams of scientists. The computations themselves may have to be interactively monitored and actively steered by the scientist teams. In the area of experimental science, there are several extremely valuable experimental facilities, such as the Spallation Neutron Source, the Advanced Photon Source, and the Relativistic Heavy Ion Collider. At these facilities, the ability to conduct experiments remotely and then transfer the large measurement datasets for remote distributed analysis is critical to ensuring the productivity of both the facilities and the scientific teams utilizing them. Indeed, high-performance network capabilities add a whole new dimension to the usefulness of these computing and experimental facilities by eliminating the “single location, single time zone” bottlenecks that currently plague these valuable resources. Three DOE workshops were organized in 2002-2003 to define the networking requirements of large-scale science applications (shown in Table 1), discuss possible solutions, and describe a path forward. The high-performance networking requirements of several SciDAC applications are reflected in Table 1. They fall into two broad categories: (a) high bandwidths, typically multiples of 10 Gbps, to support bulk data transfers, and (b) stable bandwidths, typically at much lower rates such as 100s of Mbps, to support interactive steering and control operations.

1.2. Limitations of Current Methods
It has been recognized for some time within DOE and NSF that current networks and networking technology are inadequate for supporting large-scale science applications. Current Internet technologies are severely limited in meeting these demands.
First, such bulk bandwidths are available only in the backbone, typically shared among a number of connections that are unaware of the demands of others. As can be seen from Table 1, the requirements for network throughput are expected to grow by a factor of 20-25 every two years. These requirements will overwhelm production IP networks, which typically see bandwidth improvements of only a factor of 6-7 over the same period. Second, due to the shared nature of packet-switched networks, typical Internet connections often exhibit complicated dynamics and thereby lack the stability needed for steering and control operations [9]. These requirements are quite different from those of a typical Internet user, who needs smaller bandwidths and tolerates much higher delays and jitter (typically for email, web browsing, etc.). As a result, industry is not expected to develop the required end-to-end network solutions of the type and scale needed for large-scale science applications. Furthermore, the operating environments of these applications, consisting of supercomputers, high-performance storage systems and high-precision instruments, present a problem space that is not traditionally addressed by the mainstream networking community. Indeed, focused efforts from multi-disciplinary teams are necessary to fully develop the required network capabilities.

Science Area                   Today End2End Throughput       5 years End2End                    5-10 Years End2End             Remarks
High Energy Physics            0.5 Gb/s                       100 Gb/s                           1000 Gb/s                      high bulk throughput
Climate (Data & Computation)   0.5 Gb/s                       160-200 Gb/s                       N x 1000 Gb/s                  high bulk throughput
SNS NanoScience                Not yet started                1 Gb/s                             1000 Gb/s + QoS for control    remote control and time critical throughput
Fusion Energy                  0.066 Gb/s (500 MB/s burst)    0.198 Gb/s (500 MB/20 sec. burst)  N x 1000 Gb/s                  time critical throughput
Astrophysics                   0.013 Gb/s (1 TBy/week)        N*N multicast                      1000 Gb/s                      computational steering and collaborations
Genomics Data & Computation    0.091 Gb/s (1 TBy/day)         100s of users                      1000 Gb/s + QoS for control    high throughput and steering
Table 1 - Current and near-term network bandwidth requirements summarized in the DOE Roadmap workshops.

1.3. Application-Middleware-Network Integration
One of the important realizations in a number of large-scale science applications is that increases in connection data rates or raw bandwidth alone are insufficient to address these application-to-application performance issues. Typically, when connection bandwidths are increased, the performance bottlenecks move to a different part of the so-called Application-Middleware-Networking (AMN) stack, typically from the network core to end hosts or subnets. Realizing these capabilities in SciDAC applications requires the vertical integration and optimization of the entire AMN stack so that the provisioned capacities and capabilities are available to the applications in a transparent, optimal manner. The phrase Application-Network Total-Integration collectively refers to the spectrum of AMN technologies needed for accomplishing these tasks; it transcends the solution space addressed by traditional networking or middleware areas. We will briefly describe our experience with the TSI application, which underscores the need for such AMN Total Integration (AMNTI). Present efforts to address these AMNTI challenges have been scattered across a number of science and networking projects. This has forced application scientists to become experts in networking, which is a distraction from their main science mission.
On the other hand, isolated technology solutions have resulted in only very limited improvements at the application level. Instead, an integrated effort across AMN components is crucial to achieving these capabilities. We briefly describe an example to highlight the need for such an integrated effort. The TSI hydrodynamics code is executed on the ORNL Cray X1 and the data is transferred to North Carolina State University (NCSU) for further analysis. The network throughput was initially limited to about 200-300 Mbps (after tuning bbcp, a multiple-stream TCP transport) over a 1 Gbps connection from the Cray X1 over the production network. The shared nature of the connection was initially thought to be the main source of this throughput limitation. The engineering solution was to provide a dedicated 1 Gbps connection over the NSF CHEETAH network from the Cray X1 OS nodes to the NCSU cluster, together with the Hurricane protocol, which can achieve 99% utilization of a dedicated 1 Gbps connection [R06]. Incidentally, at about the same time the Cray X1 was upgraded to the Cray X1E, which involved replacing the processors on the OS nodes with more powerful ones. The dedicated connection and Hurricane software were handed off to the TSI scientists, who were expecting network throughputs of the order of 1 Gbps. Instead, throughputs were of the order of 20 Mbps. A joint team finally traced the problem to a bottleneck in the protocol stack implementation on the Cray, which was later addressed. The diagnosis involved carefully and systematically eliminating the various components of the end-to-end data and execution path as sources of the bottleneck. This effort was possible mainly because of the NSF CHEETAH and LDRD projects at ORNL, which established a close collaboration between TSI scientists and computer scientists. The goal of CANTIS is to provide such a collaborative capability to SciDAC projects for a wider spectrum of AMN tasks.

1.4. CANTIS Concept and Organization
The center will focus on the high-performance networking capabilities needed by SciDAC applications, including those using DOE experimental and computing facilities. The center is based on two concepts: first, Technology Experts who address specific technical areas, and second, Science Liaisons with assigned science areas who work directly with SciDAC scientists. The center will leverage existing tools and techniques while simultaneously developing nascent technologies that will enable the latest developments in high performance networking (such as UltraScience Net layer-1&2 circuits) to enhance SciDAC applications. The science liaisons will actively engage scientists in their assigned areas, through regular meetings and interactions, to help the center stay abreast of SciDAC requirements, anticipate SciDAC needs, and guide the development, transfer and optimization of appropriate performance-enhancing and "gap-filling" shims. The center will serve as a one-stop resource for any SciDAC PI with a high-performance or unique network requirement. In addition to actively pursuing SciDAC network-related needs, it will also respond to requests initiated by application scientists. In either case, a single science liaison will be assigned who will interface with the appropriate technology experts to: (i) identify the technology components, (ii) develop a comprehensive network enabling solution, and (iii) install, test and optimize the solution. Each participant institution of this center is assigned science areas in which to act as a liaison, based on its prior and on-going relationships with science projects.
Each institution is also assigned to lead specific technical AMN areas. We propose to develop tools specifically tailored to SciDAC that generate application-level connection profiles and identify potential components of an end-to-end AMN solution. We will develop in-situ optimization tools that can be dropped in place along with the application to identify optimal AMN configurations, such as the optimal number of transport streams for application-to-application transfers, the decomposition and mapping of a visualization pipeline, and transparent and agile operation over hybrid circuit/packet-switched or IPv4/v6 networks (a minimal sketch of such a stream-count probe appears at the end of this section). We will also develop APIs and libraries customized to SciDAC for various AMN technologies. These tools cut across a wide spectrum of SciDAC applications, but may not necessarily be optimal (or optimally configured) for a specific application environment. Team member liaisons will work closely with SciDAC scientists to accomplish any needed finer optimization, customization and tuning. The SciDAC application liaison assignments of team members are as follows: Accelerator Science and Simulation - SLAC, FNAL; Astrophysics - ORNL, SLAC; Climate Modeling and Simulation - ORNL; Computational Biology - PNNL, GaTech; Fusion Science - GaTech, ORNL; Groundwater Modeling and Simulation - PNNL; High-Energy Physics - FNAL, BNL; High-Energy and Nuclear Physics - BNL, FNAL; Nuclear Physics - BNL; Combustion Science and Simulation - UCDavis, ORNL; Quantum Chromodynamics - FNAL, BNL.
The specific technical AMN tasks and their primary institutional assignments are as follows. BNL is the primary repository for storage and dissemination of the tools and work products of the project; it will also provide the Terapaths software for automatic end-to-end, interdomain QoS. FNAL will provide Lambda Station and its IPv4/v6 expertise for massive file transport tasks; it will also address host performance issues, including the behavior of the Linux operating system under load. GaTech will address the "impedance matching" of network-specific middleware to different transport technologies and will provide expertise in middleware-based data filters. ORNL will provide overall coordination of the project, as well as the technologies for dedicated-channel reservation and provisioning and optimized transport methods. PNNL will generalize and extend the remote instrument control software it is currently developing for remote control of its confocal microscope facility. SLAC will bring its expertise in network monitoring to the project, with a specific emphasis on end-to-end monitoring and dynamic matching of applications to network characteristics. UC Davis will focus on optimizing remote applications by distributing component functions, concentrating in particular on remote visualization tasks.
This center consists of subject-area experts from national laboratories and universities with extensive research and practical expertise in networking as well as in enabling applications and middleware to make optimal use of provisioned network capabilities. In particular, several PIs are currently active participants in SciDAC, NSF and other enabling technology projects.
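As a concrete illustration of the in-situ optimization referred to above, the following Python fragment is a minimal sketch of how a dropped-in probe might select a parallel-stream count for an application-to-application transfer. It is a sketch under assumed interfaces only: the measure_gbps hook, the candidate counts, and the tolerance are placeholders for the application- and site-specific measurement machinery that the liaisons and technology experts would supply.

# Illustrative sketch only: in-situ selection of a parallel-stream count.
# The measurement hook is a placeholder; a real deployment would time an
# actual application-to-application transfer over the provisioned channel.

from typing import Callable, Iterable


def pick_stream_count(measure_gbps: Callable[[int], float],
                      candidates: Iterable[int] = (1, 2, 4, 8, 16),
                      tolerance: float = 0.05) -> int:
    """Return the smallest stream count whose measured throughput is within
    `tolerance` (fractional) of the best throughput observed over candidates."""
    results = {n: measure_gbps(n) for n in candidates}
    best = max(results.values())
    for n in sorted(results):
        if results[n] >= (1.0 - tolerance) * best:
            return n
    return max(results)   # unreachable; kept for clarity


if __name__ == "__main__":
    # Stand-in measurement with diminishing returns at higher stream counts.
    def fake_measure(n: int) -> float:
        return min(9.0, 2.5 * n ** 0.7)

    print("chosen stream count:", pick_stream_count(fake_measure))

Preferring the smallest count within a tolerance of the best observed throughput avoids opening more streams than the end hosts and subnets can sustain.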
The technical focus areas of CANTIS are: (a) high performance data transport for file and memory transfers, (b) effective support of visualization streams over wide-area connections, (c) computational monitoring and steering over network connections with an emphasis on leadership-class applications, (d) remote monitoring and control of instruments including microscopes, and (e) higher-level data filtering for optimal application-network performance.

2. Preliminary Studies (5 pages - half page each institution)
We present a brief (albeit ad hoc and partial) account of our work on various AMNTI technologies that are at various stages of development and deployment. These technologies contribute to varying degrees and can potentially be integrated to realize the needed network capabilities. While several of these works have been motivated primarily by our existing collaborations with TSI, the underlying technologies are applicable to a much wider class of large-scale SciDAC science applications [R05.2]. Together, these methods represent the building blocks that may be integrated to effectively carry out a SciDAC computation on a supercomputer while it is monitored, visualized and steered by a group of geographically dispersed domain experts.

2.1. TSI and SciDAC Applications – ORNL and others

2.2. Network Provisioning – UCD, ORNL
It is generally believed that the networking demands of large-scale science can be effectively addressed by providing on-demand dedicated channels of the required bandwidths directly to end users or applications. The UltraScience Net (USN) [4] was commissioned by the U.S. Department of Energy (DOE) to facilitate the development of these constituent technologies, specifically targeting the large-scale science applications carried out at national laboratories and collaborating institutions. Its main objective is to provide developmental and testing environments for a wide spectrum of technologies that can lead to production-level deployments within the next few years. Compared to other testbeds, USN has a much larger backbone bandwidth (20-40 Gbps), a larger footprint (several thousand miles), and close proximity to several DOE facilities. USN provides on-demand dedicated channels: (a) 10 Gbps channels for large data transfers, and (b) high-precision channels for fine control operations. User sites can be connected to USN through its edge switches and can utilize the provisioned dedicated channels during the allocated time slots. In the initial deployment, its data plane consists of dual 10 Gbps lambdas, both OC192 SONET and 10GigE WAN PHY, spanning several thousand miles from Atlanta to Chicago to Seattle to Sunnyvale. The channels are provisioned on demand using layer-1 and layer-2 switches in the backbone and Multiple Service Provisioning Platforms (MSPPs) at the edges, in a flexible configuration driven by a secure control plane. In addition, there are dedicated hosts at the USN edges that can be used for testing middleware, protocols, and other software modules that are not site specific. The control plane employs a centralized scheduler to compute the channel allocations and a signaling daemon to generate configuration signals to the switches. This control plane is implemented using a hardware-based VPN that encrypts the control signals and provides authenticated and authorized access.
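To make the role of the centralized scheduler concrete, the fragment below gives a deliberately simplified, hypothetical admission check for an advance bandwidth reservation on a single dedicated channel. It is not the USN control-plane implementation; the data structures, time units, and capacities are assumptions for illustration only.

# Simplified illustration (not the actual USN scheduler): admission check for
# an advance bandwidth reservation on a single dedicated channel.

from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ChannelScheduler:
    capacity_gbps: float
    # each reservation is (start_time, end_time, bandwidth_gbps)
    reservations: List[Tuple[float, float, float]] = field(default_factory=list)

    def _load_at(self, t: float) -> float:
        return sum(bw for s, e, bw in self.reservations if s <= t < e)

    def can_reserve(self, start: float, end: float, bw: float) -> bool:
        # Committed bandwidth changes only at reservation boundaries, so it is
        # enough to check the requested start time and every reservation start
        # that falls inside the requested window.
        points = [start] + [s for s, e, b in self.reservations if start < s < end]
        return all(self._load_at(t) + bw <= self.capacity_gbps for t in points)

    def reserve(self, start: float, end: float, bw: float) -> bool:
        if self.can_reserve(start, end, bw):
            self.reservations.append((start, end, bw))
            return True
        return False


if __name__ == "__main__":
    sched = ChannelScheduler(capacity_gbps=10.0)
    print(sched.reserve(0, 4, 10.0))   # True: full channel for hours 0-4
    print(sched.reserve(2, 6, 1.0))    # False: overlaps the 10 Gbps slot
    print(sched.reserve(4, 8, 10.0))   # True: back-to-back allocation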
Circuit-switched High-speed End-to-End Transport ArcHitecture (CHEETAH) [5] is an NSF project to develop and demonstrate a network infrastructure for provisioning dedicated bandwidth channels and the associated transport, middleware and application technologies to support the large data transfers and interactive visualizations needed for eScience applications, particularly TSI. Its footprint spans North Carolina State University (NCSU) and Oak Ridge National Laboratory (ORNL), with a possible extension to the University of Virginia and the City University of New York. For long-haul connections, CHEETAH provides Synchronous Optical Network (SONET) channels through MSPPs that are interconnected by OC192 links. The channels are set up using off-the-shelf equipment, such as MSPPs and SONET switches, using Generalized Multi-Protocol Label Switching (GMPLS) for dynamic circuit setup/release.

2.3. Host Performance Issues – FNAL

2.4. Network Measurement – SLAC

2.5. Data Transport Methods – ORNL, BNL
While USN was being rolled out, we conducted preliminary experiments to understand the properties of dedicated channels using a smaller-scale connection from Oak Ridge to Atlanta. Despite the limited nature of this connection, these experiments revealed several important performance considerations. We describe the results here because of their relevance to utilizing the channels that will be provided by USN and CHEETAH. We tested several UDP-based protocols over a 1 Gbps dedicated channel implemented as a layer-3 loop from ORNL to Atlanta and back to ORNL. In all these cases, finer tuning of parameters was needed to achieve throughputs of the order of 900 Mbps. This tuning required intricate knowledge of the protocols' implementation details, which made it a very time-consuming and somewhat unstructured process. Furthermore, it was unclear how to increase the throughput beyond 900 Mbps, which required fine tuning of host-related parameters. In response, we implemented the Hurricane protocol with tunable parameters for rate optimization (in approximately the same time needed to tune the other protocols). Hurricane’s overall structure is quite similar to other UDP-based protocols, in particular UDT; however, certain unique ad hoc optimizations are incorporated into the protocol. Hurricane is mainly designed to support high-speed data transfers over channels with dedicated capacities, and hence does not incorporate any congestion control. We first collected measurements to build the throughput and loss profiles of the channel. The low loss rate at which high throughput is achieved motivated a NACK-based scheme, since a large number of ACKs would otherwise consume significant CPU time, thereby affecting the sending and receiving processes. The initial source sending rate in Hurricane is derived from the throughput profile of the channel and is further tuned manually to achieve about 950 Mbps. An additional throughput increase is achieved by grouping the NACKs and manually determining the optimal group size, which resulted in 991 Mbps throughput.
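Hurricane itself is an ORNL implementation tuned to specific hosts and channels; the Python fragment below is only a schematic illustration of the two ideas described above: pacing fixed-size UDP datagrams at a target rate taken from a channel profile, and retransmitting only the blocks reported in a grouped NACK. The block size, target rate, addresses, and framing are illustrative assumptions, not the Hurricane wire format.

# Schematic sketch (not the Hurricane implementation): rate-paced UDP sending
# with retransmission of blocks reported in grouped NACKs. Addresses, block
# size, and the target rate are illustrative placeholders.

import socket
import time

BLOCK_BYTES = 8192                 # payload per datagram
TARGET_GBPS = 0.95                 # rate taken from a measured channel profile
GAP_SEC = BLOCK_BYTES * 8 / (TARGET_GBPS * 1e9)   # inter-datagram spacing


def send_blocks(sock: socket.socket, dest, blocks: dict[int, bytes]) -> None:
    """Pace one datagram per block at the configured target rate."""
    next_send = time.perf_counter()
    for seq, payload in blocks.items():
        while time.perf_counter() < next_send:
            pass                                   # busy-wait for precise pacing
        sock.sendto(seq.to_bytes(4, "big") + payload, dest)
        next_send += GAP_SEC


def retransmit(sock: socket.socket, dest, blocks: dict[int, bytes],
               grouped_nack: list[int]) -> None:
    """Resend only the sequence numbers listed in a grouped NACK."""
    send_blocks(sock, dest, {seq: blocks[seq] for seq in grouped_nack if seq in blocks})


if __name__ == "__main__":
    data = {i: bytes(BLOCK_BYTES) for i in range(1000)}        # ~8 MB of zeros
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        send_blocks(s, ("127.0.0.1", 9000), data)              # illustrative destination
        retransmit(s, ("127.0.0.1", 9000), data, [5, 17, 42])  # simulated grouped NACK

Grouping NACKs, as in the retransmit step, reduces the per-acknowledgment CPU cost that motivated the NACK-based design in the first place.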
2.6. Connections to Supercomputers – ORNL
Supercomputers present networking challenges that are not addressed by the networking community, mainly due to the complexity of the data and execution paths inside these machines and their interconnections. The ORNL-NCSU channel from the Cray X1 supercomputer at ORNL to a node at NCSU is provisioned by policy to allow 400 Mbps flows from a single application during certain periods. This connection has a peak rate of 1 Gbps due to its Ethernet and Fibre Channel segments. The data path, however, is complicated. Data from a Cray node traverses the System Port Channel (SPC) and then transits a Fibre Channel (FC) connection to the Cray Network Subsystem (CNS). The CNS converts the FC frames to Ethernet frames and sends them out a GigE NIC. These Ethernet frames are then mapped at the ORNL router onto the SONET long-haul connection to NCSU; at the far end they transit an Ethernet LAN and arrive at the cluster node via a GigE NIC. Thus the data path consists of a sequence of different segments: SPC, FC, Ethernet LAN, SONET long-haul, and Ethernet LAN. Default TCP over this connection achieved throughputs of the order of 50 Mbps. The bbcp protocol adapted for the Cray X1 achieved throughputs in the range of 200-300 Mbps, depending on traffic conditions. The Hurricane protocol tuned for this connection consistently achieved throughputs of the order of 400 Mbps [6]. Since the capacity of the Cray X1's NIC is limited to 1 Gbps, we developed an interconnection configuration to yield network throughputs higher than 1 Gbps [8]. In place of the native CNS we utilized the USN-CNS (UCNS), a dual-Opteron host containing two pairs of PCI-X slots; it is equipped with two FC cards (Emulex 9802DC), each with two 2 Gbps FC ports, and a Chelsio 10GigE NIC. A similar host with a 10GigE NIC is used as a data sink. Channel bonding was then disabled on the UCNS, and parallel streams were sent through the individual channels. In this configuration, memory-to-memory transfers reached 3 Gbps for writes and 4.8 Gbps for reads between the Cray X1 and the UCNS. These are among the highest throughputs reported from a Cray X1 to external hosts over local and wide-area connections.

2.7. Optimal Network Realizations of Visualization Pipelines – UCD
Remote visualization is considered a critical enabling technology for a number of large-scale scientific computations that involve visualizing large datasets on storage systems using remote high-end visualization clusters. Such systems of different types and scales have been a topic of focused research by the visualization community for many years. In general, a remote visualization system forms a pipeline consisting of a server at one end holding the dataset and a client at the other end providing rendering and display. In between, zero or more hosts perform a variety of intermediate processing and/or caching and prefetching operations. A wide-area network typically connects all the participating nodes. The goal is to achieve interactive visualization at the client end without transferring the whole dataset to it. For large datasets such transfers may not be practical for a variety of reasons: (i) the client may not have adequate processing capability and/or storage, and (ii) the connection bandwidth may not be adequate to ensure reasonable transfer times. In general, such remote visualization tasks require expertise in different areas including high performance computing, networking, storage and visualization. We developed an approach to dynamically decompose and map a visualization pipeline onto wide-area network nodes to achieve fast interactions between users and applications in a distributed remote visualization environment [7]. This scheme is implemented as a set of modules realizing the various visualization and networking subtasks, enabling the selection and aggregation of nodes with disparate capabilities as well as connections with varying bandwidths. We estimated the transport and processing times of the various subtasks and present a polynomial-time algorithm that computes a decomposition and mapping achieving the minimum end-to-end delay. Experimental results from an implementation deployed at several geographically distributed nodes illustrate the efficiency of the system in utilizing the various nodes.
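The sketch below illustrates one way such a decomposition and mapping can be computed in polynomial time for a linear pipeline placed along a fixed path of nodes. It is a simplified dynamic program written for illustration under assumed cost models (per-module work, per-stage output sizes, node speeds, link bandwidths) and is not necessarily the algorithm of [Z04].

# Illustrative dynamic program (assumed cost model; not necessarily the
# algorithm of [Z04]): map a linear visualization pipeline onto a fixed path
# of nodes so that total compute-plus-transfer delay is minimized.

from itertools import accumulate


def min_pipeline_delay(comp, data, speed, bw):
    """comp[k]:  work of module k (k = 1..m; comp[0] unused)
       data[k]:  bytes emitted by module k (data[0] = raw dataset at the server)
       speed[j]: processing speed of node j (j = 1..n)
       bw[j]:    bandwidth of the link from node j to node j+1 (j = 1..n-1)
       Returns the minimum delay to deliver the output of module m to node n."""
    m, n = len(comp) - 1, len(speed) - 1
    pre = list(accumulate(comp))                 # prefix sums of module work
    work = lambda a, b: pre[b] - pre[a]          # total work of modules a+1..b
    INF = float("inf")
    # D[i][j]: min delay with modules 1..i done and module i's output on node j
    D = [[INF] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][1] = work(0, i) / speed[1]          # dataset is already at node 1
    for j in range(2, n + 1):
        for i in range(m + 1):
            for i0 in range(i + 1):              # modules i0+1..i run on node j
                cand = D[i0][j - 1] + data[i0] / bw[j - 1] + work(i0, i) / speed[j]
                D[i][j] = min(D[i][j], cand)
    return D[m][n]


if __name__ == "__main__":
    # 1-indexed toy instance: 3 modules over a 3-node server-proxy-client path.
    comp = [0, 40.0, 10.0, 2.0]                  # module work units
    data = [8e9, 1e9, 2e8, 5e7]                  # bytes after each stage
    speed = [0, 20.0, 5.0, 1.0]                  # node speeds (server fastest)
    bw = [0, 1.25e8, 1.25e8]                     # ~1 Gbps links, in bytes/s
    print("minimum end-to-end delay (s):",
          round(min_pipeline_delay(comp, data, speed, bw), 2))

The recurrence considers, for each node along the path, every contiguous suffix of remaining modules it could host, so the O(m^2 n) table captures all order-preserving decompositions of the pipeline.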
2.8. Microscope Control over Wide-Area Connections – PNNL

2.9. Application-Middleware Filtering – GaTech

3. Research Design and Methods (10 pages – 1 page each institution)
3.1. Liaison and Application Integration Activities
3.2. CANTIS Toolkit – UCD, ORNL
3.3. Component Technical Areas
3.3.1. Channel Profiling and Optimization – UCD, ORNL
3.3.2. IPv4/IPv6 Networking – FNAL
3.3.3. Host and Subnet Optimizations – FNAL
3.3.4. Network Measurements – SLAC
3.3.5. High Performance Data Transport – ORNL, BNL
3.3.6. Connections to Supercomputers – ORNL
3.3.7. Optimal Networked Visualizations – UCD
3.3.8. Computational Monitoring and Steering – ORNL
3.3.9. Remote Instrument Control – PNNL
3.3.10. Application-Middleware Optimizations – GaTech
3.4. Research Plan
The center members will participate in weekly teleconferences and biannual face-to-face meetings. A website will facilitate the dissemination of software and tools, as well as provide an alternate mechanism for a scientist to initiate communication with the center on specific AMN tasks.
3.5. Software Lifecycle

4. Consortium Arrangements (one page)

Literature Cited
[C05] S. M. Carter. Networking the national leadership computing facility. In Proceedings of the Cray Users Group Meeting, 2005.
[N01] NSF Grand Challenges in eScience Workshop, 2001. Final report: http://www.evl.uic.edu/activity/NSF/final.html.
[T] Terascale Supernova Initiative. http://www.phy.ornl.gov/tsi.
[R06] N. S. V. Rao, W. R. Wing, S. M. Carter, Q. Wu. High-speed dedicated channels and experimental results with Hurricane protocol. Annals of Telecommunications, 2006, in press.
[R05.1] N. S. V. Rao, W. R. Wing, S. M. Carter, Q. Wu. UltraScience Net: Network testbed for large-scale science applications. IEEE Communications Magazine, November 2005, vol. 3, no. 4, pp. S12-S17; expanded version available at www.csm.ornl.gov/ultranet.
[R05.2] N. S. Rao, S. M. Carter, Q. Wu, W. R. Wing, M. Zhu, A. Mezzacappa, M. Veeraraghavan, J. M. Blondin. Networking for large-scale science: Infrastructure, provisioning, transport and application mapping. Proceedings of the SciDAC Meeting, 2005.
[Z05] X. Zheng, M. Veeraraghavan, N. S. V. Rao, Q. Wu, and M. Zhu. CHEETAH: Circuit-switched high-speed end-to-end transport architecture testbed. IEEE Communications Magazine, 2005.
[9] N. S. V. Rao, J. Gao, and L. O. Chua. On dynamics of transport protocols in wide-area Internet connections. In L. Kocarev and G. Vattay, editors, Complex Dynamics in Communication Networks, 2005.
[W02] High-Performance Networks for High-Impact Science, 2002. Report of the High-Performance Network Planning Workshop, August 13-15, 2002, http://DOECollaboratory.pnl.gov/meetings/hpnpw.
[W03.1] DOE Workshop on Ultra High-Speed Transport Protocols and Network Provisioning for Large-Scale Science Applications, April 2003, Argonne National Laboratory.
[W03.2] DOE Science Networking: Roadmap to 2008 Workshop, June 3-5, 2003, Jefferson Laboratory.
[W05] Q. Wu and N. S. V. Rao. Protocols for high-speed data transport over dedicated channels. In Third International Workshop on Protocols for Fast Long-Distance Networks, 2005.
[Z04] M. Zhu, Q. Wu, N. S. V. Rao, and S. S. Iyengar. Adaptive visualization pipeline decomposition and mapping onto computer networks. In Third International Conference on Image and Graphics, 2004.
Budget and Budget Explanation
Budget Summary ($K):
Institution                              Year 1  Year 2  Year 3  Year 4  Year 5   Total
Brookhaven National Laboratory              400     400     400     400     400   2,000
Fermi National Accelerator Laboratory       400     400     400     400     400   2,000
Georgia Institute of Technology             300     300     300     300     300   1,500
Pacific Northwest National Laboratory       400     400     400     400     400   2,000
Oak Ridge National Laboratory               600     600     600     600     600   3,000
Stanford Linear Accelerator Center          400     400     400     400     400   2,000
University of California, Davis             300     300     300     300     300   1,500
Total                                     2,800   2,800   2,800   2,800   2,800  14,000

Other Support of Investigators
Nageswara S. Rao
Support: Current
Project/Proposal Title: Enabling supernova computations by integrated transport and provisioning
Source of Support: Department of Energy
Total Award Amount: $550,000
Total Award Period Covered: 01/08/04 - 09/30/06
Location of Project: ORNL
Person-Months Per Year Committed to the Project: Cal: 2.0

Support: Current
Project/Proposal Title: Collaborative Research: End-to-end Provisioned Network Testbed for Large-Scale eScience Applications
Source of Support: National Science Foundation
Total Award Amount: $850,000
Total Award Period Covered: 01/01/04 - 09/30/06
Location of Project: ORNL
Person-Months Per Year Committed to the Project: Cal: 2.0

Support: Current
Project/Proposal Title: ORNL Ultranet Testbed
Source of Support: Department of Energy
Total Award Amount: $4,500,000
Total Award Period Covered: 07/01/03 - 09/30/06
Location of Project: ORNL
Person-Months Per Year Committed to the Project: Cal: 2.0

Support: Current
Project/Proposal Title: Advanced network capabilities for terascale computations on leadership-class computers
Source of Support: ORNL Laboratory Directed R&D (LDRD) Fund
Total Award Amount: $360,000
Total Award Period Covered: 01/10/04 - 09/30/06
Location of Project: ORNL
Person-Months Per Year Committed to the Project: Cal: 2.0

Biographical Sketches
Steven M. Carter, Oak Ridge National Laboratory
Les Cottrell, Stanford Linear Accelerator Center
Matt Crawford, Fermi National Accelerator Laboratory
Dipak Ghosal, University of California, Davis
Tom McKenna, Pacific Northwest National Laboratory
Biswanath Mukherjee, University of California, Davis
Nageswara S. Rao, Oak Ridge National Laboratory
Karsten Schwan, Georgia Institute of Technology
William R. Wing, Oak Ridge National Laboratory
Qishi Wu, Oak Ridge National Laboratory
Dantong Yu, Brookhaven National Laboratory

Nageswara S. Rao
Computer Science and Mathematics Division
Oak Ridge National Laboratory
Oak Ridge, TN 37831-6016
[email protected]

EDUCATION
Ph.D. Computer Science, Louisiana State University, 1988
M.S. Computer Science, Indian Institute of Science, Bangalore, India, 1984
B.S. Electronics Engineering, Regional Engineering College, Warangal, India, 1982

PROFESSIONAL EXPERIENCE
1. Distinguished Research Staff (2001-present), Senior Research Staff Member (1997-2001), Research Staff Member (1993-1997), Intelligent and Emerging Computational Systems Section, Computer Science and Mathematics Division, Oak Ridge National Laboratory.
2. Assistant Professor, Department of Computer Science, Old Dominion University, Norfolk, VA 23529-0162, 1988 - 1993; Adjunct Associate Professor, 1993 - present.
3. Research and Teaching Assistant, Department of Computer Science, Louisiana State University, Baton Rouge, LA, 1985 - 1988.
4. Research Assistant, School of Automation, Indian Institute of Science, India, 1984 - 1985.

HONORS AND AWARDS
Best paper award, Innovations and Commercial Applications of Distributed Sensor Networks Symposium, 2005, Washington.
Special Commendation for Significant Contributions to Network Modeling and Simulation Program, Defense Advanced Research Projects Agency.
UT-Battelle Technical Accomplishment Award, 2000.
Research Initiation Award of the National Science Foundation, 1991-1993.

CURRENT RESEARCH INTERESTS
He has published more than 200 research papers in a number of technical areas, including the following.
1. Measurement-based methods for end-to-end performance assurance over wide-area networks: He rigorously showed that measurement-based methods can be used to provide probabilistic end-to-end delay guarantees in wide-area networks. This work is funded by NSF, DARPA and DOE.
2. Wireless networks: He is a co-inventor of the connectivity-through-time protocol that enables communication in ad hoc wireless dynamic networks with no infrastructure.
3. Sensor fusion: He developed novel fusion methods for combining information from multiple sources using measurements without any knowledge of error distributions. He proposed fusers that are guaranteed to perform at least as well as the best subset of sources. This work is funded by DOE, NSF and ONR.

SELECTED RECENT PUBLICATIONS
N. S. V. Rao, W. R. Wing, S. M. Carter, Q. Wu, High-speed dedicated channels and experimental results with hurricane protocol, Annals of Telecommunications, 2006, in press.
N. S. V. Rao, W. R. Wing, S. M. Carter, Q. Wu, UltraScience Net: Network testbed for large-scale science applications, IEEE Communications Magazine, November 2005, vol. 3, no. 4, pp. S12-S17.
N. S. V. Rao, J. Gao, L. O. Chua, On dynamics of transport protocols in wide-area Internet connections, in Complex Dynamics in Communication Networks, L. Kocarev and G. Vattay (editors), 2005.
X. Zheng, M. Veeraraghavan, N. S. V. Rao, Q. Wu, and M. Zhu, CHEETAH: Circuit-switched high-speed end-to-end transport architecture testbed, IEEE Communications Magazine, 2005.
J. Gao, N. S. V. Rao, J. Hu, J. Ai, Quasi-periodic route to chaos in the dynamics of Internet transport protocols, Physical Review Letters, 2005.
N. S. V. Rao, D. B. Reister, J. Barhen, Information fusion methods based on physical laws, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 1, 2005, pp. 66-77.
J. Gao, N. S. V. Rao, TCP AIMD dynamics over Internet connections, IEEE Communications Letters, vol. 9, no. 1, 2005, pp. 4-6.
N. S. V. Rao, W. C. Grimmell, Y. C. Bang, S. Radhakrishnan, On algorithms for quickest paths under different routing modes, IEICE Transactions on Communications, vol. E87-B, no. 4, 2004, pp. 1002-1006.
N. S. V. Rao, Q. Wu, S. S. Iyengar, On throughput stabilization of network transport, IEEE Communications Letters, vol. 8, no. 1, 2004, pp. 66-68.
N. S. V. Rao, Probabilistic quickest path algorithm, Theoretical Computer Science, vol. 312, no. 2-3, pp. 189-201, 2004.
N. S. V. Rao, Overlay networks of in-situ instruments for probabilistic guarantees on message delays in wide-area networks, IEEE Journal on Selected Areas in Communications, vol. 22, no. 1, 2004.
N. S. V. Rao, Y. C. Bang, S. Radhakrishnan, Q. Wu, S. S. Iyengar, and H. Cho, NetLets: Measurement-based routing daemons for low end-to-end delays over networks, Computer Communications, vol. 26, no. 8, 2003, pp. 834-844.
W. C. Grimmell and N. S. V. Rao, On source-based route computation for quickest paths under dynamic bandwidth constraints, International Journal of Foundations of Computer Science, vol. 14, no. 3, pp. 503-523, 2003.
S. Radhakrishnan, G. Racherla, N. Sekharan, N. S. V. Rao, S. G. Batsell, Protocol for dynamic ad-hoc networks using distributed spanning trees, Wireless Networks, vol. 9, pp. 673-686, 2003.
J. B. Gao, W. W. Tung, N. S. V. Rao, Noise induced Hopf bifurcation-like sequence and transition to chaos in Lorenz equations, Physical Review Letters, December 2002.
N. S. V. Rao, NetLets for end-to-end delay minimization in distributed computing over Internet using two paths, International Journal of High Performance Computing Applications, vol. 16, no. 3, 2002.

COLLABORATIONS AND OTHER AFFILIATIONS
Senior Member, Institute of Electrical and Electronics Engineers (IEEE)
Member, Association for Computing Machinery (ACM)
Postgraduate Advisor: S. Ding (Louisiana State University); M. Zhu (Louisiana State University); Q. Wu (Louisiana State University)
Postdoctoral Advisor: Q. Wu (Computer Science and Mathematics Division)

Description of Facilities
A. ORNL NLCF
B. UltraScience Net and CHEETAH
C. PNNL Microscopes
D. Lambda Station

Appendix 1: Institutional Cover (DOE Order 5700.7C) and Budget (DOE F 4620.1) Pages
1. Brookhaven National Laboratory
2. Fermi National Accelerator Laboratory
3. Georgia Institute of Technology
4. Oak Ridge National Laboratory
5. Pacific Northwest National Laboratory
6. Stanford Linear Accelerator Center
7. University of California, Davis

Appendix 2: Institutional Tasks, Milestones and Deliverables (2 pages each)
1. Brookhaven National Laboratory
2. Fermi National Accelerator Laboratory
3. Georgia Institute of Technology
4. Oak Ridge National Laboratory
5. Pacific Northwest National Laboratory
6. Stanford Linear Accelerator Center
7. University of California, Davis

4. Oak Ridge National Laboratory: Tasks and Milestones (2 pages)
ORNL tasks are grouped into three separate categories:
1. Overall Project Coordination:
2. Liaison to Astrophysics, Climate and Combustion Areas:
3. Technical Areas: The following technical tasks will be carried out by ORNL: (a) high performance data transport protocols for dedicated channels, (b) effective support of visualization streams over wide-area connections, and (c) computational monitoring and steering over network connections with an emphasis on leadership-class applications.
Milestones:

Appendix 3: Letters of Support
Petascale/Terascale Supernova Initiative – Anthony Mezzacappa, Oak Ridge National Laboratory
Combustion – Jackie Chen, Sandia National Laboratories

Organizational Approach:
This center consists of experts from national laboratories and universities with extensive research and practical expertise in networking as well as in enabling applications and middleware to optimally utilize the provisioned networking capabilities. In particular, several PIs are currently active participants in network enabling tasks for a number of SciDAC and NSF science application projects. The center will focus on networking capabilities needed exclusively by SciDAC science applications, with an emphasis on the high-performance environments of supercomputers and experimental facilities. The center is based on the concepts of Technology Experts, who will address specific technical areas, and Science Liaisons, who will interface and coordinate with SciDAC scientists.
It will leverage existing network enabling tools and also develop the missing technologies to continue to provide state-of-the-art, cross-cutting tools applicable across SciDAC applications. In addition, the science liaisons will actively engage scientists in their assigned areas, through regular meetings and interactions, to help the center keep pace with SciDAC requirements and to pursue the appropriate network enabling solutions. Technology experts will continuously monitor and leverage the available networking technologies and help the center identify and develop the “gap” technologies. They will test and integrate all these technologies into cross-cutting profiling and optimization tools that can be utilized across science projects, and will also maintain them in a repository. The center will serve as a one-stop resource for any SciDAC scientist with a network enabling task. Each such request will be assigned a single science liaison who will interface with the appropriate experts to work closely to: (i) identify the technology components, (ii) develop a comprehensive network enabling solution, and (iii) install, test and optimize the solution in place. Each participant institution of this center is assigned 1-2 science areas in which to act as a liaison, based on its prior and on-going relationships with science projects. Each institution is also assigned specific technical network enabling areas. The center members will participate in weekly teleconferences and biannual meetings. A website will facilitate the dissemination of software and tools, as well as provide a mechanism for a scientist to initiate communication with the center on a specific network enabling task.

Summary. SciDAC large-scale computations and experiments require unprecedented network capabilities in the form of large bandwidth and dynamically stable connections to support large data transfers, network-based visualizations, computational monitoring and steering, and remote instrument control. The applications must be network enabled, which requires: (i) provisioning network connectivity of the appropriate capacity and identifying and tuning the protocols, and (ii) customizing and optimizing the applications and middleware so that the provisioned capacities are made available to the applications in a transparent manner without performance degradation. The phrase Network Enabling (NE) collectively refers to the technologies needed for accomplishing both tasks (i) and (ii), which is significantly beyond the solution space addressed by traditional networking or middleware areas. Current limitations on network capabilities have proven to be a serious drawback in a number of SciDAC applications in accessing datasets, visualization systems, supercomputers and experimental facilities. In particular, current network enabling technologies are inadequate for the high-performance capabilities needed for the transfer of terabyte-petabyte datasets, the stable and responsive steering of supercomputations, and the control of experiments at remote user facilities. These network capabilities represent a multi-faceted challenge, with the needed component technologies ranging from provisioning to transport to network-application mappings, which must be developed, tested, integrated and/or optimized within the SciDAC science environments. Currently, such efforts have been scattered and/or duplicated across a number of SciDAC projects, often placing an undue burden on the application scientists to become experts in network enabling technologies.
The main goal of this center is to serve as a comprehensive resource for providing high-performance network capabilities to SciDAC applications in an optimal and transparent manner. This project addresses a broad spectrum of networking-related capabilities needed in various SciDAC applications by leveraging existing technologies and tools, and developing the missing ones. We propose to develop NE profiling tools specifically tailored to SciDAC that will generate application-level connection profiles and identify potential components of an end-to-end NE solution. We will develop in-situ optimization tools that can be dropped in place along with the application to identify the optimal NE configurations, such as the optimal number of TCP streams for application-to-application transfers and the decomposition and mapping of the modules of a visualization pipeline. We will also develop APIs and libraries customized to science applications to provide them the NE technologies in a transparent manner. These tools are cross-cutting in nature in that they can be used across a wide spectrum of applications; however, they may not necessarily be optimal or optimally configured within a specific science application environment. To accomplish such finer optimization, appropriate team members will work closely with SciDAC scientists to design, customize and optimize NE technologies within the specific application environments. In terms of technical areas, this center will address NE technologies for: (a) high performance data transport for both file and memory transfers, (b) effective support of visualization streams over wide-area connections, (c) computational monitoring and steering over network connections with a particular emphasis on supercomputers, (d) remote monitoring and control of instruments including microscopes, and (e) higher-level data filtering to optimize network performance at the application level.