High-Performance Networking for Large-Scale Science: Infrastructure, Provisioning, Transport and Application Mapping

Nageswara S. Rao, William R. Wing, Steven M. Carter, Qishi Wu, Mengxia Zhu, Anthony Mezzacappa, Oak Ridge National Laboratory
Malathi Veeraraghavan, University of Virginia
John Blondin, North Carolina State University

Large-scale science computations and experiments require unprecedented networking capabilities in the form of large bandwidth and dynamically stable connections to carry out data transfers, interactive visualizations, and monitoring and steering operations. Such high-performance network demands arise in a wide spectrum of disciplines including high-energy physics, genomics, climate computations and astrophysics. A number of component technologies, including infrastructures, provisioning, transport and application mappings, must be developed and/or optimized to achieve such networking capabilities. We describe the data and control planes of the DOE UltraScienceNet and NSF CHEETAH network testbeds, which provide on-demand and scheduled dedicated network connections. In contrast to current IP networks, these connections provide both large bandwidths (multiple tens of Gbps) and stable, unimpeded channels with no competing traffic. We then describe a scheme that optimally maps a visualization pipeline onto a network to minimize the end-to-end delay. This scheme overcomes the sub-optimal performance of conventional visualization methods, which employ monolithic network mappings, typically in a client-server configuration. We describe experimental results on transport protocols that achieve close to 100% utilization on dedicated 1 Gbps wide-area channels. We also describe an interconnect configuration that provides multiple Gbps channels from a Cray X1 to external hosts, and we present data transport methods that achieve multiple Gbps rates over such channels. Together, these methods represent the building blocks that may be integrated to effectively carry out a Terascale Supernova computation on a supercomputer while it is monitored, visualized and steered by a group of geographically dispersed domain experts.