Participating Institutions: Oak Ridge National Laboratory, Virginia Tech
End-User Data Delivery
Project Summary
High-performance computing is facing exponential growth in job input and output
dataset sizes. Terabytes of reduced, result and snapshot data from experimental facilities (e.g., Spallation
Neutron Source, Large Hadron Collider), collaboratories (e.g., Earth System Grid), state-of-the-art cyberinfrastructure
(e.g., TeraGrid) and supercomputers (e.g., Jaguar, Kraken) need to be delivered to end-users
or other destinations for local interpretation of results, visualization or further analysis. Similarly, large
input datasets must be staged into HPC centers from end-user locations for consumption by supercomputing
jobs. End-user data services are often an afterthought in multi-million-dollar HPC centers and
cyberinfrastructure projects, leading to their sub-optimal use. An elegant data delivery scheme can have a
profound impact on user experience and also improve HPC center serviceability. Extant point-to-point data
delivery techniques, commonly used in HPC centers, cannot meet user delivery and job startup deadlines,
cannot adapt to changing dynamics in the end-to-end data path and are not fault-tolerant. Further,
these transfer tools are optimized only for transfers between two already well-endowed sites. In contrast,
end-user data delivery involves providing access to the data at the user's desktop. It cannot be ignored as a
"last-mile" issue.
We propose a robust framework for end-user data services that achieves the timely, decentralized delivery
of terabytes of data, addressing the aforementioned significant gaps in an HPC center's data solution.
The overarching goal of this proposal is to design tools and technologies that enable quick and efficient
utilization of resources available in the end-to-end data path, between the user and the HPC center, and
bring them to bear on the often overlooked end-user data delivery problem. We propose a novel approach
that transforms an end-user's existing collaboration (e.g., a virtual organization) into an intermediate,
trusted storage overlay, which is then used to transfer large data in a decentralized fashion. Our research will
have a significant impact on modern HPC centers, cyber-infrastructure projects and a variety of scientific
experimental facilities and their user bases by fundamentally transforming the end-user data experience.
Intellectual Merit
The scientific value and innovation of this work can be summarized by the following
research objectives. In building an end-user delivery framework, we propose to design, develop and evaluate the following: