Data Management on the Fusion Computational Pipeline

S. Klasky¹, M. Beck³, V. Bhat¹˒², E. Feibush¹, B. Ludäscher⁵, M. Parashar²,
A. Shoshani⁴, D. Silver², M. Vouk⁶

¹Princeton Plasma Physics Laboratory
²Rutgers University
³University of Tennessee
⁴Lawrence Berkeley National Laboratory
⁵U.C. Davis
⁶N.C. State

Fusion energy science, like the other science areas in DOE, is becoming increasingly data intensive and network distributed. In this talk, I discuss data management techniques that are essential for scientists making discoveries from their simulations and experiments, with special focus on the techniques and support that Fusion Simulation Project (FSP) scientists may need. The discussion applies to a broader audience, however, since most of the fusion SciDAC projects and FSP proposals include a strong data management component. Simulations on ultrascale computing platforms require the ability to efficiently integrate and network heterogeneous components (computational resources, storage, networks, codes, etc.) and to move large amounts of data over long distances. I discuss the workflow categories needed to support such research, as well as the automation and other aspects that can allow FSP scientists to focus on the science and spend less time tending information technology.