Making Resilience a Reality Through a Resilience Consortium
James Elliott, Louisiana Tech University

The study of large-scale systems is challenging, and drawing objective conclusions from it is even more difficult. To better understand these systems and provide meaningful information to the entire HPC community, some basic guidelines should be defined. From the data in the logfiles to the reports presented, a standard set of terminology and metrics with unified semantics should be introduced. There should also be cohesion among researchers and industry personnel to ensure that resilience research continues to grow. To initiate this process, a consortium of researchers and industry personnel has been formed. This talk will highlight some of the challenges encountered in performing resilience research and how we plan to address them through the resilience consortium.

James Elliott is a PhD student at Louisiana Tech University studying under Dr. Box Leangsuksun. His interests lie in modelling and analyzing resilience mechanisms at various levels of the software stack.

