Abstract
The study of large-scale systems is challenging, and attempting to draw
objective conclusions is even more difficult. To better understand these
systems and provide meaningful information to the entire HPC community,
some basic guidelines should be defined. From the data in the log files to
the reports presented, a standard set of terminology and metrics with
unified semantics should be introduced. There should also be cohesion
among the various researchers and industry personnel to ensure that
resilience research continues to grow. To initiate this process, a
consortium of researchers and industry personnel has been formed. This
talk will highlight some of the challenges encountered in performing resilience
research, and how we plan to address them through the resilience
consortium.
Bio
James Elliott is a PhD student at Louisiana Tech University studying under
Dr. Box Leangsuksun. His interests lie in modelling and analyzing
resilience mechanisms at various levels of the software stack.