Abstract
Understanding the soft error vulnerability of supercomputer applications
is critical because these systems use ever larger numbers of devices with
shrinking feature sizes and, thus, increasingly frequent soft
errors. Because many large-scale parallel scientific applications use BLAS and
LAPACK linear algebra routines, the soft error vulnerability of these
methods constitutes a large fraction of the applications' overall
vulnerability. This talk analyzes the vulnerability of these routines in
the context of overall application error vulnerability. We develop a
novel technique that uses vulnerability profiles of individual routines to
model the propagation of errors through chained invocations of these
routines. We use our propagation models to assemble vulnerability profiles
of arbitrary scientific applications that are composed primarily of calls
to BLAS and LAPACK. We demonstrate that the resulting application
vulnerability profiles are highly accurate and incur very low overhead.
Bio
Greg Bronevetsky graduated from Cornell University in 2006 under the
direction of Keshav Pingali. He currently holds a Lawrence Post-doctoral
Fellowship at the Lawrence Livermore National Laboratory. Greg's work
focuses on compiler analyses for parallel applications and scalable fault
tolerance techniques.