Accurate Prediction of Soft Error Vulnerability of Scientific Applications
Greg Bronevetsky, Lawrence Livermore National Laboratory

Understanding the soft error vulnerability of supercomputer applications is critical because these systems use ever larger numbers of devices with decreasing feature sizes and, thus, an increasing frequency of soft errors. As many large-scale parallel scientific applications use BLAS and LAPACK linear algebra routines, the soft error vulnerability of these routines constitutes a large fraction of the applications' overall vulnerability. This talk analyzes the vulnerability of these routines in the context of overall application error vulnerability. We develop a novel technique that uses vulnerability profiles of individual routines to model the propagation of errors through chained invocations of them. We use our propagation models to assemble vulnerability profiles of arbitrary scientific applications that are primarily composed of calls to BLAS and LAPACK. We demonstrate that the resulting application vulnerability profiles are highly accurate while having very low overhead.

Greg Bronevetsky graduated from Cornell University in 2006 under the direction of Keshav Pingali. He currently holds a Lawrence Post-doctoral Fellowship at the Lawrence Livermore National Laboratory. Greg's work focuses on compiler analyses for parallel applications and scalable fault tolerance techniques.
