As the scientific software community looks toward platforms in the 50-100 Pflop/s range, the characteristics of the design space it confronts (forced reductions in processor clock speeds, new power constraints, anemic improvements in communication latency, exponential growth in the number of computing elements, and a revolutionary increase in component heterogeneity) raise a host of difficult and unsolved problems. To create software capable of extracting a significant percentage of theoretical peak performance on systems at extreme scale, the scientific community will need groundbreaking innovations in its algorithms, as well as in its ability to control massive parallelism across multiple dimensions of the software architecture.

In this talk I introduce several concepts and techniques revolving around flexible runtimes targeted at the efficient management of fine-granularity task dependencies and their scheduling on dynamic, heterogeneous resources. Domain-specific languages and programming paradigms developed on top of such a runtime will be presented and exemplified, highlighting the runtime's adaptability to problems drawn from diverse scientific fields.
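The core idea behind such runtimes can be sketched in a few lines: tasks become ready for execution as soon as their input dependencies are satisfied, so a scheduler can interleave fine-grained work across available workers instead of waiting on bulk-synchronous phases. The sketch below is a minimal, hypothetical illustration of dependency-driven scheduling (all names are invented for this example; it does not reflect any specific runtime's API, and a real runtime would dispatch ready tasks to heterogeneous workers concurrently rather than run them serially).

```python
from collections import deque

def schedule(tasks, deps):
    """Execute tasks in dependency order.

    tasks: dict mapping task name -> callable (the task body)
    deps:  dict mapping task name -> set of prerequisite task names
    Returns the order in which tasks were executed.
    """
    # Track unsatisfied prerequisites for each task.
    remaining = {t: set(deps.get(t, ())) for t in tasks}
    # Reverse edges: which tasks are unblocked when t completes.
    dependents = {t: [] for t in tasks}
    for t, pre in remaining.items():
        for p in pre:
            dependents[p].append(t)

    # Tasks with no prerequisites are immediately ready.
    ready = deque(t for t, pre in remaining.items() if not pre)
    order = []
    while ready:
        t = ready.popleft()
        tasks[t]()  # a real runtime would hand this to a CPU/GPU worker
        order.append(t)
        # Completing t may make downstream tasks ready.
        for d in dependents[t]:
            remaining[d].discard(t)
            if not remaining[d]:
                ready.append(d)
    return order
```

The point of the sketch is the data structure, not the serial loop: because readiness is tracked per task rather than per phase, parallelism emerges automatically at whatever granularity the dependency graph exposes.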