February 27, 2018- Dr. Brian Nord: AI in the Sky: The Application of Artificial Intelligence to Cosmological Questions

Abstract: The increased availability of large data sets and advancements in artificial intelligence (AI) algorithms have revolutionized the role of data across industry, society, and the sciences. In the last few years, AI has had a substantial impact on molecular chemistry, particle physics, and, more recently, astronomy. AI (e.g., machine learning) is more than likely here to stay, and it has the potential to transform our approach to modeling cosmological and astrophysical data. But what are these algorithms doing, and what are the critical barriers to enabling their highest impact on science?

We'll discuss these topics in the context of modern astronomical surveys, which provide data sets that are unprecedented in size, precision, and complexity. First, I'll describe the key questions in modern cosmology. Then, we'll discuss how recent work applying convolutional neural networks to strong gravitational lensing, the cosmic microwave background, and cosmological simulations points to the long-term promise of deep learning and its utility in answering fundamental questions about the universe.

Bio: Brian Nord received his Ph.D. in physics at the University of Michigan and now works as a post-doctoral researcher at the Fermi National Accelerator Laboratory, focusing on strong gravitational lensing in the Dark Energy Survey (DES). He is also interested in cosmological survey design through end-to-end simulations, galaxy cluster structure, and spectroscopic analyses of many phenomena, such as Milky Way satellites. A few times per year, he travels to Chile to observe for DES on the Blanco Telescope at the Cerro Tololo Inter-American Observatory, or to follow up gravitational lenses at Gemini South. He likes to pull in techniques, ideas, and ways of thinking from multiple fields to tackle hard problems. Moreover, Nord embraces diverse perspectives: science is a human endeavor, and it is critical we appreciate the role that each of us plays in re-imagining our future together.

February 20, 2018- Jay Jay Billings: ORNL's Scientific Software Initiative: How to respond when your boss says "Go create a software community!"

Abstract: A number of years ago I had the distinct pleasure of hearing my boss say "Go create a software community!" It wasn't clear to either of us at the time what this meant or what the road to success would look like. We couldn't even - and still haven't! - set a deadline for completion. We also didn't have a good grasp of the size of the community. As a final challenge, communities are grown organically based on trust, empowerment, shared consciousness, and any number of other factors - which is to say, quite unlike rigid, rectilinear line organizations. None of this uncertainty was frightening, but at the same time none of it was particularly encouraging for our prospects.

This talk examines the creation of ORNL's Scientific Software Initiative (SSI), which seeks to build a community for ORNL's software engineers, develop new metrics and attribution models for assigning credit for research software engineering activities, provide training and continuing education opportunities, and take on strategic software projects and proposals when possible. I will discuss the origins of the SSI, interesting happenings to date - including the new ORNL Software Portal, the return of ORNL's Software Expo, the @ORNLSoftware Twitter account, and more - and the future of the Initiative in terms of both the community at large and the line organization behind it: the Scientific Software Development Team within the Computer Science and Mathematics Division. I will close the talk with a brief description of the SSI's work outside of ORNL and provide details on how to engage for those interested.

February 9, 2018- Dr. Yun (Helen) He: Preparing Users for Cori KNL and NERSC Exascale Scientific Application Program (NESAP)

Abstract: The newest NERSC supercomputer, Cori, is a Cray XC40 system consisting of 2,388 Intel Xeon Haswell nodes and 9,688 Intel Xeon Phi "Knights Landing" (KNL) nodes. Compared to the Xeon-based clusters NERSC users are familiar with, optimal performance on Cori requires consideration of KNL mode settings; process, thread, and memory affinity; fine-grained parallelization with thread scaling; vectorization; and use of the high-bandwidth MCDRAM memory.

We will talk about our efforts preparing NERSC users for Cori KNL, including the NERSC Exascale Science Application Program (NESAP), web documentation, and user training. We will introduce NESAP, the NERSC Application Readiness effort, its scope and components. General considerations, optimization strategies, and tips for KNL in various aspects will be discussed with examples. The Roofline model, used extensively as an optimization guide, will be described with incremental steps and results from selected NESAP applications. In this talk, we will also discuss how we configured the Cori production system for usability and productivity, addressing heterogeneous programming concerns, batch system configurations, recommendations for running jobs, default KNL cluster and memory mode selections, and system issues affecting users. A few successful application stories on KNL will be presented. The CUG2017 paper on these topics was the First Runner-up for the Best Paper Award.
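As a rough illustration of how the Roofline model guides optimization: attainable performance is bounded by the lesser of peak compute and peak memory bandwidth times arithmetic intensity. A minimal sketch (the peak numbers below are illustrative placeholders in the general range of a KNL node, not official Cori figures):

```python
def roofline(ai_flops_per_byte, peak_gflops=2600.0, peak_bw_gbs=450.0):
    """Attainable GFLOP/s for a kernel with the given arithmetic intensity.

    peak_gflops and peak_bw_gbs are illustrative placeholders; in practice
    substitute measured machine numbers (e.g., MCDRAM vs. DDR bandwidth).
    """
    return min(peak_gflops, peak_bw_gbs * ai_flops_per_byte)

# A stream-like kernel (low intensity) is bandwidth-bound...
print(roofline(0.25))   # 112.5 GFLOP/s
# ...while a dense-matrix kernel (high intensity) is compute-bound.
print(roofline(10.0))   # capped at 2600.0 GFLOP/s
```

Plotting attainable GFLOP/s against arithmetic intensity produces the familiar "roofline" shape: a bandwidth-limited slope that flattens at the compute peak, showing at a glance which resource bounds each kernel.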

Bio: Dr. Yun (Helen) He is a High Performance Computing Consultant at NERSC, Lawrence Berkeley National Laboratory. She serves as the main point of contact among users, systems staff, and vendors for the NERSC flagship systems deployed over the past 10 years, including the Cray XT4 (Franklin), XE6 (Hopper), and XC40 (Cori). She specializes in the software programming environment, parallel programming paradigms such as MPI and OpenMP, scientific application benchmarking and optimization, distributed component coupling libraries, and climate models. Helen has been on the Organizing Committee for many HPC conference series, such as CUG, SC, HPCS, IXPUG, and OpenMPCon. She is the Program Chair for CUG 2017 and CUG 2018. Helen also serves on the OpenMP Language Committee, representing Berkeley Lab. Before joining NERSC, Helen worked in the Scientific Computing Group of the Computational Research Division at Berkeley Lab. She has a PhD in Marine Studies and an MS in Computer Information Science.

January 26, 2018- Tarun Prabhu: Just In Time (JIT) Compilation for High Performance Computing (HPC)

Abstract: Modern compilers have to make increasingly complex optimization decisions with incomplete information, making it hard to accurately estimate the safety and profitability of code transformations. Having access to runtime information, such as the trip counts of loops, can allow the compiler to optimize the code more aggressively than it otherwise could. Existing approaches to exploiting such information - compiler directives, profile-guided optimization, and compile-and-go techniques - suffer from a lack of portability, require constant developer involvement, or limit the scope of information that can be exploited. JIT compilation can overcome these limitations. We show that it is possible to perform JIT compilation profitably in HPC applications using a carefully designed combination of simple programmer annotations and static analysis to enable dynamic optimizations. We describe a JIT compiler called Moya, which we use to obtain speedups of up to 63% on individual subroutines and 8% overall on a combustion simulation application called PlasComCM. We also show how a JIT compiler can have a positive impact on the readability and maintainability of applications by undoing the hand-optimizations on several compute kernels in PlasComCM and obtaining comparable levels of performance even when JIT'ing "clean" code.
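To illustrate the general idea behind runtime specialization (a toy Python sketch, not Moya's actual mechanism): once a loop's trip count becomes known at runtime, a specialized version of the kernel can be generated with that count baked in as a constant, which a real JIT could then unroll and vectorize aggressively.

```python
# Toy illustration of trip-count specialization (NOT Moya): generate and
# compile a kernel variant once the loop bound is known at runtime, and
# cache it so the code-generation cost is paid only once per trip count.

_specialized_cache = {}

def specialize_saxpy(n):
    """Return a saxpy kernel fully unrolled for trip count n."""
    if n not in _specialized_cache:
        src = (
            "def saxpy(a, x, y):\n"
            + "".join(f"    y[{i}] += a * x[{i}]\n" for i in range(n))
            + "    return y\n"
        )
        ns = {}
        exec(compile(src, f"<saxpy_{n}>", "exec"), ns)
        _specialized_cache[n] = ns["saxpy"]
    return _specialized_cache[n]

y = specialize_saxpy(4)(2.0, [1, 2, 3, 4], [0.0, 0.0, 0.0, 0.0])
print(y)  # [2.0, 4.0, 6.0, 8.0]
```

The cache mirrors the amortization argument in the abstract: code generation is profitable only when the specialized kernel is reused enough times to pay back the compilation cost.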

Bio: Tarun Prabhu is a PhD candidate at the University of Illinois at Urbana-Champaign. He is mainly interested in compilers and static analysis, particularly in the context of HPC. His research involves developing tools to improve the performance of HPC applications and the productivity of HPC developers. His PhD work has focused on using JIT compilation in HPC. He is the creator and lead developer of Moya - an annotation-driven JIT compiler for Fortran, C, and C++. He holds a master's degree in computer science from the University of Utah and a bachelor's degree in computer engineering from the University of Mumbai.

December 5, 2017- Koby Hayashi: The CP Decomposition: Efficient Algorithms and Application to Neuro-Imaging

ABSTRACT: Tensor decompositions provide a means of data analysis for multi-dimensional data. In particular, there is growing interest in using the CP decomposition (a generalization of the matrix singular value decomposition) for obtaining low-rank approximations of multi-dimensional data. This low-rank approximation of the data can be used in applications such as blind source separation (interpreting each component as a source signal), anomaly detection (identifying data points that are not explained by the model), and predicting missing or future data. In this talk, I will discuss efficient methods for computing a CP decomposition via dimension trees, which aim to reduce computation in the expensive Matricized Tensor Times Khatri-Rao Product (MTTKRP) at the cost of extra memory for storing partial Tensor Times Vector (TTV) results. In addition, I will briefly present an application of the CP decomposition to a dense neuro-imaging data set.
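For concreteness, a bare-bones CP-ALS loop built around the MTTKRP kernel might look like the following sketch (illustrative only; efficient implementations use dimension trees to reuse partial TTV results across modes rather than recomputing the naive contractions below):

```python
import numpy as np

def mttkrp(X, factors, mode):
    """Matricized Tensor Times Khatri-Rao Product along `mode` (3-way case)."""
    A, B, C = factors
    if mode == 0:
        return np.einsum('ijk,jr,kr->ir', X, B, C)
    if mode == 1:
        return np.einsum('ijk,ir,kr->jr', X, A, C)
    return np.einsum('ijk,ir,jr->kr', X, A, B)

def cp_als(X, rank, n_iter=50, seed=0):
    """Bare-bones CP-ALS for a 3-way tensor: alternately solve for each
    factor using the MTTKRP and the Hadamard product of the other Grams."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            gram = np.ones((rank, rank))
            for m, f in enumerate(factors):
                if m != mode:
                    gram *= f.T @ f
            factors[mode] = mttkrp(X, factors, mode) @ np.linalg.pinv(gram)
    return factors

# Recover an exactly rank-1 tensor (up to scaling of the factors).
a, b, c = np.arange(1, 4.0), np.arange(1, 5.0), np.arange(1, 6.0)
X = np.einsum('i,j,k->ijk', a, b, c)
A, B, C = cp_als(X, rank=1)
Xhat = np.einsum('ir,jr,kr->ijk', A, B, C)
print(np.allclose(X, Xhat))  # True
```

Note that each mode's update recomputes contractions over the full tensor; the dimension-tree idea discussed in the talk is precisely to share these partial contractions among the three MTTKRP calls.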

December 1, 2017- Drahomira (Dasha) Herrmannova: Mining Scholarly Publications for Research Evaluation and Forecasting

ABSTRACT: Investment of public funds into research requires the ability to clearly demonstrate beneficial returns, accountability, and good management. This creates a need for effective and appropriate research evaluation methods. However, the question of how to evaluate the quality of research outcomes is very difficult to answer and despite decades of research, there is still no standard solution to this problem. In this presentation, I will discuss the concept of research publication quality, the existing research evaluation methods, and our work in this area focused on extending and improving the existing measures. Most importantly, I will present a new class of metrics called semantometrics. In contrast to existing research measures, which are based on interactions in the scholarly communication network, semantometrics utilize publication content. Finally, I will propose that leveraging text-mining to analyze the outputs of research will enable us to support strategic allocations of research funds, build tools for relieving information overload in scholarly publishing, and support and accelerate scientific discovery.

Bio: Dasha Herrmannova is currently a visiting researcher at Oak Ridge National Laboratory and an outgoing doctoral researcher at the Knowledge Media Institute, Open University, United Kingdom. Her doctoral work focused on showing whether and how publication content can be exploited to develop research evaluation methods that are representative of research publication quality, and on using this knowledge to improve the process of research evaluation. Prior to becoming a PhD student, she worked as a software engineer at Honeywell. Aside from her PhD, she also participated in research projects at the Knowledge Media Institute in the domains of scholarly publication mining and learning analytics, successfully competed in international research competitions (NTCIR CrossLink-2, 2016 WSDM Cup Challenge), and organized numerous research workshops to bring together different communities working on related problems in text mining of research publications.

November 1, 2017- Chen Wu: A Data-Driven Approach to Harnessing the Astronomical Data Deluge

ABSTRACT: The Square Kilometre Array (SKA) will be the largest radio telescope in the world. Construction is set to begin in 2018, making it the latest large-scale global scientific endeavor. The first phase of the project - SKA1 - will consist of hundreds of dishes and hundreds of thousands of antennas, enabling the monitoring and surveying of the sky at unprecedented detail and speed; a second phase will expand these capabilities by at least an order of magnitude. Because of its immense size, just one SKA1 science project will produce "reduced" data at a rate of 1 TB/s from both SKA1-Low and SKA1-Mid. This correlated interferometry data will be fed into the Science Data Processor and extended regional science centres, which are responsible for processing and reducing the data and for producing and preserving science-ready products continuously. SKA1 will have constrained power allocations for processing observations in real time as they are performed. This poses considerable challenges in managing, processing, and storing such large datasets. In this talk, I will provide an overview of a data-driven approach adopted by the Data-Intensive Astronomy group at the International Centre for Radio Astronomy Research to tackle these challenges. In particular, I will discuss the motivation, formulation, and implementation of this approach in several real-world SKA precursor projects throughout the entire data lifecycle - data ingestion, data processing, data management, and data analysis. Finally, I will highlight potential collaboration opportunities associated with the SKA project.

Bio: Dr. Chen Wu is a Senior Research Fellow at the International Centre for Radio Astronomy Research, University of Western Australia. Dr. Wu leads the DATA-layer prototype development of the Science Data Processor (SDP) for the Square Kilometre Array (SKA) radio telescope, which is set to produce data streams at a rate of tens of terabytes per second. He also leads a project on radio galaxy detection using deep learning, specifically Faster Region-based Convolutional Neural Networks and Spatial Transformer Networks. Dr. Wu was awarded a Distinguished Visiting Fellowship by the Scottish Informatics and Computer Science Alliance (St Andrews, UK).

October 30, 2017- Li Tan: Soft Error Resilience and Quantitative Power Efficiency using HPC Application Characteristics

ABSTRACT: The persistently growing resilience and power efficiency concerns of today's large-scale computing systems are pressing issues in high performance computing, due to domain-specific requirements and the limited power supply capability of current and projected supercomputers. Since generic fault tolerance and power saving approaches are bounded in efficiency, application-level solutions are preferred in certain scenarios. For instance, scientific applications within a particular domain generally comply with that domain's conservation laws, which can be leveraged as an error detection criterion to study the resilience of applications sharing similar program characteristics. For an application with inherent resilience, further power savings can be achieved by operating in a low-power mode using near-threshold voltage computing techniques, while the resulting errors are masked or tolerated by the application itself.

Achieving application-level resilience and power efficiency is challenging, however: (a) how can the invariants of applications be utilized to efficiently detect and recover from failures; (b) how can the power of HPC runs be reduced online, to the maximum allowable extent, such that the applications' quality metrics are still satisfied; and (c) how can the intrinsic characteristics of applications be identified for resilience and power saving purposes? In this presentation, we discuss these challenges and propose a lightweight failure recovery approach of checksum-based invariant checking and retrying for continuum dynamics applications, as well as an empirical framework named V-Power that maximizes power savings for inherently resilient applications. Experimental results on a virtualized platform with extensive fault injection campaigns demonstrate the effectiveness and efficiency of the proposed approaches.
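A toy sketch of the checksum-based invariant-checking idea (illustrative only, not the talk's actual recovery code): a conserved quantity such as total mass serves as a cheap checksum, and a violation detected after a timestep triggers a retry.

```python
def step(field):
    """One fake 'continuum dynamics' timestep: redistribute mass between
    neighboring cells (a stand-in for the real stencil update)."""
    out = list(field)
    for i in range(len(out) - 1):
        flux = 0.1 * (out[i] - out[i + 1])
        out[i] -= flux
        out[i + 1] += flux
    return out

def step_with_invariant_check(field, max_retries=3, tol=1e-9):
    """Checksum-based detection: total mass is conserved by `step`, so a
    post-step sum disagreeing with the pre-step sum signals a soft error;
    recovery here is simply retrying the step."""
    checksum = sum(field)
    for _ in range(max_retries):
        out = step(field)
        if abs(sum(out) - checksum) <= tol:
            return out
    raise RuntimeError("invariant violated after retries")

f = [4.0, 2.0, 1.0, 1.0]
g = step_with_invariant_check(f)
print(abs(sum(g) - sum(f)) < 1e-9)  # True
```

The appeal of this scheme, as in the abstract, is that the check costs only one reduction per step, in contrast to generic replication- or checkpoint-based fault tolerance.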

October 2, 2017- Dr. Judy Qiu: A High Performance Model-Centric Approach to Machine Learning Across Emerging Architectures

ABSTRACT: The major goal of our project, A High Performance Model-Centric Approach to Machine Learning Across Emerging Architectures (HPMCA-ML), is to perform fundamental computer science research on innovative computation models for big data machine learning algorithms. Our research aims to address "Big Model" machine learning challenges. We select a subset of machine learning algorithms and prototype them with attention to issues governing performance. We explore portability to new architectures, including those using Intel Xeon and Xeon Phi manycore (e.g., Haswell, Knights Landing) processors and NVIDIA GPU nodes.

It is important to use machine learning (ML) techniques to extract insights from current and future scientific datasets, thereby accelerating scientific discovery alongside simulations. However, an open research challenge is whether common ML frameworks can be developed for scientific applications, and how effectively they run on computer infrastructures ranging from those being developed for exascale to traditional clusters and cloud platforms. We follow up on this vision and further explore effective approaches to HPC and Big Data convergence.

We expect that future ML will be built around optimized libraries for important kernels. In recent work, we have developed a new paradigm based on distinguishing the data and model components of an algorithm and designing a runtime and programming paradigm that supports this. We identify the different communication and synchronization patterns supported in our framework and take a pattern-based approach, examining a few key algorithms with different patterns and showing how they are supported, with detailed performance studies. Later in the research, we will look at auto-tuning approaches that explore the range of communication and synchronization patterns and optimize choices for new problems.

Bio: Judy Qiu is an Associate Professor of Intelligent Systems Engineering at Indiana University. Her general area of research is data-intensive computing at the intersection of cloud and HPC multicore technologies. This includes a specialization on programming models that support iterative computation, ranging from storage to analysis, and that can scalably execute data-intensive applications. Her research has been funded by NSF, NIH, Microsoft, Google, Intel, and Indiana University. Judy Qiu leads an Intel Parallel Computing Center (IPCC) site at IU. She was the recipient of an NSF CAREER Award in 2012, the Indiana University Trustees Award for Teaching Excellence in 2013-2014, and the Indiana University Outstanding Junior Faculty Award in 2015.

September 21, 2017- Dr. Christie Drew: Measuring Research Impact: Approaches and Tools Used at NIEHS

ABSTRACT: Christie's talk will cover approaches in use and in development at the National Institute of Environmental Health Sciences (NIEHS) for measuring extramural research outcomes and impacts. For over ten years, logic models have been used as an organizing framework for understanding research objectives and results, as well as for developing metrics at NIEHS. Christie's talk will include discussion of two emerging tools for measuring and tracking research impact, the High Impacts Tracking System (HITS) and Automated Research Impact Assessment (ARIA).

BIO: Christie Drew, Ph.D., is Chief of the Program Analysis Branch within the NIEHS Division of Extramural Research and Training. She has worked in Environmental Health for more than 26 years, primarily focused on risk communication and information management for decision making. Prior to joining NIEHS, Christie worked with an academic research consortium funded by the Department of Energy to study nuclear waste cleanup. Her dissertation research focused on transparency of decisions at the Hanford site.

August 28, 2017- Dr. Sadaf R. Alam: Creating Abstractions for Piz Daint and its Ecosystem

ABSTRACT: Supercomputing platforms have developed over the years to maximize performance and scalability and to minimize any abstractions that could result in slowdowns and performance variability. Large-scale supercomputing systems have traditionally tightly integrated their hardware and software stacks, for instance, lightweight operating systems, vertically integrated communication layers, and tightly coupled parallel file systems. Consequently, these environments do not lend themselves to the portability expected by web and ISV applications, as well as emerging data science workflows. In this talk, I will present abstractions introduced in Piz Daint, a hybrid and heterogeneous Cray XC50 and XC40 platform with NVIDIA P100 GPU devices, in order to enable Large Hadron Collider (LHC) computing grid workflows. This transition is completely transparent to the end users and existing workflows for experiments including ATLAS, CMS, and LHCb. The technologies and solutions exploited in creating these abstractions include containerization, file system virtualization, and other configuration changes. Incidentally, these abstractions have proven highly effective in supporting the execution of other complex workflows, for instance, Apache Spark, often with performance and productivity gains. The Piz Daint hybrid and heterogeneous supercomputing platform has therefore enabled the Swiss National Supercomputing Centre (CSCS) to consolidate a range of services, including computing, data analysis, and visualization, for its diverse and growing user communities.

Bio: Dr. Sadaf R. Alam is the Chief Architect and Head of the HPC Operations at the Swiss National Supercomputing Centre (CSCS) in Lugano, Switzerland. Dr. Alam studied computer science at the University of Edinburgh, UK, where she received her Ph.D. in 2004. Until March 2009 she was a computer scientist at the Oak Ridge National Laboratory, USA. At CSCS, Dr. Alam leads the unit operating HPC systems for the User Lab and external customers, like MeteoSwiss, the Federal Office for Meteorology and Climatology. In her role as Chief Architect, she ensures the end-to-end integrity of HPC systems and storage solutions. Her research interests include the development of tools and technologies for emerging computing, memory, network and storage technologies.

August 17, 2017- Dr. Dinesh Kaushik: Hierarchical Programming Model for Extreme-Scale Computing

ABSTRACT: Tight power requirements at exascale dictate machines with less memory (total and per thread of execution) and less memory bandwidth than the machines built so far. For this reason, application developers need to adapt at the algorithmic, architectural, and implementation levels. The concurrency in an application needs to be exploited both within a node (which is likely to have about a thousand threads of execution) and among nodes. Many codes scale very well in the distributed programming model (using MPI); however, scaling within a node has been challenging when the thread count grows to hundreds. In addition, application codes need to adapt to failing machine components and reconfigure automatically onto the running parts of the machine. While these challenges will be addressed progressively, applications can take advantage of the vast computational power, which will be arranged in some hierarchical programming model (perhaps OpenMP or CUDA within a node and MPI between nodes). In addition to allowing higher resolutions, the new platforms will enable multiphysics and multiscale applications, making high-fidelity simulation an important tool for scientific discovery in itself and providing insights for doing the right experiments. The analysis tools also need to be more concurrent to handle the large volume of output data from application codes. This talk will discuss some of these issues in detail. Some recent results on Intel's Many Integrated Core (MIC) architecture in the context of an unstructured CFD code will also be presented.
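The hierarchical decomposition described above can be sketched as a two-level block partition - across nodes at the MPI level, then across threads within each node at the OpenMP/CUDA level (a simplified illustration, not code from the talk):

```python
def hierarchical_partition(n_items, n_nodes, threads_per_node):
    """Two-level block decomposition: items -> nodes (MPI level), then each
    node's block -> threads (OpenMP/CUDA level). Returns half-open ranges
    as (node, thread, start, end) tuples."""
    ranges = []
    for node in range(n_nodes):
        node_lo = n_items * node // n_nodes
        node_hi = n_items * (node + 1) // n_nodes
        block = node_hi - node_lo
        for t in range(threads_per_node):
            start = node_lo + block * t // threads_per_node
            end = node_lo + block * (t + 1) // threads_per_node
            ranges.append((node, t, start, end))
    return ranges

print(hierarchical_partition(10, 2, 2))
# [(0, 0, 0, 2), (0, 1, 2, 5), (1, 0, 5, 7), (1, 1, 7, 10)]
```

The point of the two levels is that communication cost differs by level: thread ranges within a node share memory, while node blocks exchange data only through (comparatively expensive) MPI messages.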

August 16, 2017- Mark Finlayson: Detecting and Extracting Narrative Discourse Structure in Text

ABSTRACT: The past two decades have seen a dramatic increase in the sophistication, subtlety, and impact of explicit and implicit information operations by adversaries outside of the immediate battlefield: in Afghanistan and Iraq, the U.S. struggled to win hearts and minds; in Ukraine and the recent U.S. presidential election, Russia deployed propaganda and fake news to significant effect; and in its rivalry with China, the U.S. is seeing infringement on its soft power, especially in Africa. All of these problems, and many more, can be seen as failures to "control the narrative". In this talk, I discuss several new computational technologies for detecting, assessing, dissecting, and understanding narratives, applicable to the military domain, but also to health, diplomacy, and disaster management. I first present new results on detection of narratives in online text, which will eventually allow us to detect and counter online propaganda, fake news, and other malign narratives in the information space. Second, I outline experiments that show that people are sensitive to narrative structures that have been proposed in the literature, and that prior theories do provide a basis on which to construct valid experiments. Third, I demonstrate Analogical Story Merging, a new approach to dissecting narratives into their constituent plot components. Finally, I chart several new directions for future work, including ongoing projects in my laboratory, seeking to advance the state of the art in narrative analysis.

Bio: Dr. Mark Finlayson is Assistant Professor of Computer Science in the School of Computing and Information Sciences at Florida International University. He received his Ph.D. from MIT in 2011, and from 2012-2014 was a Research Scientist in MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL). He also worked as a graduate intern at MIT Lincoln Laboratory Group 82 in 1999. His research focuses on representing, extracting, and using higher-order semantic patterns in natural language, especially focusing on narrative. His work intersects artificial intelligence, computational linguistics, and cognitive science. He is general chair of the Computational Models of Narrative Workshop Series.

August 15, 2017- Dr. Kwan-Liu Ma: Visualization for Scientific Discovery and Storytelling

ABSTRACT: Advanced computing and imaging/sensing technologies enable scientists to study complex phenomena at unprecedented precision, resulting in an explosive growth of data. The size of the information collected about Internet and mobile device users is expected to be even greater, a daunting challenge we must address in order to make sense of, and maximize utilization of, all the available information. Visualization transforms large quantities of often multi-dimensional data into graphical representations that exploit the high-bandwidth channel of the human visual system, leveraging the brain's remarkable ability to detect patterns and draw inferences. It has thus become an indispensable tool in many areas of study involving large, complex data. I will present several visualization designs that my group has produced as either exploratory or explanatory tools for large data found in real-world applications, including those relevant to DOE's missions.

Biography: Kwan-Liu Ma is a professor of computer science and the chair of the Graduate Group in Computer Science (GGCS) at the University of California-Davis, where he directs VIDI Labs and UC Davis Center of Excellence for Visualization. His research spans the fields of visualization, computer graphics, high-performance computing, and user interface design. Professor Ma received his PhD in computer science from the University of Utah in 1993. During 1993-1999, he was with ICASE/NASA Langley Research Center as a research scientist. He joined UC Davis in 1999. Professor Ma is presently leading a team of over 20 researchers pursuing research in scientific visualization, information visualization, and visual analytics. He received the NSF Presidential Early-Career Research Award (PECASE) in 2000, was elected an IEEE Fellow in 2012, and received the 2013 IEEE VGTC Visualization Technical Achievement Award for his outstanding research work. Professor Ma has played major roles in several DOE SciDAC projects including leading a SciDAC Institute project on visualization. Today, he still maintains active collaborations with DOE scientists.

July 26, 2017- Dr. Max Grossman: Spark on GPUs: Data Center Runtimes on HPC Hardware

ABSTRACT: There's been much discussion of the convergence of the data center and the supercomputer. As data center workloads have become more memory and compute intensive, members of that community have turned to techniques in HPC for accelerating their applications. Inversely, as long-term maintainability, programmability, and platform flexibility become ever-increasing concerns in the HPC field, we have started to look to techniques from the data centers for enabling continued application development. This talk will focus on a specific instantiation of this convergence: enabling Apache Spark clusters to offload user-written computation to GPU accelerators. As Spark-on-GPUs can mean different things to different users, this talk will include a semi-comprehensive survey of active work being done in this area, discussing the use cases that each approach does or doesn't address. Extra time will be devoted to a deep dive on past and ongoing work at Rice University in the SWAT (Spark With Accelerated Tasks) project.

Bio: Dr. Max Grossman is currently a postdoctoral researcher in the Habanero research group at Rice University, working under Professor Vivek Sarkar. His research focuses on taking an application-driven approach to developing new runtimes, programming models, and tools for heterogeneous and distributed platforms. Max is a co-author of the textbook "Professional CUDA C Programming" and is a co-founder and lead for the HPC and analytics consultancy, 7pod Technologies. Max earned his PhD from Rice University in 2017.

July 25, 2017- Michael Jantz: Collaborative, Tier-Conscious Data Management for Next-Generation Memory Systems

ABSTRACT: A number of promising new memory technologies, such as non-volatile, storage-class memories and high-bandwidth, on-chip RAMs, are beginning to emerge. Since each of these new technologies presents tradeoffs distinct from conventional DRAM, many next-generation systems will include multiple tiers of memory storage, each with its own type of devices. To efficiently utilize the available hardware, such systems will need to alter their data management strategies to consider the performance and capabilities provided by each tier.

This talk will present ongoing work to improve compute performance and efficiency by increasing the effectiveness of application data management on heterogeneous memory systems. A key realization behind our approach is that the distribution and usage of memory resources depend upon activities that occur in different layers of the vertical execution stack, including the application, OS, and hardware. Our work aims to increase coordination among these cross-layer activities in order to address the limitations and inefficiencies of existing solutions. We will describe tools, techniques, and frameworks that we have developed to enable applications to adapt to heterogeneous memory hardware transparently and automatically. We will also show a preliminary evaluation, conducted in simulation, demonstrating that our guidance-based approach outperforms, and can even improve, other state-of-the-art management strategies.
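As a simplified illustration of tier-conscious placement (a toy policy, not the system described in the talk): rank allocation sites by observed accesses per byte and pack the hottest into the fast tier until it fills.

```python
def place_by_hotness(objects, fast_capacity_bytes):
    """Toy tier-placement policy: sort allocations by access rate per byte
    and pack the hottest into the fast (e.g., on-chip/HBM) tier until it
    fills; everything else goes to the capacity (DRAM/NVM) tier."""
    ranked = sorted(objects, key=lambda o: o["accesses"] / o["size"], reverse=True)
    placement, used = {}, 0
    for obj in ranked:
        if used + obj["size"] <= fast_capacity_bytes:
            placement[obj["name"]] = "fast"
            used += obj["size"]
        else:
            placement[obj["name"]] = "capacity"
    return placement

objs = [
    {"name": "grid",    "size": 800, "accesses": 40000},  # hot
    {"name": "history", "size": 600, "accesses": 30},     # cold
    {"name": "coeffs",  "size": 100, "accesses": 9000},   # very hot
]
print(place_by_hotness(objs, fast_capacity_bytes=1000))
# {'coeffs': 'fast', 'grid': 'fast', 'history': 'capacity'}
```

The cross-layer coordination in the talk is about where the "accesses" numbers come from: application hints, OS profiling, and hardware counters each see a different slice of this information.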

Bio: Michael Jantz is an Assistant Professor at the University of Tennessee, where he directs the CORSYS laboratory. His research aims to develop compiler and runtime system tools and frameworks that increase application performance and efficiency on modern and next-generation computer hardware.

July 24, 2017- Sridutt Bhalachandra: Using Runtime Energy Optimizations to Improve Energy Efficiency in High Performance Computing

Energy efficiency in high performance computing (HPC) will be critical to limiting operating costs and carbon footprints in future supercomputing centers. In the push to achieve Exascale performance, a commensurate increase in power is no longer feasible. With both hardware and software factors affecting energy usage, there is a need for dynamic power regulation to achieve energy savings.

We identify two opportunities to improve energy efficiency: computational workload imbalance and waiting on a resource, most often memory. In modern HPC systems, power and thermal constraints affect each chip differently, causing the on-chip mechanisms that control operating frequency to vary as well. Performance will thus differ between cores even for perfectly balanced parallel applications. Memory operations in HPC applications are seldom explicit, making it difficult for the operating system to stall (or switch off) cores and reduce power while waiting on memory; the CPU remains active, wasting energy. We also investigate the effect of power limits enforced by external agents on application performance.

My thesis differentiates itself from prior work by employing adaptive methods at runtime and by using power-control levers in the processor that have not previously been applied to these two scenarios. This dissertation highlights an adaptive runtime framework that allows processors capable of per-core power control to reduce power with little performance impact by dynamically adapting to workload characteristics. Different core-specific power controls can be employed separately or combined to enhance the effectiveness of the framework. Monitoring of performance and power regulation is performed transparently within the MPI runtime system, so no code changes are required in the underlying application. In the presence of workload imbalance, the runtime reduces the frequency of cores not on the critical path, thereby reducing power without degrading performance. Lowering the frequency of non-critical cores is shown to reduce run-to-run performance variation and, in certain scenarios, to improve performance on both conventional and power-limited systems. For applications plagued by memory-related issues, we identify new memory metrics that facilitate lowering power without adversely impacting performance.
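The critical-path idea above can be sketched in a few lines. This is an illustrative toy, not Sridutt's runtime: given per-core compute times measured at a nominal frequency, each core is assigned the lowest available frequency that still lets it finish no later than the slowest (critical-path) core, assuming compute time scales inversely with frequency. The function name and numbers are hypothetical.

```python
def pick_frequencies(compute_times, freqs, f_nominal):
    """For each core, choose the lowest available frequency that still
    lets it finish no later than the critical-path (slowest) core.

    compute_times: per-core compute time at f_nominal (seconds)
    freqs: available frequencies, ascending (Hz)
    f_nominal: frequency at which compute_times were measured (Hz)
    """
    t_critical = max(compute_times)  # the slowest core sets the deadline
    chosen = []
    for t in compute_times:
        # Assume t(f) = t * f_nominal / f; find the lowest feasible f.
        best = f_nominal
        for f in freqs:
            if t * f_nominal / f <= t_critical:
                best = f
                break  # freqs ascending: first feasible is lowest
        chosen.append(best)
    return chosen

times = [1.0, 0.5, 0.8, 1.0]          # seconds at a nominal 2.0 GHz
freqs = [1.0e9, 1.2e9, 1.6e9, 2.0e9]  # available P-states, ascending
print(pick_frequencies(times, freqs, 2.0e9))
```

Cores off the critical path drop to 1.0 or 1.6 GHz while the two slowest cores stay at nominal, so the collective finishes at the same time at lower power.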

Sridutt Bhalachandra is a Ph.D. student in the Computer Science department at UNC-Chapel Hill and a Research Assistant at the Renaissance Computing Institute (RENCI). His research area is High Performance Computing (HPC) with a focus on energy efficiency; he has also spent time on performance variability and reproducibility. His advisors are Dr. Allan Porterfield and Prof. Jan Prins. He is particularly interested in designing adaptive runtime energy optimization methods that do not degrade performance. He has interned at Sandia National Laboratories (Albuquerque) and Lawrence Livermore National Laboratory, and has collaborated with the EEHPC Working Group. In future work, he is interested in leveraging his understanding of runtime systems and processor architectures to develop portable solutions that improve the efficiency of HPC systems. Previously, he worked as a Systems Engineer at Infosys Labs, Bangalore. Sridutt holds a Master's in Computer Science from UNC-Chapel Hill and a Bachelor's in Computer Science Engineering from SDM College of Engineering & Technology, Dharwad, under Visvesvaraya Technological University, Belgaum. You can reach Sridutt by email at sriduttb@{,} or visit his website at

July 20, 2017- Jay Jay Billings: Straightforward Workflows with the Eclipse Integrated Computational Environment

Workflow management systems are tricky things. Many promise great gains in efficiency and automation, but come with a price tag of your two best programmers for the rest of their careers. Others promise the same, but are closed source and not nearly as feature-rich as they seemed in the sales pitch, which means using them burns through your funding like it's going out of style. This talk presents something different: the Eclipse Integrated Computational Environment (ICE) is a feature-rich, open source workflow management system that can be easily and quickly extended by everyone from students to senior scientists. ICE accomplishes this by exposing a workflow model that focuses not on encoding workflows into abstract graphs, but on facilitating the sets of activities common across all modeling and simulation projects, enabling developers to quickly deploy new workflows and tools. These activities include input and model development, local and remote job launch with monitoring and control, visualization and data analysis, and local and remote data management. In addition to describing ICE's design and application programming interface, this talk will provide real-world examples of how ICE is used in a number of different projects, including modeling and analysis of neutrons on the ICEMAN project, and how users can get started with ICE for their own projects.

July 14, 2017- Gokcen Kestor: Towards Effective Fault Tolerance Solutions for Exascale Systems

Increases in complexity and component count, together with strict operational power budgets, will make high-performance computers more susceptible to errors. In such faulty environments, applications will need to treat errors as common events that must be handled efficiently, rather than as rare events that can be ignored or handled with expensive schemes. Traditional coordinated solutions might not scale to the desired level; localized approaches, by contrast, have the potential to scale while introducing low overheads.

In this talk, I will first present the design of a fault tolerance mechanism for nested fork-join programs. The work performed by a nested fork-join program can be expressed as a computation tree. This tree structure motivates the localization of a fault to the affected subtree. Our localized fault recovery mechanism isolates the impact of failed nodes in the computation tree to the subtrees rooted at those nodes, recreates only the lost work, and re-executes it. This involves recreating a connected computation tree and distinguishing subtrees that were lost, are being executed, or are yet to be executed. Our solution shows low overheads in the absence of failures, recovery overheads on the same order as the lost work, and much lower recovery costs than alternative strategies.
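The subtree-localized recovery idea can be illustrated with a toy model (this is not Dr. Kestor's implementation): work is a nested fork-join tree, and a failure while executing a node causes only that node's subtree to be re-executed, while completed siblings keep their results. All names and the failure-injection scheme are hypothetical.

```python
def run_tree(node, fail_once, log):
    """node: ('leaf', value) or ('fork', [children]).
    fail_once: set of leaf values that fail on their first execution.
    log: records every leaf execution, so redone work can be counted."""
    kind, payload = node
    if kind == 'leaf':
        log.append(payload)
        if payload in fail_once:
            fail_once.discard(payload)      # fail only the first time
            raise RuntimeError(payload)
        return payload
    total = 0
    for child in payload:                   # fork: execute children
        while True:
            try:
                total += run_tree(child, fail_once, log)
                break                       # this subtree is done
            except RuntimeError:
                pass                        # re-execute only this subtree
    return total

tree = ('fork', [('leaf', 1),
                 ('fork', [('leaf', 2), ('leaf', 3)]),
                 ('leaf', 4)])
log = []
print(run_tree(tree, {3}, log))  # -> 10
print(log)                       # -> [1, 2, 3, 3, 4]: only leaf 3 is redone
```

When leaf 3 fails, the exception is caught at its parent fork, so only that subtree is retried; the results of leaves 1 and 2 are never recomputed, mirroring the "recovery cost on the order of the lost work" claim.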

Then, I will present our fault propagation framework, which analyzes how faults propagate in MPI applications and gives insight into the vulnerability of HPC applications. The framework consists of an LLVM-based instrumentation component and a runtime checker that tracks the propagation of faults through the application's state. In this study we show that, without a comprehensive fault propagation analysis, the conclusions drawn from statistical output-variation analysis may be inaccurate.

Dr. Gokcen Kestor is a research scientist in the High-Performance Computing Group at the Pacific Northwest National Laboratory (PNNL). Gokcen earned her Ph.D. in Computer Science from the Polytechnic University of Catalonia (UPC) in Barcelona, Spain, in 2013, and joined PNNL in July 2012 as a post-master's researcher. Her dissertation investigated effective software transactional memory solutions. Her research interests include parallel programming models and runtimes (especially task-based programming models), compilers, resilience for future large-scale systems, power and performance analysis and modeling of HPC applications and emerging technologies, effective use of emerging memory technologies, and machine learning techniques in the context of HPC.

She is currently working on fault tolerance solutions for distributed task-programming models, configurable soft error detection techniques, and evaluation of emerging memory technologies.

July 11, 2017- Roberto Gioiosa: Bridging the Gap between HPC and Data Analytics: a System Software View

To achieve the desired level of performance and efficiency, exascale systems will employ novel hardware and software technologies. Some of these technologies will have a revolutionary impact on the way high-performance computing (HPC) applications are designed and executed. Moreover, exascale systems will have to accommodate the execution of emerging data analytics workloads, which have very different characteristics from traditional HPC workloads. The convergence between HPC and data analytics workloads is a key challenge for next-generation systems.

In the first part of this talk, we will look at one such novel high-performance network technology, the Data Vortex interconnection network. I will present the Data Vortex architecture, its main capabilities, and its applicability to traditional and emerging workloads. We will review which programming paradigms are effective on this network and how they affect the programming of parallel applications. Next, I will present a novel, asynchronous, distributed PGAS programming model that leverages the unique capabilities of the Data Vortex interconnection network. The runtime is based on active messages and fully overlaps computation and communication, as well as parallel network transactions.

In the second part of the talk, I will summarize the outcome of the DOE Argo project, with particular emphasis on scheduling HPC simulations and in-situ data analytics. I will present a new OS task scheduler that effectively and transparently runs HPC applications and data analytics on the same compute node and dynamically allocates hardware resources on demand. The OS scheduler greatly improves workflow performance and guarantees performance isolation for the HPC applications.

Dr. Roberto Gioiosa is a senior research scientist in the High-Performance Computing Group at the Pacific Northwest National Laboratory (PNNL). His research interests include operating systems and runtimes; high-performance computer architectures, memories, and networks; parallel and distributed programming models; resilience; performance and power modeling and analysis; and embedded systems.

Roberto earned his Ph.D. in 2006 from the University of Rome "Tor Vergata", Rome, Italy. Prior to joining PNNL in 2012, Dr. Gioiosa worked at the Los Alamos National Laboratory (LANL) (2004-2005), the Barcelona Supercomputing Center (BSC) (2006-2008 and 2009-2012), and the IBM T.J. Watson Research Center (2008-2009), where he contributed to the development of the Compute Node Kernel for BG/Q systems.

Currently, his projects include the development of system software for Data Vortex network-based systems, evaluation of emerging architecture and technologies for exascale systems and applications, and development of operating systems for exascale systems.

He is a member of the ACM and IEEE Computer Society.

July 6, 2017- Ivy Peng: Preparing Applications for the Emerging Hybrid-Memory Supercomputers

Hybrid memory systems are a promising approach to tackling the scalability challenge of DRAM-based memory systems. In fact, all three DOE pre-exascale supercomputers will be equipped with multiple memory technologies. State-of-the-art processors and accelerators already use memory systems composed of on-chip high-bandwidth memory and off-chip DRAM. Thus, hybrid memory systems will likely become ubiquitous on next-generation supercomputers. Currently, it is still unclear how HPC scientific and data-analytics applications can benefit from this deeper memory hierarchy. The memory technologies in a hybrid memory system usually differ in latency, bandwidth, capacity, and volatility, and it is a complex task for an application to take advantage of all of them working side by side. The objective of our research is to address this challenge by understanding whether current applications can directly benefit from the emerging memory systems, which factors are critical for application performance, and how to improve application performance on such systems.

In this talk, I will present the methodologies and findings of our research on hybrid memory systems in three steps. First, we emulate a large-scale supercomputer configured with fast and slow memories, evaluate the impact of the hybrid memory system on the scalability of scientific and data-analytic applications, and identify three metrics that characterize application performance on hybrid memory systems. Second, we evaluate the HBM-DRAM memory system on the Intel Knights Landing processor and identify three key factors that are important for applications to exploit such hybrid memory systems for high performance. Third, we derive a set of rules to guide data placement in an application and integrate the rules and a global allocation algorithm into a tool, called RTHMS, that provides programmers with recommendations on object-to-memory mapping in their applications. We evaluate the tool using a set of scientific and data-analytic applications with varying problem sizes. At the end of the talk, I will briefly introduce my current work on a runtime for dynamic data placement and migration.
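A rule-driven object-to-memory mapping in the spirit of RTHMS can be sketched as follows. This is an illustrative simplification (the actual tool's rules and scoring are more involved, and the function name and numbers are hypothetical): objects with the highest access density per byte get priority for the small, fast tier; everything else falls back to the large, slow tier.

```python
def place_objects(objects, fast_capacity):
    """objects: list of (name, size_bytes, accesses).
    Returns {name: 'HBM' or 'DRAM'}, greedily filling the fast tier with
    the objects that have the highest access density (accesses per byte)."""
    placement = {}
    free = fast_capacity
    ranked = sorted(objects, key=lambda o: o[2] / o[1], reverse=True)
    for name, size, _ in ranked:
        if size <= free:
            placement[name] = 'HBM'
            free -= size
        else:
            placement[name] = 'DRAM'
    return placement

objs = [('grid',   8_000_000, 4_000_000),   # 0.5 accesses/byte
        ('lookup',   500_000, 5_000_000),   # 10  accesses/byte
        ('buffer', 2_000_000, 6_000_000)]   # 3   accesses/byte
print(place_objects(objs, fast_capacity=3_000_000))
```

The small, intensively accessed objects land in HBM while the large, rarely touched grid stays in DRAM, which is the flavor of recommendation such a tool can emit per allocation site.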

Ivy Peng is a Ph.D. candidate in the Department of Computational Science and Technology at KTH Royal Institute of Technology. She started working on heterogeneous memory systems at Pacific Northwest National Laboratory in 2016. Her research interest revolves around the performance of applications on large-scale parallel systems. She believes that preparing HPC applications for future architectures and technologies relies on understanding the characteristics of realistic applications. Her work includes programming models, runtimes, and simulations on HPC platforms.

June 28, 2017- Ian Lee: Developing Open Source in Service to National Security

Ever wondered what software gets developed at a national laboratory? Thanks to the open source efforts of LLNL, you don't have to wonder any longer!

This talk will highlight the ongoing expansion of the open source community inside one Department of Energy research lab. These developers bring software and tools that support science and security, issues of national importance, into the open source community. The talk will also describe the unique challenge of bringing decades-old software projects and historically closed-source developers into the open source community for the first time.

June 22, 2017- Guoqiang Deng: Mixed Finite Element Methods for Size-dependent Skew-symmetric Couple-stress Mechanics

With the continuous push toward micro- and nano-scale material and component development, the need to understand elastostatic and elastodynamic response at small scales becomes more and more important. In this presentation, a brief introduction to the consistent size-dependent couple stress theory will be given, followed by two corresponding mixed Lagrangian formalisms, for elastostatic and elastodynamic response, respectively. Finally, a number of two-dimensional plane-strain static and dynamic problems are investigated under these formulations of couple stress response, and the results are compared to existing methods where possible.

Guoqiang Deng holds degrees from the University of Science and Technology of China and the State University of New York at Buffalo. His Ph.D. research focuses on developing various types of mixed finite element formalisms for the consistent size-dependent couple stress theory and investigating couple stress behaviors, especially in elastodynamics.

June 21, 2017- Carlos Maltzahn: Towards sustainable open-source technology transfer at universities

The Center for Research in Open Source Software (CROSS) at UC Santa Cruz promotes open-source technology transfer by funding research and providing an incubator for projects that offer a plausible path to widely-adopted open-source software. We believe that CROSS is an example of how a research university can implement open source technology transfer that is profitable and sustainable. The Center has been operating since September 2015 and currently funds 6 Ph.D. students and 2 post-doctoral appointments. Startup funding was provided by UC Santa Cruz alumnus Dr. Sage Weil, Ceph Principal Architect at Red Hat. Yearly funding is currently provided by Toshiba, Micron, and Seagate. By steadily increasing membership from industry and government we expect to be fully sustainable by 2020. In this talk I will give an overview of CROSS, including motivations and benefits for students, universities, and members of industry and government.

Dr. Carlos Maltzahn is the founder and director of the UC Santa Cruz Center for Research in Open Source Software (CROSS). Dr. Maltzahn also co-founded the Systems Research Lab, known for its cutting-edge work on big data storage and processing, scalable data management, and distributed system performance management. Carlos joined UC Santa Cruz in 2004, after five years at NetApp working on network intermediaries and storage systems. In 2005 he co-founded, and became a key mentor on, Sage Weil's Ceph project. In 2008 Carlos became a member of the computer science faculty at UC Santa Cruz and has since graduated five Ph.D. students. Carlos earned his M.S. and Ph.D. in Computer Science from the University of Colorado at Boulder.

May 22, 2017- Will Zeng: Architectures for Hybrid Quantum/Classical Computing

Abstract: The first scalable universal quantum computing devices are now being designed and built in several groups worldwide. As these devices mature, it is important to consider how best to make use of them. This will require new and applied programming models for quantum computing. In particular, promising near-term algorithms for quantum simulation, quantum chemistry, and optimization require a hybrid quantum/classical programming environment. In this talk, we introduce an open-source environment (Forest) based on a shared-memory intermediate representation (Quil). The environment runs through a cloud API with client-side Python libraries that can target both superconducting quantum circuit and classical simulation backends. We discuss the programming model and implementations of near-term algorithms in this environment.

May 8, 2017- Robert Smith: Integrated Scientific Visualization with the Eclipse Advanced Visualization Project

Abstract: Visualization is an important part of many scientific workflows, both pre- and post-processing. The Eclipse Advanced Visualization Project (EAVP) is an open source Java project which offers a unified, service-based architecture for visualization, with a focus on use in Eclipse workbenches. EAVP provides a variety of capabilities covering 2D and 3D data, numerous data file types, integration with multiple third-party visualization programs, remote and local execution, and both pre-processing editing of a simulation input file and post-processing viewing of a result. These services include CSV graph plotting, editing of meshes and geometries using JavaFX, and launching and connecting to state-of-the-art programs like ParaView and VisIt. This talk will cover the architecture and strategy used in EAVP, how it can be used to provide visualizations, and future plans for expanding the project to work in new environments outside of Eclipse RCP apps, such as Java web frameworks and other native Java windowing toolkits like JavaFX and Swing.

Robert Smith is a post-master's research associate on the Scientific Software Development Team at Oak Ridge National Laboratory. His research interests include visualization, workflow management, and machine learning. He received his MS in computer science from Wake Forest University.

May 1, 2017- Ada Gavrilovska: Memory Fabric: Seamless Scaling Across Complex Memory Topologies

Abstract: Emerging memory technologies – from fast, but small-capacity High Bandwidth Memories (HBMs), to much slower, larger, and persistent non-volatile memories (NVMs) – are transforming the way systems are being built. Future leadership systems, from CORAL to Exascale and beyond, and the codes that will run on them, will need to be designed for a much richer and more complex environment. Using data-intensive scientific and analytics applications as motivation, our work addresses the complex memory fabrics of heterogeneous memory components, with different, and potentially configurable, capacity, consistency, sharing, and access properties. In this talk, I will present our research on re-architecting the systems software toward harnessing greater benefits from heterogeneous memory systems. I will present an overview of the pVM and HeteroOS systems, which provide 3x performance improvements for systems with non-volatile or heterogeneous memories, without requiring any application modifications. I will then describe follow-on work on accelerating persistent-memory-based support for checkpoint and transport services for HPC workflows, supported by the Department of Energy. I will conclude with illustration of our current activities toward furthering the gains in effective performance and efficiency of the heterogeneous memory fabric through co-design across the entire stack, from application programming interfaces and runtimes, to intelligent resource management, and hardware-assisted acceleration.

Bio: Ada Gavrilovska is an associate professor in the School of Computer Science at Georgia Tech, where she leads the KERNEL research group. Her research is largely driven by emerging hardware technologies and modern workloads, and focuses on addressing performance, scalability and efficiency problems across the systems software stack. Recent projects include operating system and hypervisor methods for dealing with platform-wide compute and memory heterogeneity, dynamic resource management for large-scale multicores and server systems with high-performance fabrics, and systems support for tapping into the increased client-side resource diversity. Gavrilovska's research has been supported by the National Science Foundation, the US Department of Energy, and industry grants from Cisco, HP, IBM, Intel, Intercontinental Exchange, LexisNexis, VMware, and others. She has published ninety peer-reviewed papers, and edited a book "High Performance Communications: A Vertical Approach".

April 19, 2017- Ismail Akturk: Energy Efficiency Challenge in Computing: Today and the Future

Abstract: A critical challenge for modern system design is accommodating the increasing demand for performance within a tight power budget. To address this challenge, it is essential to understand the theoretical and practical limits of computation. In this talk, we will discuss current practices, how they evolved over the years, and why energy efficiency became of the utmost importance for computing systems. We will cover a subset of promising ways to improve energy efficiency, including data recomputation and approximate computing. Oftentimes, recomputing data is more energy efficient than storing and retrieving pre-computed data, because it minimizes the prevalent power and performance overheads of data storage, retrieval, and communication. The key idea behind data recomputation is to replace a load with a sequence of instructions that recompute the respective data value, but only when doing so is more energy efficient. Approximate computing, on the other hand, has emerged as a promising paradigm consisting of techniques spanning multiple levels of the system stack that exploit the algorithmic noise tolerance of emerging Recognition-Mining-Synthesis (RMS) applications to improve energy efficiency and performance. In this regard, we will discuss approximate near-threshold voltage computing to improve energy efficiency. Finally, we will touch upon emerging technologies and novel computing paradigms that will change the way we design systems and the way we compute, which I call the Renaissance of Computing.
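The recomputation trade-off described above reduces to a per-value energy comparison. A minimal sketch, with entirely made-up energy numbers chosen only for illustration (real decisions would use measured per-instruction and per-access energies): recompute only when the estimated energy of the recomputing instruction sequence is below the energy of the memory access it replaces.

```python
E_LOAD_DRAM = 20.0   # nJ per off-chip load (illustrative value)
E_ALU_OP    = 0.5    # nJ per arithmetic instruction (illustrative value)

def should_recompute(n_ops, n_cached_operands, e_cache_load=1.0):
    """True if re-deriving the value (n_ops ALU ops plus cheap cache loads
    for its live inputs) costs less energy than one DRAM load."""
    e_recompute = n_ops * E_ALU_OP + n_cached_operands * e_cache_load
    return e_recompute < E_LOAD_DRAM

# Recomputing from 3 ALU ops and 2 cached operands: 3*0.5 + 2*1 = 3.5 nJ < 20 nJ
print(should_recompute(3, 2))    # -> True
# A long recompute chain is no longer worthwhile: 50*0.5 + 4*1 = 29 nJ
print(should_recompute(50, 4))   # -> False
```

A compiler pass applying this rule would rewrite only the loads whose backward slices are short and whose inputs are still live in registers or cache.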

Ismail Akturk is a Ph.D. candidate in the Department of Electrical and Computer Engineering at the University of Minnesota, Twin Cities. He is a member of the ALTAI research group, led by Prof. Karpuzcu. He holds an MS in Electrical Engineering and an MS in Computer Engineering. He is broadly interested in computer systems, and specifically in computer architecture. His research focuses on improving the energy efficiency, scalability, and reliability of computing systems, and his work has been published in top venues such as IEEE Micro Magazine, ACM TACO, HPCA, and ASPLOS. Before joining the University of Minnesota, he worked at the National Center for High Performance Computing of Turkey as a data management specialist.

April 11, 2017- Dr. Bronson Messer: ORNL MiniApps Webinar Series: Ziz - An Extensible MiniApp for Astrophysical Multiphysics

Abstract: Ziz is a MiniApp that models the performance of the domain-decomposed, directionally split hydrodynamics framework of the Chimera core-collapse supernova code. It is designed to be a small sandbox application to explore the impact of various local physics modules and particular implementations in the context of a multiphysics code like Chimera. I will describe the structure of Ziz and the behaviors it is trying to capture, along with build and execution options and performance characteristics of the code.

April 10, 2017- Dr. Jee Choi: High-Performance Tensor Decomposition for Data Analytics and Co-Design

Abstract: Many social and scientific domains give rise to data with multi-way relationships that can naturally be represented by tensors, or multi-dimensional arrays. Decomposing – or factoring – tensors can reveal latent properties that are otherwise difficult to see. However, due to the relatively recent rise in popularity of tensor decomposition in high-performance computing (HPC), its challenges in performance optimization are poorly understood.

In this presentation, I will explain the steps taken to identify and isolate the major bottlenecks in tensor decomposition algorithms, and demonstrate significant speedups over the prior state of the art using various cache blocking mechanisms. I also plan to show our first-cut attempt at a performance model for tensor decomposition that will pave the way for future work on a composable auto-tuning framework for 1.) developing faster and more versatile tensor decomposition libraries; and 2.) algorithm-architecture co-design for faster data analytics hardware.
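A central bottleneck in CP (CANDECOMP/PARAFAC) tensor decomposition, and a common optimization target in this line of work, is the MTTKRP kernel (matricized tensor times Khatri-Rao product). A minimal coordinate-format sketch for mode 0 of a 3-way sparse tensor is shown below; this is a generic textbook formulation, not the speaker's optimized code, which would add compressed formats and the cache blocking the abstract mentions.

```python
def mttkrp_mode0(nnz, I, B, C, R):
    """nnz: list of (i, j, k, value) nonzeros of a 3-way sparse tensor.
    B, C: dense factor matrices (lists of rows) for modes 1 and 2.
    Returns M with M[i][r] = sum over nonzeros of value * B[j][r] * C[k][r]."""
    M = [[0.0] * R for _ in range(I)]
    for i, j, k, v in nnz:
        for r in range(R):
            M[i][r] += v * B[j][r] * C[k][r]
    return M

# A 2x2x2 tensor with two nonzeros, rank R = 2.
nnz = [(0, 0, 1, 2.0), (1, 1, 0, 3.0)]
B = [[1.0, 2.0], [0.5, 1.0]]   # mode-1 factors (J=2, R=2)
C = [[1.0, 0.0], [2.0, 4.0]]   # mode-2 factors (K=2, R=2)
print(mttkrp_mode0(nnz, I=2, B=B, C=C, R=2))
```

The irregular, data-dependent accesses into B and C are exactly why cache blocking and data layout matter so much for this kernel's performance.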


Jee Choi is a postdoctoral researcher at the IBM T. J. Watson Research Center in Yorktown Heights, New York. He received his Ph.D. in Electrical and Computer Engineering from the Georgia Institute of Technology (Georgia Tech), where he worked on all things HPC.

His work on auto-tuning sparse matrix-vector multiply for graphics processing units is one of the most cited papers in its area, and his Ph.D. dissertation on energy and power modeling for HPC applications was one of the first to directly connect algorithmic properties to architectural parameters for energy and power.

His latest endeavor is optimizing tensor decomposition algorithms for data analytics, which is part of a larger co-design project at IBM to design the next-generation data processing system.

April 6, 2017- Professor Pino Martin: Exascale Computing and Big Data for Hypersonics in Support of Basic Science and National Security

Abstract: The peak performance of the most advanced computing systems today is about 20 petaflops. In seven years, the DoE Exascale Computing Project promises a factor-of-50 increase in peak performance over current systems. At present, low-dissipation schemes, implicit time-integration techniques, and increased computing power have made it possible for the role of high-fidelity computations in hypersonic flows to expand from basic science into Technology Readiness Level 3 capabilities (in DoD standards) that are being used in test and evaluation of hypersonic systems and adopted by the aerospace industry for new vehicle designs. Our current simulation capabilities include physics-based, validated models for high-temperature physics, combustion, surface/fluid interactions, fluid/structure interactions, transition to turbulence, coupling with turbulence, and wall-turbulence modeling. Sophisticated grid generation and adaptive mesh refinement techniques show scaling up to one billion unstructured-grid elements (which are 1,000 times more difficult than i-j-k-ordered cubical grids) on over 1,000 compute nodes (at 24 compute cores per node), corresponding to the largest capability available to us on DoD machines. With exascale computing, we will perform high-fidelity full-vehicle simulations in useful compute time, discover new flow physics, and robustly optimize flight systems. In this talk, I will present the simulation capability, sample data from hypersonic flow simulations, the impact of basic science results, the scalability of the simulation capability with adaptive mesh refinement techniques on unstructured grids, and a vision for large-mesh-size problems (1T unstructured-grid elements), big-data analyses, and exascale computing. We expect that investments are necessary to develop new techniques that scale simulations beyond 10B unstructured-grid elements, and to deploy infrastructures that allow effective interaction with the Big Data resulting from present and envisioned simulations.

Bio: Prof. Pino Martin is an Associate Professor in the Department of Aerospace Engineering at the University of Maryland. Previously she was an adjunct professor at Princeton University, having received her Ph.D. in Aerospace Engineering in 1999. Prof. Martin's research foci include, but are not limited to, computational fluid dynamics, numerical simulation of turbulent flows, and numerical methods for compressible turbulence. Dr. Martin is a member of the American Physical Society (APS) and a fellow member of AIAA. More information about Dr. Pino Martin's research and the CRoCCo Laboratory can be found on the web:

March 20, 2017- Wayne Joubert: ORNL MiniApps Webinar Series: MiniSweep - A Proxy Application for Sn Radiation Transport Calculations

Abstract: Minisweep is a miniapp that models the performance of the sweep operation of the Denovo radiation transport code, part of the Exnihilo code suite used by CASL and a current INCITE application. It is used for performance evaluation and for porting the Sn transport sweep algorithm to new computer architectures, currently supporting OpenMP, CUDA for GPUs, and compiler directives for the Intel Xeon Phi. In this talk we describe the structure of the underlying algorithm, the techniques used to map the algorithm to advanced architectures, build and execution options, and performance characteristics of the code.
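The sweep dependency structure at the heart of such codes can be sketched on a 2D grid. This is a generic KBA-style illustration, not Minisweep's implementation: for one angular direction, cell (i, j) can be processed only after its upwind neighbors (i-1, j) and (i, j-1), so cells with equal i + j form a wavefront that can be processed in parallel.

```python
def wavefronts(nx, ny):
    """Group grid cells into wavefronts by dependency depth i + j.
    Cells within one wavefront have no mutual dependencies and can
    be swept concurrently; wavefronts must run in order."""
    fronts = [[] for _ in range(nx + ny - 1)]
    for i in range(nx):
        for j in range(ny):
            fronts[i + j].append((i, j))
    return fronts

# A 3x3 grid yields 5 sequential steps; the middle step exposes
# the most parallelism (3 cells at once).
for step, cells in enumerate(wavefronts(3, 3)):
    print(step, cells)
```

The limited parallelism at the start and end of the sweep, and its growth toward the diagonal, is what makes mapping this algorithm efficiently onto GPUs and many-core processors challenging.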

March 10, 2017- Dr. Swaroop Pophale: Designing next-generation programming models using lessons learned from PGAS

Abstract: With the coming of the exascale era and rapidly improving hardware, applications need well-designed programming models that are simple to use but that capture key performance characteristics of next-generation systems. A programming model with the right abstractions also helps advance performance portability. Many lessons can be learned from Partitioned Global Address Space (PGAS) models, which have explicit concepts for expressing data locality and controlling the affinity of computations. As we move forward, more PGAS concepts are making their way into mainstream programming models like MPI, OpenMP / OpenACC, etc., and the key challenge is to find the right abstractions without increasing their complexity. This talk focuses on lessons learned from PGAS and the new features being incorporated into mainstream models (OpenMP and OpenSHMEM) to tackle the problems of data locality, thread affinity, and massive amounts of thread-level parallelism, in order to improve the utilization of memory, heterogeneous cores, and network interconnects.

Bio: Dr. Swaroop Pophale received her doctorate in Computer Science in 2014 from the University of Houston, where she worked with Dr. Barbara Chapman on the Partitioned Global Address Space programming model, with special emphasis on a compiler-based static analysis tool for OpenSHMEM. She is currently a postdoctoral research associate at ORNL in the Computer Science and Mathematics Division. Her current responsibilities at ORNL include programming models research for new architectures and developing new features that aid DOE and DOD applications. She has been involved in the OpenSHMEM standardization effort since its inception and continues to contribute to research on the specification and its extensions. Her other research interests include applications and benchmarking. She is a candidate for a staff position in the Computer Science Research Group.

March 7, 2017- Dr. Jean-Luc Fattebert: Scientific computing adventures in materials science

Abstract: Computer simulations are widely used by physicists, chemists, biologists, and materials scientists to better understand the behavior and properties of matter, from the atomistic to the macroscopic level. Dr. Fattebert will talk about a few numerical methods that he has worked on over the years to speed up these simulations and improve their fidelity. From first-principles molecular dynamics to phase-field models, Dr. Fattebert will describe how advanced numerical algorithms can help application scientists efficiently use our most powerful high performance computing resources. Some challenges are inherent to the physical models and their specific equations, but the differing points of view, languages (in software and in papers!), and jargon used by the various scientific communities are not to be underestimated when it comes to making good use of modern applied mathematics in real applications. They are often what makes the difference between a successful project with useful outputs and an academic exercise with little practical use.

Bio: Dr. Jean-Luc Fattebert received his Master's degree in Physics from the Swiss Federal Institute of Technology in Lausanne, Switzerland, in 1992. He then obtained his Ph.D. in Applied Mathematics, also from the Swiss Federal Institute of Technology, in 1997, under the supervision of Prof. Jean Descloux. He then spent two years as a postdoctoral researcher in the Physics department of North Carolina State University, Raleigh, NC, under the supervision of Prof. Jerry Bernholc. In 1999, he joined Lawrence Livermore National Laboratory (LLNL) as a postdoc, before becoming a staff member a year and a half later. He has remained a researcher at the Center for Applied Scientific Computing at LLNL ever since, working in areas ranging from Density Functional Theory models and solvers, to molecular dynamics, load balancing, phase-field models and solvers, adaptive mesh refinement, iterative linear solvers, and human heart simulations. Over the years, Dr. Fattebert has led three Laboratory Directed Research and Development projects at LLNL. He is also a three-time Gordon Bell Prize finalist, most recently in 2016 as project leader.

February 23, 2017- Chongxiao "Shawn" Cao: Utilization and Extension of Task-based Runtime for High Performance Dense Linear Algebra Applications

Abstract: On the road to exascale computing, dynamic task-based runtimes can alleviate the disparity between hardware peak performance and application performance by providing executions that unfold based only on the dataflow between tasks. In this presentation, I would like to introduce two parts of my Ph.D. work related to task-based runtimes. The first part is the design of a unified framework to run high-performance dense linear algebra applications on platforms equipped with multiple GPUs and multiple Xeon Phi coprocessors. A lightweight task-based runtime is utilized to manage the resource-specific workload and to control the dataflow and parallel execution in a hybrid system. The second part of this presentation introduces fault-tolerant design for a task-based runtime. Three additions to a dynamic task-based runtime have been explored to build a generic framework providing soft-error resilience: a sub-DAG method, a data-logging method, and an algorithm-based fault tolerance method. We also take a step further and extend the general data-logging method to a remote version that provides resilience against hard errors.
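The core idea of dataflow execution, tasks fire when their inputs are ready rather than in program order, can be sketched in miniature (an illustrative toy scheduler, not PaRSEC or any runtime from the talk):

```python
from collections import deque

def run_dataflow(tasks, deps):
    """Toy dynamic task-based runtime. 'tasks' maps a name to a callable;
    'deps' maps a name to the list of tasks whose outputs it consumes.
    A task becomes ready as soon as all of its inputs are available."""
    remaining = {t: set(deps.get(t, [])) for t in tasks}
    ready = deque(t for t, d in remaining.items() if not d)
    results, order = {}, []
    while ready:
        t = ready.popleft()
        results[t] = tasks[t](*[results[d] for d in deps.get(t, [])])
        order.append(t)
        for u, d in remaining.items():  # release tasks that waited on t
            if t in d:
                d.remove(t)
                if not d:
                    ready.append(u)
    return results, order
```

A real runtime would execute the ready queue with worker threads (or GPU streams); the serial loop above only demonstrates the dependency bookkeeping.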

Chongxiao "Shawn" Cao is currently a Ph.D. candidate in Computer Science at the University of Tennessee, Knoxville. Chongxiao started in the Ph.D. program in August 2011. He is working as a Research Assistant in the Innovative Computing Laboratory (ICL) under the guidance of Dr. Jack Dongarra and Dr. George Bosilca. His research interests include fault tolerance in parallel computing, dynamic task-based runtimes, and high-performance linear algebra routines for distributed heterogeneous architectures.

February 9, 2017- Mark Berrill: ORNL MiniApps BlueJeans Seminar Series: The XRayTrace MiniApp

Abstract: This talk will cover the XRayTrace miniapp. XRayTrace is a program designed to simulate x-ray lasers that occur in laser-created plasmas. It solves the coupled ray equations, intensity equations, and atomic physics to accurately predict the resulting x-ray laser properties, including energy, near-field and far-field beam patterns, and the frequency and temporal shapes. The miniapp is designed to test the ray-trace behavior of the application under different architectures and programming models. The talk will give a brief introduction to the design of x-ray lasers and the XRayTrace program, and then cover the design goals and implementation of the miniapp, along with a review of the parallel programming models implemented so far.

February 7, 2017- Bruno Turcksin: Matrix-free operator evaluation for additive manufacturing simulation

Abstract: There has been a long trend in computer architecture in which the number of flops increases much faster than the amount and bandwidth of memory. Today, even storing the matrix associated with the system can require a large share of the available memory. In many modern simulations, matrices are used only through sparse matrix-vector multiplications (SpMV) within Krylov solvers. However, SpMV is heavily bandwidth limited, on both CPU and GPU systems. GPUs have a much higher memory bandwidth than CPUs, and also greater arithmetic capabilities; on the other hand, GPUs have much less memory per core than CPUs. These factors argue for ending the separation between linear algebra and finite element assembly routines and, instead, computing the action of the operator on a vector on the fly.
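A minimal sketch of "computing the action of the operator on the fly" (illustrative only, much simpler than a real finite element operator): for the 1D finite-difference Laplacian with Dirichlet boundaries, y = Av can be evaluated directly from the stencil (-1, 2, -1), so the matrix is never assembled or stored.

```python
def laplacian_apply(v):
    """Matrix-free evaluation of y = A v for the 1D finite-difference
    Laplacian: apply the (-1, 2, -1) stencil on the fly instead of storing
    A and performing an SpMV. Memory traffic is just the two vectors."""
    n = len(v)
    y = [0.0] * n
    for i in range(n):
        y[i] = 2.0 * v[i]
        if i > 0:
            y[i] -= v[i - 1]
        if i < n - 1:
            y[i] -= v[i + 1]
    return y
```

A Krylov solver such as conjugate gradients only ever needs this apply routine, which is why matrix-free evaluation slots cleanly into the solvers the abstract mentions.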

In this talk, Dr. Turcksin will discuss the implementation of the matrix-free operator evaluation in the context of additive manufacturing (AM) simulation. In particular, he will focus on the fast and accurate simulation of the heat transfer and the phase changes between the powder, the liquid, and the solid states of a given material. Phase changes and temperature variations are fundamental to predict the mechanical properties of materials. Therefore, these simulations are a necessary first step to predict the behavior of objects built using AM.

Dr. Turcksin will also briefly talk about adaptive mesh refinement strategies for AM simulation. Because the heat sources used in AM are very localized, it is important to use a mesh tailored to the problem solved. However, due to the time dependent nature of the problem, it is not possible to construct a single mesh that can be used throughout the whole simulation and therefore, the mesh needs to be adapted automatically.

BIOGRAPHICAL INFORMATION: Dr. Turcksin earned a Ph.D. in Nuclear Engineering from Texas A&M University in 2012. From January 2013 to December 2015, he was a visiting assistant professor in the Department of Mathematics at Texas A&M, working on the deal.II finite element library. Since January 2016, he has been a postdoctoral research associate in the Computational Engineering and Energy Sciences group, where his main focus has been on the simulation of energy storage devices. His primary areas of expertise include numerical methods for neutron and electron transport, finite elements, adaptive mesh refinement, and high performance computing.

January 27, 2017- George J. Nelson: Multiscale Transport in Energy Conversion and Storage Devices

Abstract: Contemporary energy storage and conversion devices are inherently multiscale, functional material systems. The operation of electrochemical energy storage and conversion devices relies upon composite material systems that provide both sites for chemical reactions and pathways for charge and reactant transport. Forging a stronger understanding of the multiscale interaction between transport phenomena, device and mesoscale geometry, and device performance and reliability can further advance the development of next generation storage and conversion technologies. To this end, x-ray and neutron imaging methods provide unique capabilities for direct observation of multiscale geometry, physics-based simulation using real device structures, and direct observation of material changes during operation. Three cases are addressed to illustrate these capabilities: x-ray tomography of Li-ion battery electrodes, mesoscale modeling of battery materials based on tomographic data, and in situ neutron imaging of enzymatic batteries. First, 3D x-ray nanotomography (XNT) and microtomography (µCT) of Li-ion battery electrode materials are presented for high capacity alloy anode materials and transition metal oxide cathode materials. Methods for elemental and chemical mapping within these systems are demonstrated. Absorption contrast imaging is presented as a means of structural and elemental mapping within battery anodes and cathodes. X-ray absorption near edge structure (XANES) nanotomography is then presented as a means of tracking chemical variations related to charge and discharge. Following demonstration of current XNT and µCT work, mesoscale modeling of Li-intercalation is presented using Li-ion cathode XNT data as a computational domain. Observations from this model are complemented by microstructural analysis that permits description of cathode charge and discharge behavior in terms of dimensionless parameters. 
Finally, in situ and in operando imaging efforts are discussed in the context of neutron imaging performed on enzymatic batteries. The relevance of these methods to other electrochemical energy conversion and storage devices is discussed.

BIOGRAPHICAL INFORMATION: George J. Nelson received his Ph.D. in Mechanical Engineering from the Georgia Institute of Technology (2009). Prior to his appointment at UAH he was an Assistant Research Professor at the University of Connecticut (2009-2012), where he performed research in electrochemical energy conversion and storage and contributed to the development of novel 3D imaging and analysis methods for energy materials based on x-ray nanotomography. Prof. Nelson specializes in multiscale modeling of transport in energy conversion and storage devices and 3D microstructural imaging techniques. He is active in the ASME Advanced Energy Systems Division through the Electrochemical Energy Conversion and Storage Technical Committee and the AESD Executive Committee. Prof. Nelson is a recipient of an Oak Ridge Associated Universities Ralph E. Powe Junior Faculty Enhancement Award (2013) and a National Science Foundation CAREER Award (2015).

January 24, 2017 - Eric Lingerfelt: BEAM: An HPC Pipeline for Nanoscale Materials Analysis and Neutron Data Modeling

Abstract: The Bellerophon Environment for Analysis of Materials (BEAM) enables scientists at ORNL's Center for Nanophase Materials Sciences and Spallation Neutron Source to leverage the integrated computational and analytical power of ORNL's Compute And Data Environment for Science (CADES) and the Oak Ridge Leadership Computing Facility to perform near real-time scalable analysis and modeling. At the core of this computational workflow system is a web and data server located at CADES that enables multiple, concurrent users to securely upload and manage data, execute materials science analysis and modeling workflows, and interactively explore results through custom visualization services. BEAM's long-term data management capabilities utilize CADES' petabyte-scale file system and enable users to easily manipulate remote directories and uploaded data in their private data storage area as if they were browsing on a local workstation. In addition, the framework facilitates user workflow needs by enabling integration of advanced data analysis algorithms and authenticated, "push-button" execution of dynamically generated workflows employing these algorithms on Titan, Eos, and Rhea at OLCF, as well as compute clusters at CADES. We will demonstrate band excitation analysis, principal component analysis, and de-noising of SPM and STEM data using a variety of HPC implementations including FORTRAN, R with pdbR, GPU-accelerated C++, and Java with Apache Spark – all tightly bound with parallel HDF5. In addition, we will discuss initial implementations of near real-time optimization and modeling of inelastic and quasi-elastic neutron scattering data utilizing Titan, CADES, and BEAM.

December 16, 2016 - Harsh Bhatia: Scientific Visualization and Analysis for Data-driven Discovery

Abstract: As advances in technology allow us to create increasingly complex, detailed, and large-scale data, the challenges in exploring the resulting data are becoming more important and more difficult. Among these challenges, effective analysis and visualization of scientific data are particularly crucial ingredients for successful discovery in all areas of science and engineering. My research has focused on addressing some of these challenges through new fundamental and computational techniques for the analysis and visualization of a variety of scientific data. In this talk, I will present my work on the analysis of flow fields: feature extraction using new frames of reference. I will discuss the importance of new frames of reference for flow analysis and show how these fundamental ideas can be applied to a wide variety of applications, such as combustion, oceanography, and aerodynamics. Additionally, I will show how visualization principles and topological techniques can be applied to explore large-scale molecular dynamics simulations and to extract new insights about the underlying phenomena.

Harsh Bhatia is a post-doctoral researcher at the Center for Applied Scientific Computing at Lawrence Livermore National Laboratory, where he develops new tools and techniques for the analysis and visualization of scientific datasets. He defended his Ph.D. thesis in Computing at the SCI Institute at the University of Utah in 2014; prior to that, he received a B.Tech. degree in Information & Communication Technology from DAIICT, India. Harsh is primarily interested in the analysis and visualization of scientific and nonscientific data, uncertainty visualization, computer graphics, and modeling and simulation.

December 15, 2016 - Ranadip Acharya: Prediction of microstructure in laser powder bed fusion process

Abstract: Additive manufacturing (AM) processes are receiving widespread attention due to their ability to create or repair precision engineering components without the use of any die or mold. Currently, obtaining a specific user-defined/as-desired or conformal/epitaxial microstructure is a challenging and expensive iterative process. Modeling and validation of solidification microstructure can be leveraged to reduce the iteration cost of obtaining a desired microstructure. A numerical volume-of-fluid method incorporating Marangoni convection can accurately predict the resultant melt pool geometry and temperature distribution, which can serve as input for predicting microstructure evolution in the solidifying mushy region. Hence, in the present study, CFD analysis is used to predict melt pool characteristics, and phase-field modeling is employed to simulate microstructure evolution in the as-deposited state for the laser powder bed fusion (LPBF) process. Different features of the LPBF microstructure, such as segregation of secondary elements, dendrite sizes, dendritic orientation, dendritic morphology, and surface roughness, are investigated and validated through comparison with experimental results. The phase-field model suggests a strong dependency of dendrite orientation on surface roughness and scan speed, and suggests the potential for a columnar flip, or oriented-to-misoriented transition, at higher scan speeds. Segregation of the secondary elements is found to be the dominant factor in the resultant dendrite width, in the range of 1-3 µm. Furthermore, the developed method can easily be extended to predict the change in orientation of dendrites as new layers are built atop previous layers.

Dr. Acharya received his Ph.D. (Mechanical Engineering) and M.S. (Materials Science) from the Georgia Institute of Technology, his M.Tech. (Manufacturing Science and Engineering) from the Indian Institute of Technology Kharagpur, and a B.E. (Mechanical Engineering) from Jadavpur University, India. Dr. Acharya completed his Ph.D. on multi-physics modeling of additive manufacturing processes intended for epitaxial repair of gas turbine hot-section components made of Ni-based superalloys. He worked on different single-crystal, equiaxed, and directionally solidified 'difficult-to-weld' superalloys and formulated empirical relations for microstructural transitions. Prior to joining Georgia Tech, he was with ANSYS, where he primarily worked on CFD modeling for different engineering processes. His current research interests include melt pool modeling for dimensional control and phase-field modeling to predict microstructure in additively manufactured components. He is currently involved in an HPC4mfg effort on modeling of microstructure evolution with ORNL and LLNL, and in pyrolysis modeling of CMCs, funded by AFRL. Dr. Acharya has 12 publications related to the processing of different nickel-based superalloys and holds reviewer positions at additive manufacturing and material processing journals. He also serves as adjunct faculty at the University of Hartford and the University of Bridgeport.

December 2, 2016 - Kevin Lai: SOFC-MP and Recent Developments

Abstract: Dr. Kevin Lai will present an overview of the SOFC-MP software tool package and some of its most recent enhancements. The tool set includes the SOFC-MP 2D module, 3D module, graphical user interface (GUI), Reduced Order Model (ROM), and ROM wrappers that integrate SOFC-MP with ROM. Notable features include variable cell performance, which can simulate practical operating conditions. Recent enhancements include extending the simulation domain beyond the stack to include fuel and oxidant recirculation and exhaust heat exchangers. The enhanced features are also supported in ROM, which provides performance metrics that can be better integrated with ASPEN for system modeling purposes. With these combined capabilities, the SOFC-MP tool set has proven capable of providing flexible, efficient, and reliable simulation of SOFC stacks, as well as improved understanding of system performance and optimization.

Topics will include enhancements for the variable pre-reform factor, a parametric study of 2D recirculation, performance of a pressurized stack using the 2D module, a parametric study of 3D recirculation, and tapered stack performance using the 3D module.

November 10, 2016 - Hongzhang Shan: Optimizing the nearest-neighbor communication using one-sided communication

Abstract: Nearest-neighbor communication is the dominant communication pattern in many scientific applications. It is often implemented using MPI two-sided communication, and the data exchanged between two neighbors is aggregated for performance reasons. As a result, there is at most one message between any pair of processes between synchronizations. This frequent synchronization poses a great challenge for one-sided communication. In this talk, I am going to examine the performance and programming differences between MPI two-sided, MPI one-sided, and UPC++ (a PGAS implementation of one-sided communication) for nearest-neighbor communication. The performance results indicate that MPI one-sided can deliver performance similar to MPI two-sided, but UPC++ can perform significantly better by applying features such as message pipelining and active messages.
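The communication pattern under discussion can be pictured with a serial toy (no MPI here; this only illustrates the structure, not the talk's implementation): in a 1D domain decomposition, each rank exchanges exactly one aggregated "halo" value with each neighbor per step, which is the one-message-per-neighbor-pair behavior the abstract describes.

```python
def halo_exchange(subdomains):
    """Toy nearest-neighbor (halo) exchange on a 1D domain decomposition,
    simulated serially: each rank receives one aggregated message per
    neighbor, namely the neighbor's boundary element."""
    halos = []
    for r in range(len(subdomains)):
        left = subdomains[r - 1][-1] if r > 0 else None
        right = subdomains[r + 1][0] if r < len(subdomains) - 1 else None
        halos.append((left, right))  # (from left neighbor, from right neighbor)
    return halos
```

In a two-sided MPI code each tuple entry would be a matched send/receive pair; in a one-sided or UPC++ version the neighbor would instead put its boundary strip directly into the receiver's halo buffer, with synchronization handled separately.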

Hongzhang Shan specializes in parallel programming paradigms, performance tuning, and modeling. He has published over 60 papers in premier international conferences and journals; one of them received the Gordon Bell Award at SC2008. He also received the Best Paper Award at IPDPS 2007 and the Best Student Paper Award at SC2000.

November 9, 2016 - Karl Fuerlinger: DASH: A C++ PGAS Library for Distributed Data Structures and Parallel Algorithms

ABSTRACT: DASH is a realization of the partitioned global address space (PGAS) programming model in the form of a C++ template library. DASH offers distributed data structures with flexible data distribution schemes and implements the PGAS model by relying on a one-sided communication substrate accessed through an intermediate runtime layer called DART (the DASH runtime). DASH can be used within a shared memory node as well as between nodes, and it provides an iterator-based interface similar to the data containers of the C++ Standard Template Library (STL). To support the development of applications that exploit a hierarchical organization, either on the algorithmic or on the hardware level, DASH features the notion of teams that are arranged in a hierarchy. Based on a team hierarchy, the DASH data structures support locality iterators as a generalization of the conventional local/global distinction found in many PGAS approaches.
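The "flexible data distribution schemes" such a library offers boil down to a mapping from a global index to a (unit, local index) pair. A sketch of two classic schemes (illustrative Python, not DASH's actual C++ API):

```python
def block_map(gidx, nglobal, nunits):
    """BLOCKED distribution: each unit owns one contiguous chunk.
    Returns (unit, local index) for global index gidx."""
    block = (nglobal + nunits - 1) // nunits  # ceil(nglobal / nunits)
    return gidx // block, gidx % block

def cyclic_map(gidx, nglobal, nunits):
    """CYCLIC distribution: elements are dealt round-robin across units.
    Returns (unit, local index) for global index gidx."""
    return gidx % nunits, gidx // nunits
```

Global iterators walk gidx in order regardless of ownership, while local iterators visit only the indices a unit owns; making that mapping a template parameter is what lets a container change its distribution without changing application code.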

Dr. Karl Fuerlinger is a lecturer and senior researcher at the Ludwig-Maximilian-University Munich, working in the area of parallel and high performance computing. His research is focused on program analysis tools and parallel programming systems. Before joining Ludwig-Maximilian-University Munich, Dr. Fuerlinger was a postdoctoral researcher at the University of California at Berkeley affiliated with the NERSC supercomputing center, and prior to that he was a senior research associate at the University of Tennessee at Knoxville.

November 9, 2016 - David Riegner and Wolfgang Windl: Center for Performance and Design of Nuclear Waste Forms and Containers (WastePD EFRC)

Abstract: DOE-EM is responsible for liquid, glass, ceramic, and metallic nuclear waste that must be safely isolated from the environment for long periods. This requires understanding both the performance and fundamental mechanisms of waste form degradation, and the design of new waste forms with improved performance. These tasks comprise the goals of WastePD, a new EFRC at The Ohio State University. Within its project portfolio, WastePD targets the design of novel metal alloys with improved corrosion resistance, with a focus on multi-component systems, specifically high-entropy alloys (HEAs) and bulk metallic glasses (BMGs), which promise outstanding mechanical properties and corrosion resistance. While traditional alloys are primarily based on one or two components, both HEAs and BMGs frequently contain five or more components in comparable concentrations. The high concentrations of many metallic components make it challenging to find alloys that form single-phase HEAs, or BMGs that are able to vitrify into amorphous structures. In this talk, we will discuss work and computational challenges centered on identifying sets of elements capable of being combined successfully into new crystalline or amorphous alloys. Our work is partially based on classical-potential molecular dynamics simulations, for which the first challenge is to develop analytical or numeric functions ("empirical potentials") that describe the interatomic energies and forces in a multicomponent system sufficiently well. These functions are then used to study vitrification/crystallization of metallic liquids by molecular dynamics (MD) simulations, which involve lengthy computations for large numbers of atoms. Here, we will describe the challenges and approaches in developing the interatomic potentials, as well as in performing and analyzing the MD runs. As an alternative, we will also describe a novel multi-cell Monte Carlo approach that allows studying phase decomposition (i.e., demixing of the alloy) with high computational efficiency, enabling the use of high-accuracy quantum-mechanical methods to describe the interatomic forces instead of the empirical potentials.

Wolfgang Windl is a Professor in the Department of Materials Science and Engineering at The Ohio State University and works in the area of Computational Materials Science. Before joining OSU in 2001, he spent four years with Motorola, ending his tenure as Principal Staff Scientist in the Digital DNA Laboratories in Austin, TX, where he worked in the area of multiscale modeling of semiconductor processing. Previously, he held postdoctoral positions at Los Alamos National Laboratory with Art Voter and at Arizona State University with Otto Sankey. He received his diploma and doctoral degree in physics from the University of Regensburg, advised by Dieter Strauch. Among other honors, he received the first Fraunhofer-Bessel Research Award from the Humboldt Society in 2006; a 2004 Nanotechnology Industrial Impact Award from the Nano Science and Technology Institute; 1998 and 1999 Patent and Licensing Awards from Los Alamos National Laboratory; three Lumley Research Awards and one Ralph-Boyer Teaching Award from the College of Engineering at The Ohio State University; and 2006 and 2015 Mars Fontana Best Teacher Awards from the Department of Materials Science and Engineering at The Ohio State University.

David Riegner is a Postdoc within WastePD at The Ohio State University and works in the area of Computational Materials Science. He received his PhD in Materials Science from OSU in 2016 and received several awards and recognitions during his graduate studies, including a best-poster award at the OSU Hayes Graduate Research Forum and the inaugural Woolley Teaching Fellowship within the MSE Department.

October 28, 2016 - Dr. Nilesh Mahajan: Kanor: A Language for Declarative Communication

Abstract: Writing efficient parallel programs continues to be a challenge for the programming community. Large-scale parallel programs are usually coded using the single program multiple data (SPMD) model, in which processes communicate by sending messages using a standardized and portable Message Passing Interface (MPI) library. Writing efficient MPI programs often requires a deep understanding of how a parallel program works and forces programmers to compromise readability by strewing communication primitives throughout otherwise unrelated computational code.

This thesis describes a domain-specific language (DSL), called Kanor, that takes a different approach. Kanor allows programmers to specify communication patterns at a high level, in Bulk Synchronous Parallel (BSP) style. The semantics of the language are carefully defined to guarantee correctness properties, such as deadlock freedom and determinism, while allowing efficient execution. The language is highly expressive: it can succinctly describe all the existing MPI collective operations, and it allows users to create their own custom collectives that can be detected and optimized. The BSP style of Kanor also makes it amenable to well-understood source-level optimizations, including those that exploit shared memory for efficient intra-node communication.

The syntax and semantics of Kanor are described, and its correctness properties are discussed. Next, an implementation of Kanor, embedded in C++ and provided as a library, is presented along with runtime optimizations. In addition, an optimizing transformation of Kanor programs that tries to overlap communication with computation is presented. A shared-memory backend for Kanor is presented next; it uses shared memory to reduce buffer copies when communicating between processes on the same node. The talk concludes with a discussion of future directions for Kanor, including heterogeneous backends and the optimization of Kanor programs for irregular domains such as graph algorithms.
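The BSP execution model underlying this approach can be sketched in a few lines (an illustrative toy in Python, not Kanor's actual C++ syntax): communication is declared per superstep as a set of (destination, value) pairs, and all messages are delivered together at the barrier, which is what makes execution deterministic and deadlock-free by construction.

```python
def superstep(nprocs, states, comm_spec):
    """Toy BSP superstep in a declarative style: comm_spec(rank, state)
    yields (dest, value) pairs describing everything that rank sends this
    superstep. All messages are delivered at once at the barrier."""
    inboxes = [[] for _ in range(nprocs)]
    for rank in range(nprocs):
        for dest, value in comm_spec(rank, states[rank]):
            inboxes[dest].append(value)
    return inboxes  # inboxes[r] = messages received by rank r
```

For example, a ring shift is just the one-line spec `lambda r, s: [((r + 1) % nprocs, s)]`; a runtime that sees the whole spec at once can recognize such patterns and substitute an optimized collective, which is the optimization opportunity the abstract describes.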

October 18, 2016 - Kshitij Mehta: Accelerating Seismic Applications for GPUs using Directive-Based Programming Models

Abstract: The evolution of supercomputers has enabled the seismic imaging industry to develop increasingly accurate and computationally intensive methods in the search for oil and gas. Depth migration algorithms such as One-Way Wave Equation Migration and Reverse Time Migration are commonly used for imaging the earth's interior. In this talk, we will describe our experience using directive-based programming models such as OpenACC to accelerate seismic applications and conduct large-scale experiments on Titan. Some preliminary results with OpenMP 4 are also discussed. The talk will focus on the programmability and challenges of directive-based programming as compared to native language extensions such as CUDA. Finally, we discuss potential improvements in directive-based programming models for developing heterogeneous applications for exascale.

October 14, 2016 - Graham Lopez: Moving Today's HPC Applications to Upcoming Heterogeneous Architectures

Abstract: Although heterogeneity has been a part of the HPC landscape for several years now, upcoming systems continue to increase in complexity in an effort to achieve higher performance and efficiency. As a result, applications have also been evolving. Unfortunately, the complexity of the system software and programming model implementations, as well as of the application code itself, tends to correlate with that of the hardware.

In this talk, I will present some of our work that has been focused on managing this complexity as applications move to new HPC architectures. As part of this effort, there has been a focus in the HPC community on the idea of performance portability, where in the best case an application could be implemented as a single code base with a minimum number of "#ifdef's" and still achieve near optimal performance on varying hardware. I will discuss our evaluations of various programming models that are striving to provide performance portability, and I will provide some discussion about their successes, and potential for further success. I would also like to present some of the strategies we are using to motivate the future development of these programming models to further address the needs of HPC applications.

Graham Lopez is a postdoctoral researcher in the Computer Science and Mathematics Division at Oak Ridge National Laboratory where he works on programming environments preparation for the DOE CORAL and Exascale Computing projects. Graham has published research in the areas of computational materials science, application acceleration and benchmarking on heterogeneous systems, low-level communication APIs, and programming models. He earned his M.S. in Computer Science and Ph.D. in Physics from Wake Forest University. Prior to joining ORNL, he was a research scientist at Georgia Institute of Technology where he worked on application and numerical algorithm optimizations for accelerators.

October 3, 2016 - Mr. Jason Wang: Data I/O middleware for isolating development complexities and enabling real-time optimization

Abstract: One of the future world's largest data generators, the Square Kilometre Array telescope (SKA), is currently approaching the construction phase. As a budget-limited project, the efficiency and capacity of its data processing and storage system limits its final scope. This brings critical challenges for data I/O techniques. In particular, a large proportion of current radio astronomy data processing platforms still use serial I/O, which has already caused serious problems for some of the latest pre-SKA projects as the size of data products dramatically increases. Radio astronomy is just one example of a modern science project whose scientific potential is limited by processing capabilities and efficiency; many other science domains are experiencing the same challenges.

This thesis studies these challenges in detail for the radio astronomy domain at various stages of the data processing chain, including the correlation and imaging steps. The investigations cover the identification of I/O bottlenecks as well as the selection and implementation of potential solutions and a comparison of their performance. In particular, ADIOS has been adopted within Casacore, the mainstream data reduction library for radio astronomy.

As a result of those studies, the generic I/O framework SHORE has been designed and implemented, drawing on the domain-specific experience and implementations from the studies while keeping the focus on the domain-independent elements of a middleware system. SHORE addresses two main points that have not been the focus of other systems in this domain. One is the recognition that no single I/O package, format, or storage system fits all requirements. The other is the separation of user-application-specific data models from I/O-specific and data-flow optimization techniques. To address the first point, the design and implementation of SHORE follows a plugin pattern, allowing the integration of a variety of underlying storage and I/O systems. The second point is resolved by designing SHORE as a true middleware framework with clear interfaces, which separates the technical implementation from the domain expertise. To verify SHORE in real-world use cases, it has been combined with existing systems and used to run actual large-scale reduction jobs. From the perspective of user applications, the Casacore Table Data System is used again, while for data flow management, SHORE is integrated with Daliuge, the execution framework prototype of the SKA Science Data Processor. Detailed plans have been made for using SHORE to solve real-world problems occurring in recent pre-SKA radio telescope projects, as well as for full-scale SKA Phase 1 simulation on the world's top supercomputers.

Until August 2016, Jason (Ruonan) Wang was a PhD student at the International Centre for Radio Astronomy Research at The University of Western Australia, working on the storage system design for the Square Kilometre Array, which will be the world's biggest radio telescope, and on related research topics. His research interests include I/O middleware architectures, large-scale parallel data I/O techniques, and their application to real-world projects to achieve intelligent, real-time data flow optimizations. He is a candidate for a postdoctoral position in the Scientific Data Group.

September 29, 2016 - CJ Newburn: A Path Forward to Scaled Execution

Abstract: Finding a path forward to scaled execution is a key challenge for both legacy and new applications. Hardware and software features and abstractions can have a significant impact on how much performance can be achieved, and how easily it can be attained. This talk lays out a vision for how the CUDA Platform supports scaling, and how NVIDIA seeks to make scaling easier and more effective, creating opportunities both to achieve Exascale and to bring less fully tuned apps to Petascale levels. It will cover new features in both hardware and software, and point toward some directions in which we have yet to grow.

CJ Newburn drives the HPC strategy, roadmap and customer engagements for the Compute Software part of NVIDIA. He's been active in the supercomputing community as a SW and HW Architect for the last 20 years. He has a broad systems background that spans programming models, compilers, runtime systems, ISA features, microarchitecture and performance tuning.

September 23, 2016 - Dr. Mehmet Belviranli: Increasing Resource Utilization in Heterogeneous Architectures

Abstract: Heterogeneous systems employ accelerators (e.g. GPU, FPGA and many-core) to offer massive parallelism. Scientific applications benefit from these systems by offloading their data-parallel regions to accelerators. The heterogeneity across these architecturally diverse computation units, however, may introduce many challenges to software design. Better multi-processor utilization and efficient data transfers are among the most important objectives that application developers target in order to maximize the total system throughput.

In this talk, I will introduce both accelerator-level and system-wide techniques to improve utilization of various resources in heterogeneous architectures. In the first part, I will describe a new synchronization technique that replaces global barriers in wavefront parallelism to reduce SM idleness in GPUs. In the second part, I will present a novel scheduler for efficient data transfer and execution overlapping to increase accelerator and interconnect utilization.
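The wavefront structure that the first technique exploits can be illustrated with a small sketch: each cell depends only on its north and west neighbors, so all cells on one anti-diagonal are independent and can run concurrently, and synchronization is needed only between successive diagonals (computed serially here for clarity):

```python
# Cells on one anti-diagonal (i + j == d) have no mutual dependencies,
# so only the boundary between diagonals needs synchronization; this is
# what fine-grained schemes exploit instead of a global barrier per step.
n = 4
grid = [[0.0] * n for _ in range(n)]
grid[0] = [float(j) for j in range(n)]        # top boundary
for i in range(n):
    grid[i][0] = float(i)                     # left boundary

for d in range(2, 2 * n - 1):                 # sweep anti-diagonals in order
    for i in range(max(1, d - n + 1), min(d, n)):
        j = d - i
        if j < 1 or j >= n:
            continue
        # Each cell reads only already-finished diagonals.
        grid[i][j] = 0.5 * (grid[i - 1][j] + grid[i][j - 1])
```

On a GPU, the inner loop over one diagonal maps to concurrent threads; replacing the per-diagonal global barrier with point-to-point flags between dependent cells is what reduces SM idleness.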

The proposed studies demonstrate significant speedups over traditional heterogeneous programming techniques such as fork/join parallelism and default pipelined data transfers. Our techniques can be leveraged further to obtain higher execution efficiency for a wide range of scientific applications on larger-scale computing platforms.

Mehmet Esat Belviranli recently received his Ph.D. in Computer Science from the University of California, Riverside under the supervision of Prof. Laxmi Bhuyan. His research interests are in systems; in particular, he focuses on developing run-time solutions and scheduling algorithms to increase multi-processor utilization on heterogeneous architectures. He has authored multiple papers at ICS and has also published in PACT and TACO. He received his M.S. and B.S. in Computer Science from Bilkent University, Turkey.

September 12, 2016 - Dr. Jeremiah Willcock: Abstractions for Parallel Graph Algorithms

Abstract: Operations on large graphs have become more important in the recent "big data" era, and in-memory computations on large-scale parallel systems are a good way to achieve performance on these algorithms. However, implementing these algorithms separately on each system type to obtain optimal performance is time-consuming, so a method that enables code reuse across algorithms and systems while still achieving high performance would benefit the computing community. Generic and generative programming techniques can achieve these properties using existing programming languages. This talk will describe several software libraries and programming models, primarily implemented in C++ and at a variety of levels of abstraction, for implementing graph algorithms that run efficiently on various types of parallel computers. Distributed memory is a particular focus of this work, although other forms of parallelism will also be discussed.
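The code-reuse idea behind generic programming can be sketched in a few lines (Python here for brevity, though the libraries discussed in the talk are in C++): the traversal is written once against an abstract neighbor function and a visitor hook, so the same algorithm runs unchanged over adjacency lists, matrices, or implicit graphs:

```python
from collections import deque

def bfs(neighbors, source, on_discover=lambda v: None):
    """Generic breadth-first search.

    `neighbors` is any callable vertex -> iterable of vertices, so the
    algorithm is decoupled from the graph's storage format; `on_discover`
    is a visitor hook in the spirit of generic graph libraries.
    """
    seen = {source}
    queue = deque([source])
    order = []
    while queue:
        u = queue.popleft()
        on_discover(u)
        order.append(u)
        for v in neighbors(u):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return order

# The same bfs() works for a dict-of-lists graph...
adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
order = bfs(adj.__getitem__, 0)
```

In C++ the same separation is achieved at compile time with templates, which is how a single algorithm source can be specialized for shared- or distributed-memory graph representations without runtime cost.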

Until August of this year, Jeremiah Willcock was a System Software Engineer at Micron Technology, working on developing and optimizing software for a novel, massively parallel hardware architecture. He received his Ph.D. in Computer Science from Indiana University in 2007, advised by Andrew Lumsdaine. His research interests include high-performance computing, especially mapping applications and algorithms to advanced parallel architectures, and secondarily, the design of abstractions and optimized implementations for parallel algorithms, especially graph and other irregular algorithms.

September 9, 2016 - Dr. Joel E. Denny: NVL-C: Static Analysis Techniques for Efficient, Correct Programming of Non-Volatile Main Memory Systems

Abstract: Computer architecture experts expect that non-volatile memory (NVM) hierarchies will play a more significant role in future systems including mobile, enterprise, and HPC architectures. With this expectation in mind, we present NVL-C: a novel programming system that facilitates the efficient and correct programming of NVM main memory systems. The NVL-C programming abstraction extends C with a small set of intuitive language features that target NVM main memory, and can be combined directly with traditional C memory model features for DRAM. We have designed these new features to enable compiler analyses and run-time checks that can improve performance and guard against a number of subtle programming errors, which, when left uncorrected, can corrupt NVM-stored data. Moreover, to enable recovery of data across application or system failures, these NVL-C features include a flexible directive for specifying NVM transactions. So that our implementation might be extended to other compiler front ends and languages, the majority of our compiler analyses are implemented in an extended version of LLVM's intermediate representation (LLVM IR). We evaluate NVL-C on a number of applications to show its flexibility, performance, and correctness.

Joel E. Denny received his Ph.D. in Computer Science in 2010 from Clemson University. After his Ph.D., he worked as a compiler engineer in industry. Two years ago, he joined the Joint Institute for Computational Sciences with the University of Tennessee and ORNL. Since then, he has worked on various LLVM-based compiler research projects in the Future Technologies Group. Currently, his research focuses on compiler techniques for programming non-volatile memory.

August 29, 2016 - Professor Hank Childs: In Situ at the Exascale: Motivations, Instantiations, and Opportunities

Abstract: In situ processing is likely to play a prominent role in visualizing and analyzing data on exascale platforms. In this presentation, I will cover three major themes. First, I will discuss the trends on HPC platforms that make in situ processing increasingly attractive compared to post hoc processing. Second, I will discuss the diversity of approaches that all fall within the in situ theme, with respect to axes such as data access, proximity, division of resources, etc. Third, I will discuss the opportunities inherent to in situ processing, and why this paradigm may allow us to do better science than we could previously.

August 8, 2016 - Dr.-Ing. Bernd Mohr: Multicore Performance Analysis at Scale: From Single Node to one Million HPC Cores

Abstract: Current HPC systems consist of complex configurations of potentially heterogeneous components. In addition, the hardware and software configuration can change dynamically due to fault recovery processes or power-saving efforts. Deep hierarchies of large, complex software components are needed to operate these systems. Developing efficient, high-performance application software for them is challenging; therefore, sophisticated performance measurement and analysis capabilities are required.

After introducing Jülich Supercomputing Centre and its systems, the talk will present the performance measurement and analysis tools Score-P, Scalasca, and Cube, developed and maintained at Jülich Supercomputing Centre, one of the leading HPC centres and host of Europe's most parallel machine, a 458,752-core IBM BlueGene/Q. Finally, some industrial and research HPC analysis use cases are presented.

August 4, 2016 - Dr. Fabien Delalondre: Challenges and possible workflow, infrastructure and computing solutions to support multilevel brain modeling within the next decade

Abstract: The Blue Brain Project (BBP), a project of the Ecole Polytechnique Federale de Lausanne (EPFL), implements a complex scientific workflow aimed at reconstructing digital models of brain tissue at multiple scales. This workflow includes a stack of more than 50 software packages, sometimes with very different and evolving functional and performance requirements. In this talk we will present the challenges we face and the solutions we have either implemented or are investigating in order to continuously build, simulate, analyze, and visualize brain models at different levels of representation. This includes developing solutions to (a) make scientists more productive, by structuring interactions between scientists and developers, ensuring full scientific software reproducibility, and defining development and execution workflows and associated tools; (b) design a heterogeneous distributed infrastructure, along with services to best support the scientific needs, including support for web applications and HPC, cloud, and volunteer computing resources; and (c) build efficient scientific applications at scale through the development and use of performance-portable solutions (DSLs, code abstraction, and programming models including pragma directives and intrinsics, and performance models), solutions and interfaces leveraging deep memory hierarchies including non-volatile memory, and the joint development of run-times (OmpSs, hStreams, HPX). Finally, we will conclude by giving our perspective on the critical items needed for the development and use of Petascale and Exascale computers. With this talk we hope to expose the types of problems the Blue Brain Project is trying to tackle, and to initiate fruitful discussions about the best ways to approach them over the next decade, at both Petascale and Exascale.

Bio: Fabien Delalondre received his PhD in Computational Mechanics from Mines ParisTech in 2007. At CEMEF - the research centre of Mines ParisTech - he worked on the development of parallel adaptive numerical methods to support the modeling of the Adiabatic Shear Band phenomenon that occurs in high-speed machining. In 2008, he joined the Scientific Computation Research Center (SCOREC) at Rensselaer Polytechnic Institute, New York, USA, first as a Postdoctoral Research Associate and later as a Senior Research Associate. At SCOREC, he worked on the development of parallel adaptive methods and multiscale simulation software to be executed on massively parallel computers. His work formed part of the Interoperable Technologies for Advanced Petascale Simulations (ITAPS) project funded by the U.S. Department of Energy. This project focused on developing common interfaces between various large-scale parallel scientific software components. He joined the Blue Brain Project (BBP) in 2011, first as a Multiscale Simulation Architect, before being promoted to Manager of the High Performance Computing group. At the BBP, he is in charge of leading a group of researchers focused on developing HPC/HTC middleware and application software, leading and implementing the BBP's HPC roadmap for the Petascale/Exascale transition, and contributing to the Dynamic Exascale Entry Platform (DEEP) EU project, the Argonne Leadership Computing Facility (ALCF) Theta Early Science Program, and the BBP software development strategy. His interests include parallel computing, numerical methods and multiscale modelling, error estimation and adaptation, software engineering, volunteer, cloud, and HPC systems, and co-design.

August 4, 2016 - Dr. Jerry Chow and Dr. Jay Gambetta: Quantum error detection, high-fidelity control, and experiencing other things quantum

Abstract: Fault-tolerant quantum computing is possible by employing quantum error correction techniques. In this talk we will describe an implementation of various small quantum codes using lithographically defined superconducting qubits in latticed arrangements. These codes explore a new area of quantum information processing, including the detection of full quantum errors and the encoding of a logical qubit. Our experiments require highly coherent qubits, high-quality quantum operations implementing the detecting circuit, and high-quality independent qubit measurements. Looking further ahead, there remain both theoretical and experimental control hurdles that must be overcome to build verifiably reliable quantum networks of qubits. We will present some experiments that point toward these important questions and give proposals for future integration capability, measurement integration, and scalable control architectures. The focus will be on a variety of questions that will become increasingly important as the field moves toward larger networks of qubits.

Bios: Jay Gambetta is manager of the Theory of Quantum Computing and Information Group at the IBM T. J. Watson Research Center in Yorktown Heights, NY. Jay earned his Ph.D. in Physics, specifically quantum information and quantum optics, at Griffith University, Australia in 2004. He was a postdoctoral associate at Yale, a CIFAR junior fellow then Research Assistant Professor at the University of Waterloo before he came to IBM. Jay was named an APS Fellow in 2014 for his seminal theoretical work on quantum information processing with superconducting qubits.

Jerry Chow is manager of the Experimental Quantum Computing group at the IBM T.J. Watson Research Center in Yorktown Heights, NY. Jerry earned his Ph.D. in Physics from Yale in 2010; his thesis work explored superconducting qubits for quantum computing. Since coming to IBM he has led a team working to advance superconducting qubit technology with the goal of demonstrating the first practical universal quantum computer. Jerry was selected as one of the technologists on Forbes' "30 under 30" list in 2012.

July 28, 2016 - Camila A. Ramirez: Implementation of SIR particle filter for radiation source detection

Abstract: Particle filtering is a methodology for sequential signal processing with a wide range of applications. The underlying principle of this method is the approximation of a relevant distribution with random measures composed of particles and their weights. Based on the concept of sequential importance sampling and the use of Bayesian inference, it has been applied to a number of nonlinear and non-Gaussian estimation problems, in particular radiation source detection. In this talk, we present a Sampling Importance Resampling (SIR) filter to detect a single radiation source using measurements from a network of sensors. We employ four different resampling methods, in which the particles are resampled once, linearly, quadratically, or exponentially. These methods eliminate particles with small weights and replicate ones with large weights at different rates to fine-tune the SIR filter. We apply this particle filter to experimental data sets from the Intelligent Radiation Sensor Systems (IRSS) program to assess its effectiveness.
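A minimal 1-D SIR filter along these lines might look as follows. The source here is static, the measurement likelihood is Gaussian, and plain systematic resampling stands in for the talk's four resampling schedules; all parameters are illustrative:

```python
import math
import random

random.seed(1)

TRUE_X = 2.0        # unknown (static) source position to be estimated
SIGMA = 0.5         # measurement noise standard deviation
N = 500             # number of particles

def systematic_resample(particles, weights):
    """Eliminate low-weight particles and replicate high-weight ones."""
    cum, c = [], 0.0
    for w in weights:
        c += w
        cum.append(c)
    step = cum[-1] / len(particles)
    u = random.random() * step
    out, j = [], 0
    for _ in particles:
        while j < len(cum) - 1 and cum[j] < u:
            j += 1
        out.append(particles[j])
        u += step
    return out

# Initialize particles uniformly over the search region.
particles = [random.uniform(-10.0, 10.0) for _ in range(N)]

for _ in range(20):                             # sequential sensor readings
    z = TRUE_X + random.gauss(0.0, SIGMA)       # synthetic measurement
    # Importance weights from the Gaussian measurement likelihood.
    weights = [math.exp(-(z - p) ** 2 / (2 * SIGMA ** 2)) for p in particles]
    particles = systematic_resample(particles, weights)
    # A small jitter preserves particle diversity after resampling.
    particles = [p + random.gauss(0.0, 0.05) for p in particles]

estimate = sum(particles) / N                   # posterior mean estimate
```

After a few measurements the particle cloud collapses around the true position; the resampling step is exactly where the talk's once/linear/quadratic/exponential variants would differ.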

July 26, 2016 - Dr. Andrey Prokopenko: Developing an algebraic multigrid software package to help applications

Abstract: Many complex applications rely heavily on algebraic multigrid methods (AMG) as part of the solution process. In this talk, we describe approaches to help computational fluid dynamics (CFD) and magnetohydrodynamics (MHD) simulations in three different areas using a C++ multigrid package, MueLu (part of Trilinos), developed at Sandia National Laboratories.

The first area is the development of new methods tailored to such applications. We describe a new multigrid method for mixed discretizations of Navier-Stokes equations for the case where unknowns are not co-located at mesh points. The main idea is to first automatically define coarse pressures in a somewhat standard AMG fashion and then to automatically choose coarse velocity unknowns so that the spatial location relationship between pressure and velocity degrees-of-freedom (dof) resembles that on the finest grid.
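The coarse-grid correction that AMG automates can be illustrated with a geometric two-grid sketch for the 1-D Poisson problem. This is only a toy stand-in: MueLu builds such hierarchies algebraically for unstructured systems, whereas here the coarse grid is simply every other point:

```python
def residual(u, f, h):
    """r = f - A u for the 1-D Laplacian with Dirichlet boundaries."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / h ** 2
    return r

def jacobi(u, f, h, sweeps, omega=0.8):
    """Weighted-Jacobi smoothing: damps the oscillatory error components."""
    for _ in range(sweeps):
        r = residual(u, f, h)
        for i in range(1, len(u) - 1):
            u[i] += omega * r[i] * h ** 2 / 2.0
    return u

def two_grid_cycle(u, f, h):
    u = jacobi(u, f, h, sweeps=3)                # pre-smooth
    rc = residual(u, f, h)[::2]                  # restrict residual (injection)
    ec = jacobi([0.0] * len(rc), rc, 2.0 * h, sweeps=50)  # approximate coarse solve
    for i in range(len(ec)):                     # prolongate (linear) and correct
        u[2 * i] += ec[i]
        if 2 * i + 1 < len(u):
            u[2 * i + 1] += 0.5 * (ec[i] + ec[i + 1])
    return jacobi(u, f, h, sweeps=3)             # post-smooth

n, h = 9, 1.0 / 8.0                              # -u'' = 1 on (0,1), u(0)=u(1)=0
f = [1.0] * n
u = two_grid_cycle([0.0] * n, f, h)
```

One cycle reduces the residual by a large factor because smoothing removes oscillatory error while the coarse correction removes smooth error; AMG's contribution is choosing the coarse unknowns and transfer operators automatically from the matrix, which is precisely what becomes delicate when pressure and velocity unknowns are not co-located.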

The second area is improving application performance through multigrid components reuse. Nonlinear solvers for each transient step often produce a sequence of closely related systems. We will concentrate on speeding up the AMG solver through reuse of components of the multigrid hierarchy throughout the nonlinear solve, and possibly across transient steps. The goal is to significantly reduce the multigrid hierarchy setup time while not significantly hurting the convergence. We will consider several AMG reuse strategies that differ in the amount of reused data.

Finally, in the third area, we will describe a few low-level optimizations developed in MueLu to speed up multigrid hierarchy construction at large scale.

Bio: Dr. Andrey Prokopenko joined Sandia National Laboratories as a Postdoctoral Associate in 2012. He completed an MS in Mathematics at Moscow State University in Russia in 2006 and a PhD in Applied Mathematics at the University of Houston in 2011, after which he spent a year there as a postdoctoral research assistant. His research focuses on scalable numerical algorithms, in particular linear algebra and multigrid methods. A significant portion of his work involves development of MueLu, a C++ multigrid library (part of the Trilinos project) geared towards exascale architectures.

July 21, 2016 - Dr. Peter M. Kogge: Introducing Emu Solutions' Migratory Thread and Memory-Side Processing Technology

Abstract: There is growing evidence that current architectures do not handle cache-unfriendly applications such as sparse matrix operations, data analytics, and graph algorithms well. This is due, in part, to the irregular memory access patterns these applications exhibit, and to how remote memory accesses are handled. This talk introduces a new, highly scalable PGAS memory-centric system architecture in which migrating threads travel to the data they access. A multi-threaded programming model based on Cilk has proven a highly efficient match to both these new applications and the Emu architecture. The first implementation of this architecture is discussed. A comparison of key parameters against a variety of today's systems, spanning different system architectures, demonstrates the advantages. Early projections of performance against several well-documented kernels translate these advantages into comparative numbers. Next-generation implementations of this architecture will expand these performance advantages very significantly.
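The shape of the Cilk-style spawn/sync model can be sketched with Python futures: a divide-and-conquer computation where each "spawn" may run concurrently and "sync" is the join on the child's result. This is only an illustration of the programming model; a real work-stealing runtime like Cilk (or Emu's migratory threads) schedules spawned work far more efficiently than a fixed thread pool, and the pool here is sized so the small example cannot starve:

```python
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)

def psum(data):
    """Divide-and-conquer sum in the spawn/sync style."""
    if len(data) <= 2:
        return sum(data)                       # serial base case
    mid = len(data) // 2
    left = pool.submit(psum, data[:mid])       # "spawn": may run concurrently
    right = psum(data[mid:])                   # caller continues with its half
    return left.result() + right               # "sync": join the spawned child

total = psum(list(range(8)))
```

In the migratory-thread setting the analogous spawn does not ship data to a worker at all; the lightweight thread itself moves to the node holding `data`, which is what makes the model a good fit for irregular, cache-unfriendly access patterns.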

Bio: Dr. Kogge is a Chaired Professor in Notre Dame's Department of Computer Science and Engineering. He is an IBM Fellow and was awarded the 2012 Seymour Cray Award and the 2015 Gauss Award, among other honors. Prior to academia, Dr. Kogge spent 26 years with IBM Research. His undergraduate degree is from Notre Dame, and he holds a Ph.D. in Electrical Engineering from Stanford.

July 14, 2016 - Dr. Faisal Shah Khan: Non-cooperative Games and Quantum Computing

Abstract: Non-cooperative game theory and quantum computation first appeared together in the novel works of David Meyer and Jens Eisert et al. around 1999. Meyer sought insight into the efficiency of quantum algorithms by viewing quantum computations as non-cooperative games. Eisert et al., on the other hand, applied quantum information to simple non-cooperative games by modelling these games as two-qubit quantum computations. Over the years, these two perspectives have come to define the notion of "quantum games", and have produced scientific results whose merits for quantum physics have sometimes been questioned, especially in cases where the two defining perspectives are not delineated. Casual use of the term "game" to discuss quantum information protocols has also added to the confusion as to where game theory belongs in the world of quantum physics. In this talk, I will delineate the two perspectives on quantum games and discuss the merits of applying non-cooperative game theory to quantum computations and other quantum information processes.
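Meyer's original example, the quantum penny flip, is small enough to sketch directly in the standard presentation: the quantum player surrounds the classical player's move with Hadamard gates and wins with certainty whether or not the penny is flipped:

```python
import math

# One-qubit gates as 2x2 real matrices: Hadamard, bit flip, identity.
H = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]
X = [[0.0, 1.0], [1.0, 0.0]]
I = [[1.0, 0.0], [0.0, 1.0]]

def apply(gate, state):
    """Matrix-vector product on a one-qubit state vector."""
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]

def play(classical_move):
    state = [1.0, 0.0]                    # penny starts heads up, |0>
    state = apply(H, state)               # quantum player's first move
    state = apply(classical_move, state)  # classical player: flip (X) or not (I)
    state = apply(H, state)               # quantum player's second move
    return [a * a for a in state]         # measurement probabilities

p_stay = play(I)
p_flip = play(X)
```

In both cases the measurement finds heads (|0>) with probability 1: the Hadamard puts the penny in an equal superposition that the classical flip cannot change, which is exactly Meyer's point about the advantage of quantum strategies.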

Bio: Dr. Faisal Shah Khan has a PhD in Mathematical Sciences from Portland State University, Oregon. He currently serves as an Assistant Professor of Mathematics and an affiliate faculty member of the Information Security Research Center at Khalifa University, Abu Dhabi. His primary research interest is in identifying optimal performance of quantum information processes under constraints using non-cooperative game models. He is also interested in post-quantum cryptography, quantum computation and its impact on big data analytics, quantum computational models for chemistry, and more esoterically, categorical reasoning in quantum mechanics and information processing. Dr. Khan also serves as an undergraduate advisor and has been involved in the development of the undergraduate degree program in applied mathematics at Khalifa University.

July 13, 2016 - Dr. Ryan Bond: Extension of Kestrel to General Thermochemical Models

Abstract: This seminar will cover some preliminary activities associated with the extension of the HPCMP CREATE™-AV Kestrel computational fluid dynamics software toward thermochemical generality. Components of the Kestrel software currently carry perfect-gas assumptions. Facilities-modeling needs at Arnold Engineering Development Complex (AEDC) require thermochemical effects associated with higher temperatures, higher pressures, and chemical reactions. The initial focus of the effort is on meeting simulation needs for AEDC facilities. This effort is being coordinated with longer-term planning to meet the needs of the broader Kestrel user base for simulations involving higher temperatures, chemical reactions, and mixing.

Bio: Dr. Ryan Bond is currently a senior engineer at Arnold Engineering Development Complex (AEDC) working in software and algorithms development for reacting, compressible computational fluid dynamics (CFD). Prior to joining AEDC, Dr. Bond spent 12 years at Sandia National Laboratories in New Mexico, 5 years as manager of the Computational Thermal and Fluid Mechanics Department preceded by 7 years as an R&D staff member in the Aerosciences Department. Dr. Bond holds a B.S. in Aerospace Engineering and B.S. in Mathematics from Mississippi State University, an M.S. and Ph.D. in Aerospace Engineering from North Carolina State University, and an M.B.A. from the University of New Mexico.

May 27, 2016 - Jai Dayal: Middleware for Managing Large Scale In Situ Science Analysis Workflows

Abstract: In situ analytics is becoming an important technique for analyzing and processing the immense volumes of data that science applications produce. In this talk, I will present my research on Flexpath, a middleware to enable data movement across in situ workflow components, and SODA (Science-driven Orchestration of Data Analytics), a management middleware to ensure the workflow sustains appropriate quality of service, quality of data, and resilience. Both pieces of middleware have been developed within the ADIOS (Adaptive Input/Output System) framework and have been evaluated with several real science applications and representative analysis workflows.

Bio: Jai Dayal is a PhD candidate in the School of Computer Science at Georgia Institute of Technology under the advisement of Dr. Matthew Wolf and Professor Karsten Schwan. His research focuses on developing and applying distributed systems principles to address performance and resource management concerns for large science applications and workflows.

May 5, 2016 - Jean-Philippe A. Thomas: Modeling and prevention of coarse microstructural features in aerospace alloys

Abstract: The conversion of ingots of nickel- and titanium-based alloys into products suitable for final forging or machining of aerospace components involves a series of hot deformations and reheats. This presentation will discuss how processes are designed to meet specifications that often include not only average grain size requirements, but also bounds on what defines a homogeneous microstructure. Maximum allowed values may be specified for the fraction of unrecrystallized grains or for the so-called ALA (As Large As) grain size, for instance. To ensure that specifications are met consistently, aerospace forging quality relies on three principles: 1) manufacturing equipment is regularly tested and surveyed, 2) processes are approved through extensive qualification programs, and 3) material products are tested through a combination of metallographic, mechanical, and non-destructive techniques. Process simulation and microstructure-evolution modeling play a critical role in industry to design robust processes that guarantee a good balance between the pace of microstructure transformation and the kinetics of microstructure coarsening for alloys such as 718 and Ti‑6‑4. They help minimize the cost of quality by meeting specifications with close-to-optimal processes that will require minimal changes and re-qualifications. Details will be provided about such mesoscale and topological modeling approaches applied to recrystallization and grain growth. The presentation will then touch upon the shift in microstructure requirements, from refining the whole grain size distribution to an effort to reduce the coarsest grains without affecting average grain sizes, as the latter provide an optimal balance of strength and creep resistance. We will use examples to illustrate the complexity of determining the impact of combinations of slight process variations inevitably observed in an industrial environment.
This will lead us to an approach that embeds process and microstructural models in industrial-scale, yet physics-based, analytics software. We will then review literature examples that propose further applications for such tools. Finally, we will discuss how the quality principles mentioned above will have to be expanded to include not only equipment, process, and testing, but also analytics tools, as they are expected to play a growing part in the effort to detect or minimize unfavorable variations and prevent coarse features.
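A worked example of the kind of kinetics such models capture is the classical JMAK (Avrami) form for the recrystallized fraction, X(t) = 1 - exp(-k t^n). The parameters below are purely illustrative, not values for alloy 718 or Ti-6-4:

```python
import math

def recrystallized_fraction(t, k=0.05, n=2.0):
    """JMAK/Avrami kinetics: fraction recrystallized after time t.

    k lumps nucleation and growth rates; n is the Avrami exponent.
    Both values here are illustrative placeholders.
    """
    return 1.0 - math.exp(-k * t ** n)

def time_to_fraction(x, k=0.05, n=2.0):
    """Invert the JMAK form: time needed to reach recrystallized fraction x."""
    return (-math.log(1.0 - x) / k) ** (1.0 / n)

# A slight process variation, modeled as a 10% drop in the rate constant k,
# shifts the time needed to reach 95% recrystallization:
t_nominal = time_to_fraction(0.95, k=0.05)
t_slow = time_to_fraction(0.95, k=0.045)
```

Embedding even simple kinetics like this in plant-scale analytics is what allows the impact of combined small process variations to be screened before they show up as out-of-specification coarse grains.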

Bio: Jean-Philippe (JP) Thomas is an expert in the characterization and modeling of microstructure evolution in nickel- and titanium-based alloys manufactured through the cast-and-wrought route, more recently expanding into powder metallurgy processes as well. JP started focusing on these topics as a student at the Ecole Nationale Supérieure des Mines (National Graduate School of Mines) of Saint‑Etienne in France, where he received an MS and PhD in Materials Science and Engineering in 2000 and 2005. From his earliest exposure to this field during internships and graduate research work with Aubert & Duval, he learned to always go beyond "contemplative metallurgy" by leveraging experimental data to develop metallurgical models. With the guidance of his advisor, Professor Frank Montheillet, JP moved quickly from phenomenological to physics-based models. He kept to this approach when he joined Lee Semiatin's team as a Research Scientist at the Air Force Research Laboratory in 2004, and later ATI Allvac (now ATI Specialty Materials) as a Senior Engineer in 2007. Since 2012, as Principal Modeler for ATI, JP has been looking to utilize manufacturing data as input for process and metallurgical models embedded in industrial-scale, yet physics-based, data analytics software.

May 2, 2016 - Matthew Wolf: Managing Complex Workflows at Scale and In Situ

Abstract: As we look forward to the next several generations of extreme-scale hardware, it is clear that the Input/Output bottleneck will force some significant changes in the way applications manage their data flows. Simply writing out the entire working state of a simulation won't scale for many applications. This has led to a resurgence of interest in techniques that today are labeled "in situ" or "staging," but that are directly related to earlier work in steering, concurrent processing, and stream processing. In this talk, I will explore several use cases for in situ analytics and how they have shaped runtime innovations for the deployment and management of data streams. These runtime changes allow us to structure workflows over multiple nodes, within a single node, or even directly timeshared on cores or accelerators, so that the total end-to-end time can be managed to best effect for the scientist.

April 29, 2016 - Eugene Dumitrescu: Error detection for reliable direct process tomography and channel estimation

Abstract: Experimentally determining the dynamics of qubits is a crucial step in improving current and future quantum devices. Quantum process tomography (QPT) is a collection of protocols allowing one to experimentally determine the dynamics of a quantum system. Key obstacles prohibiting successful application of QPT protocols are noisy quantum dynamics (including state preparation and measurement errors) and the exponential resource cost for high-dimensional systems. In this seminar I discuss two aspects of error correction in quantum process tomography: (i) small error-detecting codes, which can increase the reliability of experimentally obtained quantum information, and (ii) the utility of partial direct characterization in channel estimation and discrimination to overcome the exponential resource cost of multipartite processes.

Bio: Dr. Eugene Dumitrescu is an Intelligence Community Postdoctoral Research Fellow at the University of Tennessee and a member of the Quantum Computing Institute at Oak Ridge National Laboratory. His research focuses on quantum tomography, error correction, coding theory, and topological physics in condensed matter. Eugene holds a bachelor's degree in physics from The University of North Carolina at Chapel Hill and a doctorate in physics from Clemson University.

April 28, 2016 - Vivek Mishra: Manifestation of Incipient Band Superconductivity on Neutron Resonance

Abstract: Recent ARPES (Angle-Resolved Photo-Emission Spectroscopy) measurements have reported large superconducting gaps on incipient bands (bands that do not cross the Fermi energy) in the iron-based superconductors. Most of these systems have transition temperatures in the range of 15-100 K. These results have motivated several theoretical studies, which suggest that spin-fluctuation-mediated pairing can be a possible mechanism of superconductivity. In this talk, I will discuss the spin-fluctuation excitation spectrum in these materials. I will show that an s_+- superconducting state gives rise to a neutron resonance, and that a breakdown of the linear scaling between the superconducting gap and this neutron resonance energy is a hallmark of incipient-band s_+- superconductivity.

Bio: Vivek Mishra is a postdoctoral associate at the Joint Institute for Computational Sciences (JICS), University of Tennessee. He received his M. Sc. (Integrated) in physics from the Indian Institute of Technology, Kanpur, India, and his Ph. D. from the University of Florida, Gainesville, USA. Before joining JICS, he worked at the Argonne National Laboratory. He is interested in theoretical condensed matter physics.

April 13, 2016 - Michael Milinkovich: Open Source Is Mainstream

Abstract: In the past decade, open source software licensing and development have gone from a curious new way of doing things, to dominating the software industry. The next decade will see that success move to where open source will become the dominant approach in those industries where software is becoming an increasing part of the value chain. Which is to say: all industries.

The Eclipse Foundation plays a leading role in the open source community, with a particular emphasis on creating high-quality code that can be used by government and industry in real-world products. This talk will provide an overview of what open source is and is not, and how it is practiced by the Eclipse community to deliver leading products in developer tools, science, geospatial, and the internet of things.

BIOGRAPHICAL INFORMATION: Mike Milinkovich is the current Executive Director of the Eclipse Foundation. Since April 1, 2012, he has served on the board of the Open Source Initiative. He previously served as a Vice President at Oracle and at WebGain, and before that as the Strategy Manager for IBM's Visual Age IDE, the ancestor of Eclipse.

Milinkovich is known for advocating OSGi as a solution for modularity in the Java programming language.  Milinkovich is a highly recognized figure in the Java community and is a nominee for Top Java Ambassador.  He is also known for advocacy against software patents.

April 7, 2016 - Greg Dreifus: The Way Forward in Additive Manufacturing Tool Path Optimizations

Abstract: Tool path optimization in fused deposition modeling (FDM) is an area ripe for research. An objective function to optimize the trajectory of the 3D printer head can rely on numerous parameters, many of which are poorly understood in additive manufacturing in general, and true optimization is constrained by the underdeveloped state of AM software. Optimizing for time requires a geometric analysis of a given 3D-printed part, but much of the geometric information inherent in a CAD file is lost in the file stream to slicing software. Furthermore, optimizing based on the mechanical constraints of the 3D printing process is limited by the lack of models for polymer printers in the field. This seminar will explore the tools that will be needed for a thorough FDM optimization, providing an analysis of which information is most crucial for optimizing 3D printers in a feasible way, and of how the software and hardware in AM should move forward to enable such optimization.
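One concrete piece of the time objective is ordering deposition moves to shorten non-printing travel. A greedy nearest-neighbor heuristic over hypothetical segment start points (a simplified stand-in for real tool path planners, not the talk's method) already illustrates the gain.

```python
import math

# Toy travel-time objective: visit all deposition points while minimizing
# the total distance of non-printing moves between them.
points = [(0, 0), (5, 5), (1, 0), (4, 5), (2, 0), (3, 5)]

def travel(order):
    """Total travel distance for visiting points in the given order."""
    return sum(math.dist(order[i], order[i + 1]) for i in range(len(order) - 1))

def greedy(points):
    """Greedy nearest-neighbor ordering (a classic TSP-style heuristic)."""
    remaining = points[1:]
    path = [points[0]]
    while remaining:
        nxt = min(remaining, key=lambda p: math.dist(path[-1], p))
        remaining.remove(nxt)
        path.append(nxt)
    return path

print(travel(points))          # naive order zig-zags between the two rows
print(travel(greedy(points)))  # greedy order travels noticeably less
```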

BIO: Greg Dreifus received his BA/MA from the Macaulay Honors College at Hunter College, CUNY in June 2015 with a degree in Applied Mathematics.  He will begin graduate studies leading to a Ph.D. in mechanical engineering at the Massachusetts Institute of Technology this fall.  He is now an ASTRO intern at ORNL's Manufacturing Demonstration Facility (MDF), where he works in the area of tool path optimization for additive manufacturing, for the MDF's big area additive manufacturing (BAAM) and fused deposition modeling systems in particular.

April 1, 2016 - Siddharta Santra: Towards Practical Quantum Computing: Lessons from Quantum Annealing on D-Wave

Abstract: The D-Wave quantum annealing processor provides a physical realization of a non-universal form of adiabatic quantum computation that can still be useful for solving certain computationally hard practical problems. Although, at current sizes, its advantages over classical algorithms are not spectacular, it provides an interesting platform to test and study different aspects of quantum computation, such as the role of entanglement, the robustness of adiabatic computation, and quantum error correction strategies. While the search for the class of useful problems that will provably be better suited for quantum annealing is still on, we can attempt to develop some intuition about the ingredients in the recipe for large-scale quantum computation in the future. In this talk, I will introduce quantum annealing (QA) as a finite-temperature, non-universal form of adiabatic quantum computation (AQC). After describing QA with D-Wave, I will describe two different problems, MAX-2-SAT and associative memory recall, that we have implemented on the machine. Then I will discuss some outstanding questions, such as entanglement and the possibility of speed-up. Finally, I will conclude with some lessons that we can draw at this stage about practical large-scale quantum computing.
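For reference, quantum annealing as a form of AQC is conventionally written as a slow interpolation between a driver and a problem Hamiltonian; the transverse-field Ising form below is the standard textbook expression (the hardware's actual annealing schedule is more involved).

```latex
% Interpolating Hamiltonian for quantum annealing / AQC, with annealing
% parameter s swept from 0 to 1 over the total anneal time T.
H(s) \;=\; (1-s)\,H_B \;+\; s\,H_P, \qquad s = t/T \in [0,1],
\qquad H_B \;=\; -\sum_i \sigma^x_i,
\qquad H_P \;=\; \sum_i h_i\,\sigma^z_i \;+\; \sum_{i<j} J_{ij}\,\sigma^z_i\sigma^z_j .
```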

April 1, 2016 - Lingfei (Teddy) Wu: Algorithms and Software for Large Scale Problems in SVD Computations and in Big Data Applications

Abstract: As "big data" exerts increasing influence on our daily life and research activities, it poses significant challenges to various research areas. Some applications demand a fast solution of large, sparse eigenvalue and singular value problems. In other applications, extracting knowledge from large-scale data requires techniques such as statistical calculations, data mining, and high performance computing. In this talk, I will introduce my research efforts addressing these challenges, including developing efficient algorithms and high-performance software tools that cope with large-scale problems running on extremely large parallel machines. I will first talk about developing high-performance, state-of-the-art SVD (Singular Value Decomposition) software that runs both on desktop machines and on supercomputers. Then I will talk about my work at LBNL on coping with fusion plasma big data. Next I will briefly talk about leveraging practical numerical and data mining techniques to estimate the trace of a function of a large, sparse matrix. In addition, this talk will briefly outline my internship work at IBM on scaling up large-scale kernel machines via randomized features in speech recognition.
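Stochastic trace estimation of the kind mentioned above typically needs only matrix-vector products. The sketch below shows the classic Hutchinson estimator for tr(A) (a standard building block, not necessarily the talk's exact method; in practice the matvec would apply f(A) to the probe vector).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Symmetric test matrix with known eigenvalues; in practice A is large and
# sparse, and only matrix-vector products with it are affordable.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(rng.uniform(0.5, 2.0, n)) @ Q.T

def hutchinson_trace(matvec, n, num_samples=500, rng=rng):
    """Estimate tr(A) from matvecs with Rademacher probes: E[z^T A z] = tr(A)."""
    total = 0.0
    for _ in range(num_samples):
        z = rng.choice([-1.0, 1.0], size=n)
        total += z @ matvec(z)
    return total / num_samples

est = hutchinson_trace(lambda v: A @ v, n)
print(est, np.trace(A))   # estimate converges to the exact trace
```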

Bio: Lingfei Wu is a 6th year Ph.D. candidate in the computer science department at the College of William and Mary, advised by Dr. Andreas Stathopoulos. His research interests are in the areas of high-performance scientific computing, large-scale machine learning, and big data analytics. In summer 2014, Lingfei was a computing sciences summer student at Lawrence Berkeley National Laboratory. In summer 2015, Lingfei was a summer research intern at IBM T. J. Watson Research Center. Before joining William and Mary, Lingfei received his M.S. from the University of Science and Technology of China (Hefei, 2010), following his B.E. from Anhui University (Hefei, 2007).

March 31, 2016 - Robert Hoy: Effect of Chain Stiffness on the Competition between Crystallization and Glass-Formation in Model Polymers

Abstract: I will discuss recent efforts to capture essential aspects of the competition between crystallization and glass-formation in polymer liquids through coarse-grained modeling. Progress towards this goal has been achieved through development of a semiflexible "soft pearl-necklace" polymer model wherein polymers are represented as soft tangent spheres connected by harmonic bonds and variable-strength angular interactions favoring straight chains. We have mapped out the solid-state morphologies formed by these polymers as a function of chain stiffness, spanning the range from fully flexible to rodlike chains. In the flexible limit, monomers occupy the sites of close-packed crystallites while chains retain random-walk-like order. In the rodlike limit, nematic chain ordering typical of lamellar precursors coexists with close-packing. At intermediate values of bending stiffness, the competition between random-walk-like and nematic chain ordering produces glass-formation; the range of bending stiffnesses k_b over which this occurs increases with the thermal cooling rate implemented in our molecular dynamics simulations. Values of k_b between the glass-forming and rodlike ranges produce complex ordered phases such as close-packed spirals. Current work focuses on extending this picture by comparing the liquid-state dynamics for different k_b, in terms of local cluster-level structure. Such analyses show several common features with recent analyses of colloidal glass-formers.
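For concreteness, one common semiflexible bead-spring form consistent with the description above (an assumption on my part; the talk's exact potentials may differ) combines harmonic bonds with a bending term that vanishes for straight chains:

```latex
% With the bond angle theta = pi for collinear bonds, U_bend is minimized
% for straight chains and the prefactor k_b sets the chain stiffness.
U_{\mathrm{bond}}(r) \;=\; \tfrac{1}{2}\,k\,(r - r_0)^2, \qquad
U_{\mathrm{bend}}(\theta) \;=\; k_b\,\bigl(1 + \cos\theta\bigr).
```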

Bio: Robert Hoy obtained his bachelor's and doctoral degrees in Physics from Johns Hopkins University, doing doctoral work under Mark Robbins. After postdoctoral appointments at UC Santa Barbara and Yale, he joined USF as an assistant professor in 2012. He recently received an NSF CAREER award funding his planned research in glassy polymer mechanics.

March 17, 2016 - Song Jiang: Design and Implementation of Effective Key-Value Systems for Large-scale Data Centers

Abstract: Data management systems in large-scale data centers are designed for high performance, scalability, and reliability. They play important roles in supporting Internet-wide data-centric computing. An important design principle critical to their success is to design according to workload characteristics: the general-purpose, one-size-fits-all approach once used in small-scale systems is no longer cost-effective. Examples of modern, carefully engineered systems include Google's GFS file system, Facebook's Haystack photo storage, and Baidu's Atlas cloud storage system.

In this talk we will describe how rigorous workload characterization is used to design and implement a key-value (KV) system for large-scale data centers. In collaboration with Facebook, our team collected week-long KV access traces from Facebook's production Memcached system and systematically characterized the relevant workload properties. This study revealed distinct access patterns with significant implications for KV system design: (1) very small KV items are widespread; (2) accesses are highly skewed towards a small set of hot keys in the KV cache; and (3) access traffic can be highly dynamic, with request rates varying by a factor of two.
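The skew in (2) is often modeled with a Zipf-like popularity distribution. A small deterministic sketch (the key count and the 1/i law are illustrative assumptions, not numbers from the Facebook traces) shows how a tiny fraction of hot keys dominates the traffic:

```python
import numpy as np

# Zipf-like popularity: key i is requested with probability proportional
# to 1/i, over num_keys keys.
num_keys = 10_000
weights = 1.0 / np.arange(1, num_keys + 1)
probs = weights / weights.sum()

hot = int(0.01 * num_keys)            # the hottest 1% of keys
hot_share = probs[:hot].sum()
print(f"top 1% of keys receive {hot_share:.0%} of requests")
```

Under this model the hottest 1% of keys absorb over half of all requests, which is why cache designs like zExpander treat hot and cold data very differently.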

Using our understanding of real-world workloads we designed and implemented the high-performance and resource-efficient zExpander KV cache and the LSM-trie KV store systems. We will detail how the two systems' designs were motivated by the understanding of their targeted workloads. Evaluation results reveal substantially, sometimes dramatically, improved performance over other state-of-the-art systems. As an anecdotal example, the LSM-trie system can improve the read and write throughputs of Google's LevelDB by up to 10 and 20 times, respectively. We will conclude with a brief overview of our on-going projects and future visions.

Bio: Dr. Song Jiang is currently an associate professor in the ECE department at Wayne State University. His research interests include system infrastructure for big data processing, such as file and storage systems and data management systems, as well as I/O systems for high-performance computing. He was a recipient of a 2009 US National Science Foundation (NSF) CAREER award, and his research activities have been continuously supported by the NSF. He has served on numerous conference program committees and proposal review panels. He has been involved in projects at Facebook and Baidu as a collaborator on providing high-quality Internet-wide services based on big data, resulting in many significant publications at top-tier conferences. Dr. Jiang's research has generated substantial impact in industry, where several of his proposed algorithms for memory and storage management have been officially adopted into mainstream systems including the Linux kernel, the NetBSD kernel, and the storage engine of MySQL.

He received his B.S and M.S from the University of Science and Technology of China, and his Ph.D. in computer science from the College of William and Mary in 2004. From 2004 to 2006 he was a post-doctoral researcher at the Los Alamos National Laboratory where his research work was cited at the national level as a "success story" of the NNSA Laboratory Directed Research and Development program.

More information about his research can be found at

March 16, 2016 - Damien LeBrun-Grandié: Modeling of the electrical, electrochemical, and thermal processes in supercapacitors

Abstract: Supercapacitors are high-capacity electrochemical capacitors that bridge the gap between rechargeable batteries and conventional double-layer capacitors. They typically store 10 to 100 times more energy per unit volume or mass than conventional capacitors, and are able to accept and deliver charge considerably faster than batteries, while offering a much longer lifetime. Dr. LeBrun-Grandie will present their basic design and charge storage principles, and develop a simplified multidimensional and multiphysics model for the electrical, electrochemical, and thermal processes. The implementation of this model in a new library for modeling energy storage devices will be presented. Several results from simulations of supercapacitors subject to typical charge and discharge profiles will be given, and strategies to compare models and validate them against experimental data will be discussed.

BIOGRAPHICAL INFORMATION: Damien LeBrun-Grandié is a Postdoctoral Research Associate in the Computational Engineering and Energy Sciences Group of the Computer Science and Mathematics Division at ORNL. He received his Ph.D. from Texas A&M University and currently works on modeling energy storage systems.

March 15, 2016 - Patrick Bridges: Integrating Performance Modeling into Scalable System Software Design

Abstract: Developing new scalable system software techniques is essential to the success of emerging large-scale scientific computing systems due to the increasing scale and complexity of hardware, programming systems, and applications. In particular, HPC operating systems and middleware must address challenges in areas such as fault tolerance, scheduling, synchronization, power management, and high-speed communication. Interactions between these areas also complicate software design; recent research has shown, for example, that both power capping and asynchronous checkpointing can have widely varying and hard-to-predict impacts on system performance.

Because of these challenges, my research has increasingly relied on performance modeling to expose research challenges, quantify performance tradeoffs, and evaluate the resulting systems. This aspect of the research is challenging and rewarding because it requires understanding the underlying system, the strengths and limitations of the different modeling approaches developed by the modeling community, and how best to integrate these techniques into system software design. In some cases, my students and I have been able to use simple analytical models; recently, however, we have been relying on more sophisticated stochastic modeling techniques. We have also begun exploring the viability of using large-scale computational models to inform the design of HPC system software.
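A classic example of the kind of simple analytical model mentioned here (chosen as an illustration; not necessarily one of the speaker's models) is Young's approximation for the near-optimal checkpoint interval, computed from just the checkpoint cost and the mean time between failures (MTBF):

```python
import math

# Young's approximation: tau ~ sqrt(2 * checkpoint_cost * MTBF),
# valid when the checkpoint cost is small relative to the MTBF.
def young_interval(checkpoint_cost_s, mtbf_s):
    return math.sqrt(2 * checkpoint_cost_s * mtbf_s)

# e.g. 5-minute checkpoints on a system with a 24-hour MTBF:
print(young_interval(300, 24 * 3600) / 3600)  # optimal interval, in hours
```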

In this talk, I discuss several systems research projects my students and I have conducted to meet HPC system software challenges in the areas of resilience, scheduling, and communication system design. In each of these areas, I describe both the research itself and how modeling techniques have informed the research. Finally, I will briefly discuss some new research directions we are currently exploring as well as provide some thoughts on the broader integration of modeling and evaluation in computer systems research and education.

Bio: Patrick G. Bridges is an Associate Professor and Associate Department Chair of the Computer Science Department at the University of New Mexico. His research focuses on system software for large-scale computer systems, including research on operating system design and implementation, communication system optimization, and fault tolerance and resilience. Prof. Bridges received his Ph.D. from the University of Arizona in 2002, working under the direction of Prof. Rick Schlichting, and his B.S. from Mississippi State University in 1994. He is a member of the Association for Computing Machinery and the IEEE Computer Society.

March 14, 2016 - Samer Al-Kiswany: The Old Systems and the Sea (of Applications and Hardware Changes)

Abstract: Current systems—designed in an era of desktop and server applications and following decades-old design principles that are incongruous with today's data-center hardware capabilities—are inadequate to meet two challenges: capitalize on hardware evolution and efficiently support key applications. I address these challenges in turn. First, I present storage system solutions to better support key HPC and cloud applications through redesigning file systems to provide key properties required by these applications, such as per-file optimization or consistency preservation across crashes. Second, I propose a new approach for designing the next generation of distributed systems through co-designing the system operations and network support for higher efficiency, scalability, and performance. I demonstrate the efficacy of this approach by designing a key-value storage system that leverages the capabilities of software-defined networks.

Bio: Samer Al-Kiswany is an NSERC postdoctoral fellow at the University of Wisconsin, Madison. He obtained his master's degree and PhD from the University of British Columbia, Canada. His research interests are in distributed systems, high-performance computing, cloud computing, and operating systems. In particular, his work focuses on reconsidering systems designs in light of recent changes in cloud and HPC applications and platforms. He is a recipient of ten national and international awards, including the Killam Doctoral Fellowship, the NSERC Postdoctoral Fellowship, and the IEEE George Michael HPC Fellowship.

March 10, 2016 - Eric Suchyta: Digging Deeper (and More Greedily) in Imaging Surveys

Abstract: Accurate statistical measurement with large astronomical imaging surveys has traditionally required throwing away a sizable fraction of the data. This is because most measurements have relied on selecting high signal-to-noise samples, where variations in the properties of the galaxy ensemble with survey observing characteristics (such as sky brightness) are small.

We introduce a new measurement method that aims to minimize this wastage, allowing precision measurement for any class of stars or galaxies detectable in an imaging survey. We have implemented our proposal in Balrog, a software package which embeds fake objects in real imaging in order to accurately characterize measurement biases. I will present Balrog and some of its applications, as well as note how the methodology could be useful for extending the statistical reach of measurements in a wide variety of coming imaging surveys.
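A toy injection-recovery experiment in the spirit of Balrog (all numbers hypothetical; the real package embeds realistic fake galaxy images in survey imaging, not scalar fluxes) shows the kind of selection bias such injections can reveal:

```python
import numpy as np

rng = np.random.default_rng(42)

# Inject fake sources of known flux into noisy "measurements", apply the
# survey's detection cut, and compare the recovered sample to the truth.
true_flux = 10.0
noise_sigma = 5.0
threshold = 10.0                       # detection cut, same units as flux

observed = true_flux + rng.normal(0.0, noise_sigma, size=100_000)
detected = observed[observed > threshold]

print(np.mean(observed))   # ~ true flux: unbiased before selection
print(np.mean(detected))   # > true flux: the detection cut biases the sample
```

Injecting objects with known properties lets the bias of the detected sample be measured directly, rather than modeled from first principles.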

Bio: Eric Suchyta is a Postdoctoral Researcher at the University of Pennsylvania, working with the Dark Energy Survey Collaboration. He is involved primarily with the weak lensing and large-scale structure working groups. He received his Ph.D. in 2015 from The Ohio State University, where he worked on several projects for the Dark Energy Survey, including instrument control software development, data reduction algorithms, and simulations.

March 3, 2016 - Weikuan Yu: Orchestrating a Cache-Memory Concert for Massive Parallelism

Abstract: There has been a rapid rise of massive parallelism in modern processors, and the core count is expected to reach thousands within a decade. In contrast, memory bandwidth lags behind, causing an ever-growing gap between off-chip memory bandwidth and the cumulative computing power in a machine. This massive parallelism leads to a host of challenging issues. In particular, it causes an explosion of recently accessed datasets with temporal locality of very short duration and spatial locality of differing strides, frequently with memory accesses of unique striding patterns. This makes it seriously challenging for conventional cache algorithms to use the limited cache capacity effectively. In addition, the increasing width of massive parallelism congests memory accesses in the memory pipeline, stalling the warp schedulers and degrading performance. Hence, there is a critical need for a cache-memory concert that can orchestrate cache and memory management together to meet the challenges of massive parallelism. This talk will present our recent research toward such a concert. First, we introduce a new cache indexing method that adapts to memory accesses with different strides, eliminates intra-warp associativity conflicts, and improves GPU cache performance. Then, we will present a divergence-aware cache management scheme that orchestrates L1D cache management and warp scheduling together for GPGPUs. Finally, we will show the development of a cutting-edge warp-scheduling algorithm that predicts the resource demand of active warps and throttles the consumption of load-store units for effective warp parallelism.
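To see why stride-aware indexing matters, compare a plain modulo set index against a simple XOR-folded hash on a pathological power-of-two stride (the hash below is an illustrative stand-in, not the indexing scheme from the talk):

```python
def mod_index(addr, num_sets=64):
    """Conventional set index: low-order bits of the line address."""
    return addr % num_sets

def xor_index(addr, num_sets=64):
    """Fold upper address bits into the set index with a bitwise XOR."""
    return (addr ^ (addr >> 6)) % num_sets

# 64 accesses with a stride of 64 cache lines: a classic conflict pattern.
stride = 64
addrs = [i * stride for i in range(64)]

print(len({mod_index(a) for a in addrs}))   # 1 set used  -> severe conflicts
print(len({xor_index(a) for a in addrs}))   # 64 sets used -> conflicts removed
```

With modulo indexing every access lands in the same set and thrashes it, while the XOR hash spreads the same access stream across all sets.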

BIO: Dr. Weikuan Yu is an Associate Professor in the Department of Computer Science at Florida State University (FSU). He served as a Research Staff Member in the Future Technologies group at Oak Ridge National Laboratory until 2009, and then as an assistant and associate professor at Auburn University until 2015. Dr. Yu founded the Parallel Architecture and Systems Laboratory (PASL) at Auburn and FSU. His research interests span processor-memory architecture, big data analytics in social networks, high-speed interconnects, cloud and distributed systems, and storage and I/O systems. Many of Dr. Yu's graduate students have joined prestigious organizations such as Boeing, Amazon, IBM, Intel, and Yahoo, as well as governmental laboratories, upon graduation.

March 2, 2016 - Geoffroy Vallee: Application-level fault tolerance using the proposal from the MPI Forum's Fault Tolerance Working Group

Abstract: This presentation focuses on User-Level Failure Mitigation (ULFM), the current proposal of the MPI Forum's Fault Tolerance Working Group. This proposal includes MPI-level fault detection and notification, as well as capabilities to restore the MPI layer in order to guarantee correct point-to-point and collective communications despite the failure of one or more MPI ranks. In this presentation, the semantics of the proposed modifications to the MPI standard will be examined, along with a description of a fault-tolerant NAS Parallel Benchmark (NPB) that relies on ULFM.

February 11, 2016 - Daniel Steingart: Negotiating with Batteries

Abstract: Endeavors in electrochemical energy storage are industrial masochism for the same reason they are academic hedonism: a working, rechargeable battery represents a tight coupling of multiphase phenomena across chemical, electrical, thermal and mechanical domains. Despite these couplings, most treatments of batteries in the academic literature emphasize the material challenges and opportunities as opposed to the system level workings. There are at least three good reasons for this: 1) to date, tools for examining the structure of "real" cells in operando are largely limited to synchrotron x-ray and neutron methods, 2) full cells are products engineered for application demands and not platonic ideals and 3) material improvements can have enormous impact on battery performance.

Yet understanding and examining the physical dynamics of cells in a "scaled context" is still a worthwhile academic endeavor. The battery as a system presents problems that are difficult to decouple, but the study of such problems can introduce new opportunities and inform electrochemical reactor designs and material utilization strategies.

By studying full "scaled" cell behaviors we have learned how to compensate for certain material disadvantages and to create batteries and components that can meet performance targets which challenge traditional materials-first strategies. First, I will show that the "dendrite" may not be the universal anathema it is made out to be (at least in a water stable system). Second, I will show what we can learn from the many reasons it is difficult to cycle a traditional "bobbin" AA cell. Finally, I will examine a "stupid battery trick" unique to the zinc alkaline bobbin that can teach us something (perhaps) universally applicable to all closed batteries.

Biography: Dan Steingart is an assistant professor in Mechanical and Aerospace engineering and the Andlinger Center for Energy and the Environment at Princeton University. He has a Sc.B. in engineering from Brown University and M.S. and Ph.D. degrees in materials science from the University of California at Berkeley. His research is focused upon battery engineering at the intersection of materials science, diagnostics, and system design. Previous to his current appointment, he was an assistant professor in the Department of Chemical Engineering at the City College of New York, and a co-founder of Wireless Industrial Technologies.

February 4, 2016 - Jacek Jakowski: Quantum Methods for Temporal and Spatial Multiphysics of Nanomaterials

Abstract: Today's advanced materials are increasingly complex. Processing technology is heading towards the quantum scale, involving simultaneous manipulations of atoms, electrons, and light. Simulating the complex behavior of nanosystems of realistic size requires multiscale and multiphysics approaches built on inexpensive electronic structure methods such as tight-binding Density Functional Theory. We present our effort towards the development of a comprehensive direct dynamics method, encompassing time-dependent or time-independent electrons and approximate quantum or classical nuclei, compatible with the inclusion of time-dependent external potentials.

January 25, 2016 - Lipeng Wan: Achieving High Reliability and Efficiency in Maintaining Large-scale Storage Systems through Optimal Resource Provisioning and Data Placement

Abstract: With the explosive increase in the amount of data being generated by various applications, large-scale distributed and parallel storage systems have become common data storage solutions and are widely deployed in both industry and academia. While these high-performance storage systems significantly accelerate data storage and retrieval, they also raise some critical issues in system maintenance and management. First, the number of physical devices used in large-scale storage systems has increased significantly, which makes data and systems more vulnerable: the failure of any of these devices can cause data loss or take part or all of the storage system out of service. Second, with the development of storage technologies, flash-based storage devices, especially solid-state drives (SSDs), are increasingly used in large-scale storage systems. Though SSDs provide much higher I/O performance, they are also much more expensive than traditional hard disk drives and must be utilized in a cost-effective way. Moreover, in some scenarios, intensive write workloads can quickly wear out SSDs, which also substantially degrades system reliability. This talk takes a pragmatic view of these critical yet challenging issues and introduces how to improve the reliability and efficiency of large-scale storage systems through optimal resource provisioning and data placement.
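The first issue, scale-driven vulnerability, can be made concrete with a one-line probability model (the 2% annual failure rate is an illustrative assumption, and device failures are treated as independent):

```python
# Probability that at least one device fails within a year, for a fleet of
# n devices each with annual failure rate afr, assuming independent failures.
def p_any_failure(n, afr=0.02):
    return 1 - (1 - afr) ** n

for n in (10, 1_000, 100_000):
    print(n, round(p_any_failure(n), 4))
```

Even with individually reliable devices, a large enough fleet makes some failure a near certainty, which is why redundancy and careful data placement are mandatory at scale.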

January 22, 2016 - Mauricio Gutierrez: Efficiently simulable approximations to realistic incoherent and coherent errors and their applicability in threshold estimation

Abstract: Classical simulations of noisy stabilizer circuits are often used to estimate the threshold of a quantum error-correcting code (QECC). It is common to model the noise as a depolarizing channel by inserting Pauli gates randomly throughout the circuit [1]. However, it is not clear how sensitive a code's threshold is to the noise model, and whether or not a depolarizing channel is a good approximation for realistic non-stabilizer errors. Within the stabilizer formalism, it has been shown that for a single qubit more accurate approximations can be obtained by including in the noise model Clifford operators and Pauli operators conditional on measurement [2,3]. We now examine the feasibility of employing these error approximations at the single-qubit level to obtain better estimates of a QECC's threshold. We calculate the level-1 pseudo-threshold for the Steane [[7,1,3]] code for several incoherent [4] and coherent error models. At the logical level, the Pauli twirled channel (PTC) provides an extremely accurate approximation for incoherent channels. However, for coherent channels, the PTC severely underestimates the magnitude of the error [5] and results in optimistically high pseudo-threshold values. By computing the effective 1-qubit process matrix for the whole error-correcting circuit at low error rates, it becomes clear that this behavior is due to the stronger persistence of off-diagonal entries in the coherent channels, which the PTC cannot match. Therefore, if the main source of error in the quantum system is coherent, reliable stabilizer simulations should employ expanded Clifford channels.

[1] A.M. Steane, Phys. Rev. A 68, 042322 (2003).
[2] M. Gutierrez, L. Svec, A. Vargo, and K. R. Brown, Phys. Rev. A. 87, 030302(R) (2013).
[3] E. Magesan, D. Puzzuoli, C. E. Granade, D. G. Cory, Phys. Rev. A 87, 012324 (2013).
[4] M. Gutierrez and K. R. Brown, Phys. Rev. A 91, 022335 (2015).
[5] D. Puzzuoli, C. Granade, H. Haas, B. Criger, E. Magesan, and D. G. Cory, Phys. Rev. A 89, 022306 (2014).
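A single-qubit toy calculation illustrates the last point of the abstract (the rotation angle and channel here are illustrative choices, not the circuits from the paper): a coherent Z over-rotation preserves the magnitude of off-diagonal coherences, while its Pauli twirl damps them.

```python
import numpy as np

theta = 0.1
I = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0])
U = np.cos(theta / 2) * I - 1j * np.sin(theta / 2) * Z  # coherent Z over-rotation

def coherent(rho):
    return U @ rho @ U.conj().T

def twirled(rho):
    # Pauli twirl: average P E(P rho P) P over P in {I, X, Y, Z}.
    return sum(P @ coherent(P @ rho @ P) @ P for P in (I, X, Y, Z)) / 4

plus = np.full((2, 2), 0.5)                # |+><+|
print(abs(coherent(plus)[0, 1]))           # 0.5: coherence magnitude preserved
print(abs(twirled(plus)[0, 1]))            # 0.5*cos(theta): twirl damps it
```

The coherent channel only rotates the phase of the off-diagonal entry, whereas the twirled (Pauli) channel shrinks it, consistent with the PTC underestimating coherent errors.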

Biography: Mauricio Gutierrez received his Bachelor's degree in Chemistry from the University of Costa Rica in 2009. In 2010, he enrolled in the graduate program in Physical Chemistry at the Georgia Institute of Technology. There he became interested in the potential of quantum computers to solve important problems in the physical sciences. He joined the group of Ken Brown and decided to focus on quantum error correction. In particular, he studied the validity of the widespread assumption in simulations of quantum error-correcting circuits that the exact nature of the error does not really matter and that the Pauli channel always represents a good model. He obtained his Ph.D. last month. Apart from quantum mechanics, Mauricio is passionate about programming, sustainable living, science in developing countries, and soccer.

January 20, 2016 - Larry Brown: GPU Computing Update and Roadmap

Abstract: NVIDIA is the inventor of the GPU, a highly parallel computer processor originally designed for graphics processing. Today, GPUs are used for general-purpose computation in applications ranging from scientific computing to deep learning and data analytics. This talk will discuss the end of Moore's law and Dennard scaling, basic GPU architecture and energy efficiency, as well as key applications such as deep neural networks and graph analytics. A near-term NVIDIA GPU product roadmap will be provided as well.

Biography: Larry is a Sr. Solution Architect with NVIDIA, where he helps customers design and deploy GPU accelerated workflows in high performance computing and data analytics. He has a Ph.D. from the Johns Hopkins University in the area of Vision Science, and a graduate certificate in Software Engineering from the University of Colorado. Larry has over 15 years of experience designing, implementing and supporting a variety of advanced software and hardware systems for defense and national security applications. He has designed electro-optical systems for head-mounted displays and training simulators, developed GIS applications for multi-touch displays, and adapted computer vision code in UGVs for the GPU. Currently Larry enjoys learning about data analytics and machine learning. Larry has spent much of his career working for technology start-up companies, but was most recently with Booz Allen Hamilton before joining NVIDIA.

January 13, 2016 - Travis Humble: CSMD Brown Bag Seminar: Quantum Computing Systems and Software

Abstract: Quantum computing promises new opportunities for solving hard computational problems, but harnessing this novelty will require breakthrough concepts in the design, operation, and application of computing systems. In this talk, we define some of the challenges facing the development of quantum computing systems as well as the software-based approaches that can be used to overcome these challenges. Following a brief overview of the state of the art, we present recent advances in the modeling and simulation of quantum computing systems, the development of architectures for hybrid high-performance computing systems, and the realization of software stacks for quantum networking. This leads to a discussion of the role that conventional computing plays in the quantum paradigm and how some of the current challenges for exascale computing overlap with those facing quantum computing.

Travis Humble received his doctorate in theoretical chemistry from the University of Oregon before coming to ORNL in 2005. Dr. Humble is a member of the Complex Systems Group and also an associate professor with the Bredesen Center for Interdisciplinary Research and Graduate Education at the University of Tennessee.

January 13, 2016 - Eric Lingerfelt: BEAM: A Computational Workflow System Enabling Scalable In Silico & Empirical Exploration of Materials Science Data in the DOE HPC Cloud

Abstract: We present an overview and demonstration of the Bellerophon Environment for Analysis of Materials (BEAM), which enables instrument scientists at ORNL's Institute for Functional Imaging of Materials (IFIM), Center for Nanophase Materials Sciences (CNMS), and Spallation Neutron Source (SNS) to leverage the integrated computational and analytical power of ORNL's Compute And Data Environment for Science (CADES) platform with HPC resources at the Oak Ridge Leadership Computing Facility (OLCF) and at the National Energy Research Scientific Computing Center (NERSC) to perform scalable data analysis and execute computational simulations via a cross-platform Java application. At the core of this new system is a web and data server located at CADES that enables multiple, concurrent users to securely upload and manage data, execute material science workflows, and interactively engage with analysis artifacts. BEAM's long-term data management services utilize CADES' large-scale storage system and enable users to easily manipulate remote directories and uploaded data in their private data storage area as if they were browsing on their local workstation. The framework facilitates user workflow needs by enabling authenticated, "push-button" execution of material science workflows that deploy advanced data analysis algorithms and computational simulations on Titan at OLCF, Edison at NERSC, and a CADES compute cluster (together, a "DOE HPC Cloud"). By supplying a flexible environment for efficient delivery and execution of modern, scalable data analysis algorithms and scientific simulations in conjunction with robust data management capabilities and interactive 2D and 3D data views, BEAM attempts to accelerate scientific discovery by unifying in silico and empirical experiments for IFIM, CNMS, and SNS users.

Bio: Eric Lingerfelt is a technical staff member and software engineer in CSMD's Computer Science Research Group. He specializes in developing multi-tier, distributed software systems integrated with highly-interactive client-side applications that allow users to generate, access, visualize, manipulate, and share complex sets of data from anywhere in the world. For over a decade, he has led the development of multiple software systems for the DOE and other customers in the fields of nuclear astrophysics, nanophase material science, Big Bang cosmology, core-collapse supernovae, isotope sales and distribution, environmental science, nuclear energy, theoretical nuclear science, and the oil and gas industry. Eric received his B.S. in Mathematics and Physics from East Tennessee State University in 1998 and his M.S. in Physics from the University of Tennessee in 2002.

January 11, 2016 - Rizwan A. Ashraf: A Framework to Analyze the Propagation of Transient Faults in HPC Applications

ABSTRACT: The computational and energy efficiency goals of modern high performance computing systems have given rise to the development of technologies such as near-threshold voltage (NTV) or low-voltage operation, and the use of commodity off-the-shelf compute nodes with shrinking process technology. This has raised a need to assess the resilience of computing systems in the presence of increased soft errors, so as to establish the feasibility of hardware- or software-level mitigation techniques while achieving the goal of ExaFLOPS of performance within a constrained power budget. Normally, soft faults occur at the hardware level as the result of physical phenomena such as exposure to alpha particles, transient timing violations, or localized temperature variations. In this talk, the implications of soft faults on distributed MPI applications are investigated by mimicking their execution on unreliable hardware. In particular, emphasis is given to faults that escape hardware correction and detection due to the infeasibility of complete fault coverage for a large number of chips. The characteristics of how such faults propagate through the application's memory state are analyzed. A combination of compiler-level code transformation and instrumentation is employed for runtime monitoring to assess the speed and depth of application state corruption as a result of fault injection. Specifically, understanding where and how fast faults propagate could lead to more efficient implementation of application-driven error detection and recovery. Finally, fault propagation models are derived for each HPC application that can be used to estimate the number of corrupted memory locations at runtime.

December 21, 2015 - Madhu Hari: Using Software Engineering Methodologies to Port a Scientific Code to GPUs

ABSTRACT: Software engineering methodologies were applied to improve the portability, maintainability, and performance of a fusion energy code by systematically using tools to map computationally intensive parts of the code to GPUs using OpenACC. Our experiences show that while the approach is promising, tool support for Fortran is lacking, and the difficulties of working with a complex code in a supercomputing environment limit what can be achieved in a reasonable amount of time.

December 7, 2015 - Kathleen Hamilton: Percolation bounds for decoding thresholds with correlated erasures in quantum codes

ABSTRACT: Correlations between errors can dramatically affect decoding thresholds, in some cases eliminating the threshold altogether. The existence of a threshold in the case of correlated erasures is analyzed using percolation theory and graph-theoretic concepts for quantum low-density parity-check (LDPC) codes and toric codes. The effects of positively correlated erasures can be modeled in terms of cluster errors, where qubits in clusters of various sizes can be marked all at once. In a code family with distance scaling as a power law of the code length, erasures can always be corrected below the percolation threshold on a qubit adjacency graph associated with the code. This correlated percolation transition is bounded by weighted (uncorrelated) percolation on a specially constructed cluster connectivity graph. Additionally, the spectral properties of the Hashimoto (non-backtracking) matrix are highlighted as they relate to several bounds for percolation on directed and undirected graphs, in a discussion of recent results.
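The threshold behavior at the heart of this analysis can be illustrated with a plain site-percolation Monte Carlo on a square grid (a generic textbook sketch; the lattice, parameters, and function names are illustrative and far simpler than the code-specific adjacency graphs in the talk):

```python
import random

def percolates(n, p, rng):
    """Site percolation on an n x n grid: is there an open cluster
    spanning from the top row to the bottom row?"""
    open_site = [[rng.random() < p for _ in range(n)] for _ in range(n)]
    # Union-find over sites plus virtual top (n*n) and bottom (n*n+1) nodes
    parent = list(range(n * n + 2))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    for i in range(n):
        for j in range(n):
            if not open_site[i][j]:
                continue
            idx = i * n + j
            if i == 0:
                union(idx, n * n)           # connect to virtual top
            if i == n - 1:
                union(idx, n * n + 1)       # connect to virtual bottom
            for di, dj in ((1, 0), (0, 1)): # open neighbors below and right
                ni, nj = i + di, j + dj
                if ni < n and nj < n and open_site[ni][nj]:
                    union(idx, ni * n + nj)
    return find(n * n) == find(n * n + 1)

def spanning_probability(n, p, trials=200, seed=0):
    rng = random.Random(seed)
    return sum(percolates(n, p, rng) for _ in range(trials)) / trials
```

Sweeping p across the square-lattice site threshold (approximately 0.593) makes the spanning probability jump from near 0 to near 1 as the grid grows, the same sharp-transition behavior that bounds the correctability of clustered erasures.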

November 5, 2015 - Christoph Beckermann: Modeling of Microstructure Evolution in Solidification Processes

ABSTRACT: Solidification is fundamental to the manufacture of all metallic materials and components. At the same time, the microstructures that form during solidification represent an interesting example of the spontaneous formation of a complex pattern. Modeling of solidification is challenging because it is characterized by an intricate interplay of multiple phenomena at several length and time scales. This seminar will provide an overview of recent progress made in numerically simulating solidification microstructure evolution. Examples include dendritic growth, columnar-to-equiaxed grain structure transitions, and concurrent growth and coarsening of mushy zones. Future challenges, particularly with respect to high performance computing and advanced manufacturing processes, are summarized.

November 3, 2015 - Ferrol Aderholdt: Virtual Machine Introspection-based Checkpoint/Restart for Survivable Clouds

ABSTRACT: Cloud computing is an extremely popular computing paradigm in academia and industry. This popularity stems from the various properties of the cloud, including ease of use, elasticity, reduced maintenance and energy costs for the consumer, and a pay-as-you-go model. As enterprise computing migrates from on-site compute resources to cloud-based resources, adoption may increase further. Adoption at this scale may present various challenges for cloud providers with respect to providing fault-free execution for users as well as mitigating attacks by malicious parties. In order to handle these difficulties, survivability may be applied to the cloud architecture such that increased adoption of cloud computing results in increased profits for both the consumer and provider. This talk discusses a virtual machine introspection-based checkpoint/restart mechanism for use in the cloud survivability framework (CSF), which is a user-level, component-based framework that applies the properties of survivability to current infrastructure-as-a-service (IaaS) cloud architectures.

October 29, 2015 - Mike Leuze: DNA2Face: Predicting Faces from a DNA Sample

ABSTRACT: The availability of large datasets linking human genomic sequences to observed traits provides the potential to correlate an individual's specific genomic code with susceptibility to disease, behavior, and physical appearance. The genome-physical appearance relationship is complex, with single genes having an impact on multiple aspects of appearance and individual features being influenced by multiple genes. In this project, we quantify the connections between human genomics and facial appearance using statistical techniques to determine principal components of facial morphology and computational genomics to find associations between these principal components and mutational variation. The ultimate goal is to develop the ability to estimate 3D facial appearance from DNA, a capability of significant value to the law enforcement, national security, and intelligence communities.
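The "principal components of facial morphology" step can be sketched with ordinary PCA; in this toy version the rows of X would be flattened 3D landmark coordinates, one face per row (the data and all names are hypothetical, not the project's actual pipeline):

```python
import numpy as np

def principal_components(X, k):
    """PCA via SVD: returns the top-k component directions and the
    per-sample scores (coordinates of each face in component space)."""
    Xc = X - X.mean(axis=0)                      # center the coordinates
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k], Xc @ Vt[:k].T

# 20 hypothetical faces, 6 flattened landmark coordinates each
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))
components, scores = principal_components(X, 2)
```

Association testing would then regress each score column against genomic variants, which is far more tractable than associating variants with raw coordinates.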

October 23, 2015 - Alvin R. Lebeck: Molecular-Scale Nanophotonics for Network-on-Chip and Probabilistic Computing Functional Units

ABSTRACT: This talk describes ongoing work exploring the use of emerging molecular scale devices for communication and computation. The first part of the talk presents Molecular-scale Network-on-Chip (mNoC). We leverage quantum dot LEDs, which provide electrical to optical signal modulation, and chromophores, which provide optical signal filtering for receivers. These devices replace the ring resonators and the external laser source used in contemporary nanophotonic NoCs enabling crossbar scaling up to radix 256. We'll also present mNoC power topologies, enabled by unique capabilities of mNoC technology, to reduce overall interconnect power consumption. A power topology corresponds to the logical connectivity provided by a given power mode. Broadcast is one power mode and it consumes the maximum power. Additional power modes consume less power but allow a source to communicate with a statically defined (potentially non-contiguous physically) subset of nodes. Overall power is reduced if the frequently communicating nodes use low power modes, while less frequently communicating nodes use higher power modes.

The second part of this talk describes our recent work on developing novel computational units to accelerate probabilistic algorithms. Recent advances in statistics and machine learning demonstrate the potential of probabilistic algorithms in achieving high quality solutions; however, there remains a mismatch between current deterministic hardware and these algorithms. To bridge this gap we are exploring devices that exploit Resonance Energy Transfer (RET) between chromophores to create efficient samplers for arbitrary probability distributions. We provide a brief overview of the device behavior, fabrication with DNA Self-assembly, proposed functional units and status of a macro-scale prototype.

Alvin R. Lebeck is a Professor of Computer Science and of Electrical and Computer Engineering at Duke University. Lebeck's research interests include architectures for emerging nanotechnologies, high performance microarchitectures, hardware and software techniques for improved memory hierarchy performance, multiprocessor systems, and energy efficient computing. In the field of emerging nanotechnologies he has done extensive work exploring the architectural implications of DNA self-assembly as a fabrication method for future systems. In the area of memory systems, Lebeck led efforts in improving cache hierarchy performance, tolerating memory latency, and improving main memory power management.

October 22, 2015 - Jian Huang: Interactive Selection of Multivariate Features in Large Spatiotemporal Data

ABSTRACT: Selecting meaningful features is central to the analysis of scientific data. Today's multivariate scientific datasets are often large and complex, making it difficult to define general features of interest significant to scientific applications. To address this problem, we propose three general spatiotemporal metrics to quantify the significant properties of data features - concentration, continuity, and co-occurrence - named collectively as CO3. We implemented an interactive visualization system to investigate complex multivariate time-varying data from satellite remote sensing with high spatial resolution, as well as from real-time continental-scale power grid monitoring with high temporal resolution. The system integrates the CO3 metrics with a multi-space user interaction tool to provide various forms of quantitative user feedback. Through these, the system supports an iterative, user-driven analysis process. Our findings demonstrate that the CO3 metrics are useful for simplifying the problem space and revealing previously unknown avenues for scientific discovery by helping users effectively select significant features and groups of features for visualization and analysis. Users can then comprehend the problem better and design future studies around newly discovered scientific hypotheses.

Bio: Jian Huang is a professor in the Department of Electrical Engineering and Computer Science at the University of Tennessee, Knoxville. His research expertise includes large data visualization, multivariate data visualization and time-varying data visualization, as well as systems-oriented areas of visualization such as parallel, distributed, remote and collaborative visualization. His research has been funded by DOE, NSF, NASA, the Department of the Interior, Intel, and UT-Battelle.

October 21, 2015 - Tonglin Li: Distributed NoSQL Storage for Extreme-Scale System Services in Clouds and Supercomputers

ABSTRACT: As supercomputers gain more parallelism at exponential rates, storage infrastructure performance is increasing at a significantly lower rate due to relatively centralized system services and management. This implies that the data management and data flow between storage and compute resources is becoming the new bottleneck for large-scale applications. Similarly, cloud-based distributed systems introduce other challenges stemming from the dynamic nature of cloud applications. This talk discusses several challenges for storage systems at extreme scales in supercomputers and clouds, and addresses them by designing and implementing a zero-hop distributed NoSQL store (ZHT), which has been tuned for the requirements of high-end computing systems. ZHT aims to be a building block for scalable distributed system services. Its goals are high availability, good fault tolerance, light-weight design, persistence, dynamic joins and leaves, high throughput, and low latency at extreme scales (millions of nodes). We have evaluated ZHT's performance on a variety of systems, ranging from a 64-node Linux cluster and a 96-node Amazon EC2 virtual cluster to an 8K-node IBM Blue Gene/P supercomputer. This work also presents several real systems that have adopted ZHT as well as other NoSQL systems, namely ZHT/Q, FusionFS, IStore, MATRIX, Slurm++, Fabriq, Graph/Z, FREIDA-State, and WaggleDB. All of these systems have been significantly simplified by NoSQL storage and, in some cases, have been shown to outperform other leading systems by orders of magnitude. Through our work, we have shown how NoSQL storage systems can improve both performance and scalability at large scale in such a variety of environments.
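The "zero-hop" idea can be illustrated with consistent hashing: every client holds the full membership table, so locating the node responsible for a key is a purely local computation rather than a multi-hop routing search. (This is a generic sketch, not ZHT's actual protocol; the class and parameter names are invented.)

```python
import hashlib
from bisect import bisect

class ZeroHopRing:
    """Zero-hop key routing via consistent hashing: each node appears at
    several positions ('replicas') on a hash ring, and a key is owned by
    the first node position at or after the key's hash."""

    def __init__(self, nodes, replicas=64):
        self.ring = sorted(
            (self._hash(f"{node}:{r}"), node)
            for node in nodes for r in range(replicas))
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        # A single local lookup -- no network hops needed to route the key.
        i = bisect(self.keys, self._hash(key)) % len(self.ring)
        return self.ring[i][1]
```

The trade-off is that every client must track membership changes, which is why dynamic joins and leaves appear as an explicit design goal in the abstract.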

Bio: Tonglin Li is a 6th-year Ph.D. candidate in the Department of Computer Science at Illinois Institute of Technology, Chicago. He will receive his Ph.D. in December 2015. He is a member of the Data-Intensive Distributed Systems Laboratory (DataSys) at IIT, advised by Dr. Ioan Raicu. His research interests include distributed systems, storage systems, cloud computing, high performance computing, and big data. His publications include 3 journal papers, 8 conference papers, and 4 extended abstracts, in leading venues such as IPDPS, TCC, CCPE, and BigData.

October 20, 2015 - Todd Gamblin: Build and Test Automation at Livermore Computing

ABSTRACT: Build and test servers like Jenkins CI and Atlassian Bamboo are commonplace in industry, but they are not widely available for users at large HPC centers. These tools integrate tightly with bug trackers and source control management (SCM) systems, and they allow facilities and code teams to automate development, testing, and deployment workflows.
The need for automated testing is particularly acute on unique, bleeding-edge systems like LLNL's Sequoia and ORNL's Titan. However, security issues make it difficult for centers to deploy these tools for all of their users.

In this talk, I will give an overview of build and test efforts underway in LLNL's Livermore Computing (LC) Division. LC has recently deployed Atlassian Bamboo for end-users on two of our networks. Bamboo allows teams to share a central dashboard on the LC website, and to set permissions on their build configurations through the UI. Our solution allows users to run build agents securely, under their own identity, on production HPC systems. To automate the build process of large HPC applications, LLNL has also developed Spack, a flexible package management tool that allows users to explore the combinatorial build space of HPC packages. LLNL is using Spack with Bamboo to test tools and application codes with the many different compilers, software versions, and configurations that our users demand.

October 15, 2015 - Ian Foster: Accelerating Discovery Via Science Services

ABSTRACT: We have made much progress over the past decade toward harnessing the collective power of IT resources distributed across the globe. In big-science projects in high-energy physics, astronomy, and climate, thousands work daily within virtual computing systems with global scope. But we now face a far greater challenge: Exploding data volumes and powerful simulation tools mean that many more--ultimately most?--researchers will soon require capabilities not so different from those used by such big-science teams. How are we to meet these needs? Must every lab be filled with computers and every researcher become an IT specialist? Perhaps the solution is rather to move research IT out of the lab entirely: to develop suites of science services to which researchers can dispatch mundane but time-consuming tasks, and thus to achieve economies of scale and reduce cognitive load. I explore the past, current, and potential future of large-scale outsourcing and automation for science, and suggest opportunities and challenges for today's researchers. I use examples from Globus, Swift, and other projects to demonstrate what can be achieved.

Ian Foster is Director of the Computation Institute, a joint institute of the University of Chicago and Argonne National Laboratory. He is also an Argonne Senior Scientist and Distinguished Fellow and the Arthur Holly Compton Distinguished Service Professor of Computer Science. Methods and software developed under his leadership underpin many large national and international cyberinfrastructures. Ian's research interests include distributed, parallel, and data-intensive computing technology, as well as innovative applications of computing technologies to scientific problems.

October 13, 2015 - Edmond Chow: Very Fine-Grained Parallelization of Sparse Linear Algebra Computations

ABSTRACT: Massive concurrency is required in scientific and engineering algorithms in order to run efficiently on future computer architectures. High-end compute nodes already have hundreds to thousands of accelerator cores, and core counts are anticipated to increase further. In this talk, we describe some new approaches for certain sparse linear algebra computations, particularly incomplete factorizations and sparse triangular preconditioner solves, that have much more concurrency than existing approaches. The main idea is to transform a problem into one that can be solved iteratively. By using asynchronous iterative methods, the coupling that must exist between processing units is respected, but with much lower overhead than in the synchronous case.
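The "transform a problem into one that can be solved iteratively" idea can be sketched for the triangular solve: replace sequential forward substitution with Jacobi sweeps, in which every row updates independently. (A minimal dense sketch under assumed names; the talk's methods target sparse matrices and asynchronous updates.)

```python
import numpy as np

def jacobi_triangular_solve(L, b, sweeps=10):
    """Approximately solve L x = b (L lower triangular) by Jacobi sweeps.
    Each component update depends only on the previous iterate, so all
    rows can be computed in parallel, unlike forward substitution."""
    d = np.diag(L)
    x = b / d                              # initial guess
    for _ in range(sweeps):
        x = (b - (L @ x - d * x)) / d      # x_{k+1} = D^{-1}(b - (L - D) x_k)
    return x
```

Because the strictly lower part of L is nilpotent, the iteration is exact after at most n sweeps; in practice a few sweeps already give a good enough approximation for a preconditioner application.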

October 9, 2015 - Vivek Seshadri: Can DRAM do more than just store data?

ABSTRACT: In today's systems, DRAM is used only as a storage device. Off-chip DRAM interfaces allow the memory controller to read and write data. As a result, any operation must first read the required data from DRAM and store the results back into DRAM. In this line of work, we observe that this model is very inefficient for certain key primitives in modern systems. And we ask the question, "Can DRAM do more than just store data?"

In response, we propose three techniques that exploit the DRAM architecture to significantly improve the efficiency of three important operations. First, we propose RowClone, a mechanism to perform bulk copy and initialization (specifically zeroing) operations completely within DRAM. RowClone improves the performance and energy efficiency of these operations by an order of magnitude. Second, we propose Gather-Scatter DRAM (GS-DRAM), a mechanism to improve the efficiency of non-unit strided access patterns. GS-DRAM achieves near ideal memory bandwidth and cache utilization for power-of-2 strided access patterns. Finally, we propose a new substrate that exploits existing DRAM operation to perform bulk bitwise operations completely within DRAM. Our mechanism enables an order-of-magnitude improvement in the throughput of bitwise operations.

In this talk, I will provide a brief tutorial of DRAM operation. I will then describe these three mechanisms in detail.

BIOGRAPHY: Vivek Seshadri is a Ph.D. student in the Computer Science Department at Carnegie Mellon University. He is advised by Prof. Todd Mowry and Prof. Onur Mutlu. His research interests are primarily in the field of computer systems, with a specific focus on designing efficient memory systems.

October 1, 2015 - Alexander M. Feldt: Thinking About Vulnerability: Climate Change, Human Rights, and Moral Thresholds

ABSTRACT: Within the scientific community, and occasionally within the media, a lot of attention is placed on specific data points about climate change - 450 ppm, 3°C of warming, 10 ft of sea-level rise. Moreover, these are typically presented as significant because they relate to harm we ought to avoid. Essentially, they serve as signifiers of when something bad will happen. However, to identify these various data points or phenomena as markers of harm, one has already made some corresponding moral judgment about which thresholds are the ones we ought to care about. For example, if I don't think that people have any particular claim to anything beyond mere survival, and data shows me people will be able to survive with 10 ft of sea-level rise, even if it results in lots of environmental refugees, then I won't and shouldn't care about 10 ft as an important data point. In this talk, I examine how engaging climate change from a human rights perspective can provide key resources for understanding what these data points mean as moral thresholds and defending why certain thresholds matter. I will offer an account linking human rights and the environment that utilizes the Capabilities Approach, which is at the core of much of the human development literature, to highlight the broad array of moral harms that can be caused by climate change. This can then be coupled with climate vulnerability modeling in a way that clearly articulates why certain thresholds matter, by linking the impacts of certain scenarios of climate change to human rights violations. By bringing a clear moral framework into climate modeling, we are better able to identify why we do and should care about the many important thresholds offered by the scientific community.

September 30, 2015 - Markus Eisenbach: LSMS & WL-LSMS: Codes for First Principles Calculation of the Ground State and Statistical Physics of Materials

ABSTRACT: The Locally Self-consistent Multiple Scattering (LSMS) code solves the first-principles density functional theory Kohn-Sham equation for a wide range of materials, with a special focus on metals, alloys, and metallic nanostructures. It has traditionally exhibited near-perfect scalability on massively parallel high performance computer architectures. We present our efforts to exploit GPUs to accelerate the LSMS code to enable first-principles calculations of O(100,000) atoms and statistical physics sampling of finite-temperature properties. Using the Cray XK7 system Titan at the Oak Ridge Leadership Computing Facility, we achieve a sustained performance of 14.5 PFlop/s and a speedup of 8.6 compared to the CPU-only code.

September 17, 2015 - Jay Jay Billings: Integrated Modeling and Simulation with Eclipse ICE and its applicability to Neutron Science

ABSTRACT: Simulating the physical world is difficult from any perspective, although computational scientists usually focus on raw compute performance. Many tools exist for doing many different types of simulations and many more tools exist for generating input, post-processing results or managing data. Users are challenged to figure out how to use their new favorite code and extract knowledge from its results while developers are charged with making "One Simulator to Rule them All" that can be coupled to any and every other code and possibly extended in arbitrary ways. Both scenarios lead to significant challenges that stifle productivity and limit scientific innovation. Those challenges are not insurmountable and can be addressed by developing novel platforms to manage modeling and simulation just like real experiments.

This talk presents ORNL's modeling and simulation platform, the Eclipse Integrated Computational Environment (ICE), which was built to tackle these challenges. Eclipse ICE integrates a large collection of user tools for input generation, job launch, visualization, and data management. It also provides tools for developers in C/C++, Fortran, Java, Python and other languages to develop their software, as well as a rich API for extending the platform to provide graphical plugins for their users. This talk will also demonstrate Eclipse ICE's support for several projects related to neutron science, including Sassena for neutron scattering and a new simulator for neutron reflectometry. It will present ICE's visualization services that support 2D plotting, 3D geometry editing, and fully interactive visualization with VisIt and ParaView. It will show how ICE can be controlled via Python scripts and its integration with other Eclipse-based projects. Finally, thoughts on future directions for the platform and its continued support for neutron science will be presented.

USB sticks with binaries and sample data will be available to attendees who want to follow along in the demonstration.

September 14, 2015 - James Elliott: Soft Errors in Linear Solvers: Fighting an Invisible Foe

ABSTRACT: This work presents a novel approach to HPC resilience that couples numerical analysis and analytic modeling. We present models for soft errors in floating-point operations, and then extend these models to reveal the expected error should floating-point data experience a soft error. We then consider how to develop a resilient linear solver. We present a general approach for enforcing bounded error, and show experimentally that this technique can be very effective. Next, we consider a subset of soft errors that are undetectable with current detection approaches and cause high overhead, i.e., errors that look correct with respect to a norm. We develop a numerical soft error injection technique that generates such errors, and then evaluate algorithmic options for coping with them in the FT-GMRES (nested solvers) selective reliability framework. Our prototype is implemented using the Trilinos library, and all tests are evaluated in parallel with a state-of-the-art preconditioner that ensures that failure-free problem solves are very efficient. Our pessimistic error injection coupled with efficient solvers ensures that any overhead introduced by fault tolerance is noticeable. Using this approach, we then reason about algorithmic fault tolerance techniques inside iterative linear solvers using both analytic modeling and experimentation. We show our approach has a low "always-on" cost, while providing strong coverage for soft errors.

James Elliott is a candidate for a postdoctoral position in the Computer Science Research Group. He graduated from Louisiana Tech University with a B.S. in Computer Science and an M.S. in Mathematics and Statistics. James has been involved in the fields of HPC and resilience since 2005, when he studied how to virtualize a cluster using the Xen hypervisor. In 2007, he integrated various benchmarks into the OSCAR cluster management suite as part of a Google Summer of Code project. He worked directly with the Louisiana Optical Network Initiative as a graduate computational science fellow in 2009, and then moved on to pursue a Ph.D. in Computer Science at North Carolina State. James has studied alternatives to checkpoint/restart and, more recently, the soft error resilience of numerical methods. A strong component of Mr. Elliott's work is the use of analytic modeling, and one day he hopes to demystify the "monster in the closet" that is soft error resilience. Mr. Elliott has also taught at the middle and high school level as part of the NSF GK-12 Teaching Fellowship, and has worked at three national labs in various student-oriented programs.

September 9, 2015 - Dr. Mike Guidry: On the Design, Autotuning, and Optimization of GPU Kernels for Kinetic Network Simulations Using Fast Explicit Integration and GPU Batched Computation

ABSTRACT: This talk reports on an interdisciplinary effort among ORNL, the Innovative Computing Laboratory, and the Departments of Physics and Astronomy at UT to provide new, highly efficient solvers for realistic simulation of scientific problems. Various scientific applications require solvers that work on many small problems that are independent of one another. At the same time, high-end hardware is evolving rapidly and becoming ever more throughput-oriented, so there is an increasing need for an energy-efficient, high-performance approach to these small problems, which we call batched computation. The many applications that need this functionality could especially benefit from the use of GPUs, which currently are four to five times more energy efficient than multi-core CPUs on important scientific workloads. This talk describes the design, autotuning, and optimization of batched GPU methods to accelerate large kinetic network simulations that use novel fast explicit integration algorithms.

Taking as a generic test case a Type Ia supernova explosion with an extremely stiff thermonuclear network of 150 isotopic species and 1604 reactions, assumed coupled to hydrodynamics using operator splitting, we demonstrate the capability to solve 250-500 realistic kinetic networks in parallel in the same time that the standard implicit methods used in calculations to date can solve a single such network on a CPU. This orders-of-magnitude decrease in compute time for systems of realistic kinetic networks implies that important coupled multiphysics problems in various scientific and technical fields that were previously intractable, or could be simulated only with highly schematic kinetic networks, are now computationally feasible.
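The batched pattern itself is easy to sketch. The following NumPy fragment, a CPU stand-in for the GPU kernels described in the talk, solves many small independent linear systems in one batched call; the sizes echo the 150-species network, but the matrices are random and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n = 500, 150    # e.g. 500 independent networks, 150 species each
# Random diagonally dominant systems as stand-ins for implicit network solves
A = rng.standard_normal((batch, n, n)) + n * np.eye(n)
b = rng.standard_normal((batch, n))

# One batched call factors and solves all 500 small systems together; on a
# GPU the same access pattern maps onto batched BLAS/LAPACK-style kernels
x = np.linalg.solve(A, b)                        # shape (500, 150)
residual = np.abs(np.einsum("bij,bj->bi", A, x) - b).max()
```

Launching one solve per tiny system would leave a throughput-oriented device idle; grouping them amortizes launch and memory-access overhead across the whole batch.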

August 27, 2015 - Dr. Bruno Turcksin: Parallelization and Adaptive Mesh Refinement

ABSTRACT: Parallelization and adaptive mesh refinement (AMR) are two techniques that can be exploited to speed up computation and to solve problems that would otherwise be inaccessible due to large memory requirements. In the case of parallelization, the speedup is obtained by partitioning the work among more processors, while larger problems can be solved through access to more memory. In the case of adaptive mesh refinement, the mesh, and occasionally the polynomial order of the finite elements, is adapted to the problem to reduce the number of unknowns needed to achieve a given accuracy. This results in a smaller system to solve and a reduction in the memory required for a given problem. Here, the complementarity of these two techniques, and the difficulties of applying them simultaneously, will be illustrated through examples from neutron transport using AMR with MPI, and from hp-FEM for the Stokes problem with multithreading.
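The h-refinement half of AMR can be sketched in a few lines. This toy 1-D example (hand-rolled Python, not deal.II) splits any cell whose local error indicator exceeds a tolerance, so cells cluster around a steep front while the rest of the mesh stays coarse:

```python
import math

def refine(cells, f, tol):
    """One AMR pass: split each cell whose jump in f exceeds tol."""
    out = []
    for a, b in cells:
        if abs(f(b) - f(a)) > tol:      # crude error indicator
            m = 0.5 * (a + b)
            out += [(a, m), (m, b)]     # h-refinement: bisect the cell
        else:
            out.append((a, b))
    return out

f = lambda x: math.tanh(50 * x)         # solution with a steep front at x = 0
cells = [(-1 + i / 4, -1 + (i + 1) / 4) for i in range(8)]  # uniform start
for _ in range(6):                      # repeated refinement passes
    cells = refine(cells, f, 0.1)

widths = sorted(b - a for a, b in cells)
# After 6 passes the finest cells (width 1/256) hug the front at x = 0,
# while cells far from the front keep their original width of 1/4
```

The same accuracy on a uniformly fine mesh would need hundreds of cells; the adapted mesh needs far fewer unknowns, which is exactly the memory and solve-time saving the abstract describes.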

BIOGRAPHICAL INFORMATION: Dr. Bruno Turcksin earned a Ph.D. in Nuclear Engineering from Texas A&M University in 2012. He is now a visiting assistant professor in the Department of Mathematics at Texas A&M, working on the deal.II finite element library. His primary areas of expertise are numerical methods for neutron and electron transport, adaptive mesh refinement, and high performance computing.

August 12, 2015 - Alex McCaskey: Code Integration Between the BISON Fuel Performance and PROTEUS Neutronics Applications

ABSTRACT: This talk will present new code coupling strategies for the integration of components from Idaho National Laboratory's Multiphysics Object-Oriented Simulation Environment (MOOSE) and Argonne National Laboratory's SHARP nuclear reactor framework. These frameworks take completely different approaches to the modeling and simulation of advanced nuclear reactor technologies, with MOOSE providing tools for the top-down development of coupled physics codes, and SHARP enabling the integration of existing legacy physics codes from the bottom up. These differing philosophies have so far prevented the efficient integration of existing pieces from the two frameworks. The work presented here details a new way to enable this integration by building upon the existing features of both frameworks, as well as the introduction of the extensible DataTransferKit for two-way solution transfer. This new methodology efficiently enables the coupling of codes with different languages, mesh representations, and solve types. The talk will demonstrate this integration avenue for the specific case of coupling between the BISON (MOOSE) fuel performance and PROTEUS (SHARP) neutronics applications, in an effort to improve solution accuracy for fuel performance calculations.

August 11, 2015 - Frank Mueller: On the Implications of Large-Scale Manycores and NoCs for Exascale

ABSTRACT: Future compute nodes in HPC will have hundreds if not thousands of cores. To accommodate the data demand of each core, network-on-chip (NoC) interconnect architectures are changing from rings to meshes. This work creates a novel communication abstraction for a mesh NoC and assesses the viability of MPI, OpenMP, and hybrid execution models on a single die with 64 cores and a 2D mesh. Results indicate the importance of reduced flow control and the absence of contention on the NoC. They further illustrate how to better utilize memory parallelism in a transparent manner for HPC and beyond.

BIOGRAPHY: Frank Mueller is a Professor in Computer Science and a member of multiple research centers at North Carolina State University. Previously, he held positions at Lawrence Livermore National Laboratory and Humboldt University Berlin, Germany. He received his Ph.D. from Florida State University in 1994. He has published papers in the areas of parallel and distributed systems, embedded and real-time systems and compilers. He is a member of ACM SIGPLAN, ACM SIGBED and a senior member of the ACM and IEEE Computer Societies as well as an ACM Distinguished Scientist. He is a recipient of an NSF Career Award, an IBM Faculty Award, a Google Research Award and two Fellowships from the Humboldt Foundation.

July 20, 2015 - John Feo: Tables, Graphs, and Problems

ABSTRACT: Data collection and analysis are rapidly changing the way the scientific, national security, and business communities operate. They have emerged as a fourth paradigm of science, with American economic competitiveness and national security depending increasingly on the insightful analysis of large data sets. While extreme-scale analytics shares many computing issues with extreme-scale scientific simulations, the nature of the problems and data creates important differences. The volume, velocity, variety, and veracity of analytic data set it apart from scientific data. Moreover, the data does not partition neatly along physical boundaries, and algorithms do not map efficiently to bulk-synchronous processes with nearest-neighbor communication. This is true both for traditional table-driven machine learning applications and for emerging graph methods. While natural partitions can be found, irregular inter-partition connections and extreme load imbalance limit scalability to a small number of nodes for runtime systems that assign groups of data to single locales. Without scaling to a large number of nodes, in-memory solutions based on such runtime systems are no more attractive than file-based solutions.

While at PNNL, I architected GEMS, a multithreaded semantic graph engine. The framework had three components: 1) a SPARQL front end that transforms SPARQL queries into data-parallel C code; 2) a semantic graph engine with scalable multithreaded algorithms for query processing; and 3) a custom multithreaded runtime layer for scalable performance on conventional cluster systems. Our objectives were twofold: 1) to scale system size as data sizes increase, and 2) to maintain query throughput as system size grows.

In this talk, I will summarize the data challenges facing scientists, intelligence analysts, and business leaders. I will discuss table and graph analytic methods and the problems introduced by the unbalanced distribution of real world data. I will describe GEMS in detail focusing on the graph engine and runtime layer, and present some performance results.

Dr. Feo received his Ph.D. in Computer Science from The University of Texas at Austin. He began his career at Lawrence Livermore National Laboratory, where he managed the Computer Science Group and was the principal investigator of the Sisal Language Project. Dr. Feo then joined Tera Computer Company (now Cray Inc.), where he was a principal engineer and product manager for the first two generations of Cray's multithreaded architecture. After a short two-year "sabbatical" at Microsoft, where he led a software group developing a next-generation virtual reality platform, he joined PNNL as the Director of the Center for Adaptive Supercomputer Software and principal investigator of a large DOD project in graph analytics. Most recently, Dr. Feo was VP of Engineering at Context Relevant.

Dr. Feo's research interests are parallel programming, graph algorithms, multithreaded architectures, functional languages, and performance studies. He has published extensively in these fields. He has held academic positions at UC Davis and is an adjunct faculty member at Washington State University.

July 20, 2015 - Fuli Yu: The excitement and challenge of genomic data analysis in the era of precision medicine

ABSTRACT: The emergence of multiple high-throughput, data-rich technologies capable of characterizing genotypes and phenotypes, from the population level down to the cellular level, has produced a paradigm shift in biomedical research. The bottleneck in scientific productivity has shifted from data production to integrative data analysis and interpretation. Integrating large-scale, high-dimensional molecular, physiological, and phenotypic data sets (including transcriptome, epigenome, microbiome, metabolome, proteome, imaging data, and medical records) collected longitudinally across multiple studies holds great promise for identifying causal pathways from health to disease. These studies can reveal fundamental mechanistic insights as well as provide personalized approaches for disease prevention and treatment. The overarching challenge now facing biomedical researchers is how to use computational approaches to integrate these large-scale, high-dimensional data sets and consequently build new knowledge and hypotheses.

We have developed an ensemble pipeline, goSNAP, that integrates multiple variant callers and heterogeneous computational infrastructures (cluster, cloud, and supercomputer facilities) to optimize performance both computationally and scientifically. By exploiting a hybrid paradigm that combines heterogeneous computational infrastructures, we effectively balanced scalability and cost. A local cluster was used for routine background steps such as alignment and recalibration, with data aggregated over a long period of time. We used both the cloud (DNAnexus) and a supercomputer at Oak Ridge National Laboratory for the CPU- and I/O-intensive steps, where highly parallelized processing is needed to finish in a reasonable timeframe. Our deployment in CHARGE has shown that we can reduce the analysis time from more than 6 months to just a few weeks.

July 1, 2015 - Greg Watson: Software Engineering for Science: Beyond the Eclipse Parallel Tools Platform

ABSTRACT: The Eclipse Parallel Tools Platform (PTP) project was started over 10 years ago with the goal of bringing best practices in software engineering to scientific computing. The results of the project have been mixed: we have seen adoption of Eclipse in many labs and academic institutions, and the PTP development environment has been downloaded over 1M times since records started being kept in 2012. However, we are still not seeing general use across the scientific computing community, and many negative perceptions of Eclipse persist. Although a number of groups have built their own Eclipse-based tools, we also haven't seen the high level of integration that was one of the original objectives of the project. And although software engineering practices have improved to some degree, there is still much room for improvement, particularly as the next generation of highly complex computing systems becomes available. This talk will discuss some key observations on the uptake of advanced development environments by the scientific computing community, and consider the factors that have influenced the adoption of PTP in particular. The presentation will then examine some areas that we believe would be beneficial for improving software engineering practices, as well as some exciting possibilities for future research.

June 30, 2015 - Torsten Hoefler: How fast will your application run at <next>-scale? Static and dynamic techniques for application performance modeling

ABSTRACT: Many parallel applications suffer from latent performance limitations that may prevent them from utilizing resources efficiently when scaling to larger parallelism. Often, such scalability bugs manifest themselves only when an attempt to scale the code is actually being made, a point where remediation can be difficult. However, creating analytical performance models that would allow such issues to be pinpointed earlier is so laborious that application developers attempt it for at most a few selected kernels, running the risk of missing harmful bottlenecks. We discuss dynamic techniques that generate performance models of program scalability automatically, to identify scaling bugs early. This automation enables a new set of parallel software development techniques. We demonstrate the practicality of this method on various real-world applications, but also point out limitations of the dynamic approach. We then discuss a static analysis that establishes close, provable bounds on the number of loop iterations and the scalability of parallel programs. While this analysis captures more loops than existing techniques based on the polyhedral model, no analysis can count all loops statically. We conclude by briefly discussing how to combine these two approaches into an integrated framework for scalability and performance analysis.
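The dynamic half of this idea can be caricatured in a few lines: time a kernel at a handful of small scales, fit candidate models of the form c0 + c1*p^k, and extrapolate. The timings below are synthetic, with a planted O(p^2) scalability bug; real empirical-modeling tools are far more sophisticated than this sketch:

```python
import numpy as np

# Hypothetical timings of one kernel at small scales (p = process count)
p = np.array([2.0, 4, 8, 16, 32])
t = 5.0 + 0.01 * p**2                # hides an O(p^2) scalability bug

def fit(k):
    """Least-squares fit of t ~ c0 + c1 * p**k; return (error, k, coeffs)."""
    X = np.column_stack([np.ones_like(p), p**k])
    c, *_ = np.linalg.lstsq(X, t, rcond=None)
    return (float(np.sum((X @ c - t) ** 2)), k, c)

# Keep the candidate exponent with the smallest fitting error
err, k, c = min(fit(k) for k in (0.5, 1.0, 2.0, 3.0))
# k recovers the planted exponent; extrapolate to a scale never actually run
t_4096 = c[0] + c[1] * 4096**k
```

The point of the abstract is that this fit-and-extrapolate loop can be automated per kernel, flagging the p^2 term long before anyone burns an allocation at 4096 processes.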

BIOGRAPHY: Torsten is an Assistant Professor of Computer Science at ETH Zürich, Switzerland. Before joining ETH, he led the performance modeling and simulation efforts for parallel petascale applications in the NSF-funded Blue Waters project at NCSA/UIUC. He is also a key member of the Message Passing Interface (MPI) Forum, where he chairs the "Collective Operations and Topologies" working group. Torsten won best paper awards at the ACM/IEEE Supercomputing Conference (SC10, SC13, SC14), EuroMPI 2013, IPDPS 2015, and other conferences. He has published numerous peer-reviewed conference and journal articles and authored chapters of the MPI-2.2 and MPI-3.0 standards. His research interests revolve around the central topic of "Performance-centric Software Development" and include scalable networks, parallel programming techniques, and performance modeling. Additional information about Torsten can be found on his homepage.

June 26, 2015 - Edwin Garcia: Progress Towards a Microstructurally Resolved Porous Electrode Theory for Rechargeable Batteries

ABSTRACT: In high energy density, low porosity, lithium-ion battery electrodes, the underlying microstructural characteristics control the macroscopic charge capacity, average lithium-ion transport, and macroscopic resistivity of the cell, particularly at high electronic current densities and power densities. In this presentation, we report on progress towards the development of a combined numerical+analytical framework to describe the effect of particle morphologies, and their processing-induced spatial distribution, on macroscopic, position-dependent performance. By spatially resolving the electrochemical fields, we analyze the effect of particle size polydispersity on the galvanostatic behavior. We detail such effects of controlled electrode compaction and polydispersity on the macroscopic effective transport properties, and discuss their impact on the macroscopic galvanostatic response of existing and emerging energy storage devices. The framework presented here enables us to establish relations that combine the tortuosity and reactivity constitutive properties of the individual components. Macroscopic tortuosity-porosity relations for mixtures of porous particle systems of widely different length scales, together with well-known individual tortuosity constitutive equations, are combined into self-consistent macroscopic expressions, in agreement with recently reported empirical measurements.

June 22, 2015 - Mohamed Wahib : Scalable and Automated GPU Kernel Transformations in Production Stencil Applications

ABSTRACT: We present a scalable method for exposing and exploiting hidden localities in production GPU stencil applications. Exploiting inter-kernel locality amounts to finding the permutation of kernel fusions that minimizes redundant memory accesses. To achieve this, we first expose the hidden localities by analyzing inter-kernel data dependencies and order of execution. Next, we use a scalable search heuristic that relies on a lightweight performance model to identify the best candidate kernel fusions. Experiments with two real-world applications demonstrate the effectiveness of manual kernel fusion. To make kernel fusion a practical choice, we further introduce an end-to-end method for automated transformation: a CUDA-to-CUDA transformation collectively replaces the user-written kernels with auto-generated kernels optimized for data reuse. The automated method also allows us to improve the search process by enabling kernel fission and thread block tuning. We demonstrate the practicality and effectiveness of the proposed end-to-end automated method: with minimal intervention from the user, we improved the performance of six applications with speedups ranging from 1.12x to 1.76x.
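The core transformation can be illustrated with a toy pair of elementwise "kernels" in plain Python (the actual work operates on CUDA stencil kernels, which this sketch does not attempt to reproduce):

```python
# Two logical "kernels" over the same arrays, as in a stencil pipeline:
#   k1: t[i] = a[i] + b[i]        k2: out[i] = 2 * t[i]
def unfused(a, b):
    t = [x + y for x, y in zip(a, b)]    # intermediate array t is written out...
    return [2.0 * v for v in t]          # ...and read back: extra memory traffic

def fused(a, b):
    # Fusion eliminates the intermediate array: same result, fewer loads/stores
    return [2.0 * (x + y) for x, y in zip(a, b)]

assert unfused([1, 2], [3, 4]) == fused([1, 2], [3, 4]) == [8.0, 12.0]
```

On a GPU the intermediate of the fused version stays in registers, which is exactly the redundant-memory-access saving the search heuristic above is trying to maximize across many kernels.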

BIOGRAPHY: Mohamed Wahib is currently a postdoctoral researcher in the "HPC Programming Framework Research Team" at the RIKEN Advanced Institute for Computational Science (RIKEN AICS). He joined RIKEN AICS in 2012 after completing a Ph.D. in Computer Science at Hokkaido University, Japan, that same year. Prior to his graduate studies, he worked as a researcher at Texas Instruments (TI) R&D for four years.

June 12, 2015 - Saurabh Hukerikar : Introspective Resilience for Exascale High Performance Computing Systems

ABSTRACT: Future exascale High Performance Computing (HPC) systems will be constructed from VLSI devices that will be less reliable than those used today, and faults will become the norm, not the exception. Furthermore, the Mean Time to Failure (MTTF) of a system scales inversely with the number of components it contains, so faults and the resulting system-level failures will increase as systems scale in the number of processor cores and memory modules used. This will pose significant problems for system designers and programmers, who for half a century have enjoyed an execution model that assumed correct behavior by the underlying computing system. However, not every detected error needs to result in catastrophic failure. Many HPC applications are inherently fault resilient, but lack convenient mechanisms to express their resilience features to execution environments that are designed to be fault oblivious.

Dr. Hukerikar will present research conducted as part of his PhD dissertation, which proposes an execution model based on the notion of introspection. A set of resilience-oriented language extensions was developed to facilitate the incorporation of fault resilience as an intrinsic property of scientific application codes. These are supported by a compiler infrastructure and a runtime system that reason about the context and significance of faults to the outcome of the application execution. The compiler infrastructure was extended to demonstrate an application-level methodology for fault detection and correction based on redundant multithreading (RMT). An introspective runtime framework was also developed that continuously observes and reflects upon platform-level fault indicators to assess the vulnerability of the system's resources. The introspective runtime system provides a unified execution environment that reasons about the implications of resource management actions for the resilience and performance of the application processes. Results, which cover several high performance computing applications and different fault types and distributions, demonstrate that a resilience-aware execution environment is important for solving the most demanding computational challenges on future extreme-scale HPC systems.

Saurabh Hukerikar is a candidate for a postdoctoral position with the Computer Science Research Group. He recently completed his PhD in the Ming Hsieh Department of Electrical Engineering at the University of Southern California, working with the Computational Systems and Technology Division at USC's Information Sciences Institute. His graduate work seeks to address the challenge of resilience for extreme-scale high-performance computing (HPC) systems. He received an MS in Electrical Engineering in 2010 and an MS in Computer Science (with emphasis on High Performance Computing and Simulations) in 2012, both from the University of Southern California.

June 5, 2015 - Vivek Sarkar : Runtime System Challenges for Extreme Scale Systems

ABSTRACT: It is widely recognized that radical changes are to be expected in future HPC systems to address the challenges of extreme-scale computing. Specifically, they will be built using homogeneous and heterogeneous many-core processors with hundreds to thousands of cores per chip; their performance will be driven by parallelism (billion-way parallelism for an exascale system) and constrained by energy and data movement. They will also be subject to frequent faults and failures. Unlike previous generations of hardware evolution, these extreme-scale HPC systems will have a profound impact on future applications and their underlying software stack. The software challenges are further compounded by new application requirements, most notably data-intensive computing and analytics.

The challenges across the entire software stack for extreme-scale systems are driven by programmability, portability, and performance requirements, and impose new demands on programming models, languages, compilers, runtime systems, and system software. This talk focuses on the critical role played by runtime systems: enabling programmability in the upper layers of the software stack that interface with the programmer, and enabling performance in the lower layers that interface with the operating system and hardware.

Examples of key runtime primitives being developed to address these challenges will be drawn from experiences in the Habanero Extreme Scale Software Research project, which targets a wide range of homogeneous and heterogeneous manycore processors, as well as from the Open Community Runtime (OCR) system being developed in the DOE X-Stack program. Background material for this talk will also be drawn from the DARPA Exascale Software Study report and from the DOE ASCAC study on Synergistic Challenges in Data-Intensive Science and Exascale Computing. We would like to acknowledge the contributions of all participants in the Habanero project, the OCR project, and the DARPA and DOE studies.

BIOGRAPHY: Vivek Sarkar is Professor and Chair of Computer Science at Rice University. He conducts research in multiple aspects of parallel software, including programming languages, program analysis, compiler optimizations, and runtimes for parallel and high performance computer systems. He currently leads the Habanero Extreme Scale Software Research Laboratory at Rice University, and serves as Associate Director of the NSF Expeditions Center for Domain-Specific Computing. Prior to joining Rice in July 2007, Vivek was Senior Manager of Programming Technologies at IBM Research. His responsibilities at IBM included leading IBM's research efforts in programming models, tools, and productivity in the PERCS project during 2002-2007 as part of the DARPA High Productivity Computing Systems program. His prior research projects include the X10 programming language, the Jikes Research Virtual Machine for the Java language, the ASTI optimizer used in IBM's XL Fortran product compilers, the PTRAN automatic parallelization system, and profile-directed partitioning and scheduling of Sisal programs. In 1997, he was on sabbatical as a visiting associate professor at MIT, where he was a founding member of the MIT Raw multicore project. Vivek became a member of the IBM Academy of Technology in 1995, was named to the E.D. Butcher Chair in Engineering at Rice University in 2007, and was inducted as an ACM Fellow in 2008. He holds a B.Tech. degree from the Indian Institute of Technology, Kanpur, an M.S. degree from the University of Wisconsin-Madison, and a Ph.D. from Stanford University. Vivek has served as a member of the US Department of Energy's Advanced Scientific Computing Advisory Committee (ASCAC) since 2009.

May 27, 2015 - Jeffrey K. Hollingsworth : Active Harmony: Making Autotuning Easy

ABSTRACT: Active Harmony is an auto-tuning framework for parallel programs. In this talk, I will describe how the system makes it easy (sometimes even automatic) to create programs that can be auto-tuned. I will present examples from a few applications and programming languages. I will also discuss recent work we have been doing to provide support for auto-tuning programs with multiple (potentially conflicting) objectives such as performance and power.
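A minimal caricature of such a tuning loop in Python (Active Harmony itself tunes online, in parallel, and with much smarter search strategies; the kernel and candidate list here are invented for illustration):

```python
import timeit

def kernel(block):
    """Toy tunable computation: block size changes speed, not the result."""
    data = list(range(1 << 14))
    return sum(sum(data[i:i + block]) for i in range(0, len(data), block))

# Measure each candidate configuration and keep the fastest: the same
# measure-and-refine loop that an autotuning framework automates for real codes
candidates = [16, 64, 256, 1024]
best = min(candidates, key=lambda b: timeit.timeit(lambda: kernel(b), number=3))
```

In practice the search space is multi-dimensional and the measurements are noisy, and with multiple objectives (performance and power) there may be no single "best" point at all, which is precisely why a framework is needed to automate the search.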

BIOGRAPHY: Jeffrey K. Hollingsworth is a Professor in the Computer Science Department at the University of Maryland, College Park. He also holds appointments in the University of Maryland Institute for Advanced Computer Studies and the Electrical and Computer Engineering Department. He received his PhD and MS degrees in computer science from the University of Wisconsin. His research is in the areas of performance measurement, auto-tuning, and binary instrumentation. He is Editor-in-Chief of the journal Parallel Computing, was general chair of the SC12 conference, and is Vice Chair of ACM SIGHPC.

May 19, 2015 - Mikolai Fajer : Effects of the SH2/SH3 Regulatory Domains on the Activation Transition of c-Src Kinases

ABSTRACT: The c-Src kinase is an important component in cellular signalling, and its activity is closely regulated by the SH2/SH3 domains. Using the swarms-of-trajectories string method, the transition from inactive to active conformations of the kinase domain is studied in the presence of the SH2/SH3 domains. In the assembled, down-regulated SH2/SH3 conformation, the activation transition closely resembles that of the kinase domain alone. The re-assembled, up-regulated SH2/SH3 conformation pre-orients several side chains for their active-state interactions, thus promoting the active state of the kinase.

BIOGRAPHY: Mikolai Fajer received his bachelor's degree in Physics and Chemistry from the University of Florida. He then went on to get his PhD working under Andy McCammon at the University of California, San Diego, working on enhanced sampling methods. Most recently he has been working as a postdoc for Benoit Roux at the University of Chicago, studying conformational transitions in biomolecular systems.

May 14, 2015 - Brent Gorda : Lustre Keeping Pace with Compute and Intel's Continued Commitment

ABSTRACT: Brent will discuss "Lustre Keeping Pace with Compute and Intel's Continued Commitment." What is Intel's role in making sure data can safely move in and out of high-performance compute at extreme scale and at the speed of your network interface? Why do both scientific simulation environments and, increasingly, big data applications need advanced parallel file systems such as Intel's hardened Lustre? How are partners now driving Lustre innovation, alongside the Lustre community? What improvements are coming in Lustre for small-file performance, HSM, fault tolerance, snapshots, and security? To get to exascale computing, what needs to change in I/O?

BIOGRAPHY: Brent Gorda is the General Manager of the High Performance Data Division at Intel. Brent co-founded and led Whamcloud, a startup focused on the Lustre technology that was subsequently acquired by Intel. A longtime member of the HPC community, Brent was at Lawrence Livermore National Laboratory, where he was responsible for the BlueGene P/Q architectures as well as many of the large IB-based cluster architectures in use among the NNSA DOE laboratories. Brent is the founder of the Student Cluster Competition, a worldwide event that showcases the power of parallel/cluster computing in the hands of students.

April 30, 2015 - Dimitri Mavriplis : High Performance Computational Aerodynamics for Multidisciplinary Wind Energy and Aerospace Vehicle Analysis and Optimization

ABSTRACT: This talk will describe the development of a multi-solver, overlapping adaptive mesh CFD capability that scales well on current high performance computing hardware, with applications in aerospace vehicle analysis and design and complete wind farm simulations. The multi-solver paradigm uses a near-body unstructured mesh solver coupled with an adaptive Cartesian higher-order accurate off-body solver implemented within the SAMRAI framework. An overview of the multi-solver software structure will be given, after which the solution techniques used for the unstructured mesh multigrid solver component will be described in more detail. Subsequently, the incorporation of a discrete adjoint capability for multidisciplinary time-dependent aero-structural problems will be described, and results demonstrating the optimization of time-dependent helicopter rotors will be shown. The talk will conclude with prospects for advanced discretizations and solvers as we move towards the exascale era.

BIOGRAPHICAL INFORMATION: Dimitri Mavriplis is currently the Max Castagne Professor in Mechanical Engineering at the University of Wyoming. He obtained his Bachelor's and Master's degrees in Mechanical Engineering from McGill University and his PhD in Mechanical and Aerospace Engineering from Princeton University. After graduation, he spent over 15 years at ICASE/NASA Langley, where he worked on the development of unstructured mesh discretizations and solvers. In 2003 he joined the University of Wyoming, where he leads a research group focused on HPC solver technology, adjoint methods for optimization and error control, and high-order discretizations, with applications in multidisciplinary wind energy and aerospace vehicle analysis and design optimization.

April 16, 2015 - David Lecomber : Software Engineering for HPC - Experiences in Developing Software Tools for Rapidly Moving Targets

ABSTRACT: Code modernization is one of the hotter topics in HPC today, but modernization is about more than modern processors. I will consider how the modernization of software practices is making an impact in HPC, along with some of the best practices we see in the field amongst HPC developers. I will examine the challenges of software engineering to production and beyond from the perspective of engineering at Allinea: how we develop and test in a world of constant change, and the lessons learned along the way.

April 8, 2015 - Kirk W. Cameron : Why high-performance systems need a little bit of LUC

ABSTRACT: In 1936, Harvard University sociologist Robert K. Merton wrote a paper entitled "The Unanticipated Consequences of Purposive Social Action," in which he described how government policies often result in both positive and negative unintended consequences. The lesson from Merton's work was that unexpected consequences in complex social systems, at the time relegated to theology or chance, should be evaluated scientifically.

Independent groups typically design the components of HPC systems. Hard disks, processors, memories, and boards are eventually combined with BIOSs, file systems, operating systems, communication libraries, and applications. Today's components also adapt automatically to local conditions to improve efficiency. For example, processors and memories can vary their frequencies in response to demand. Disks can vary their rotation speeds. BIOSs and OSs can adapt their scheduling policies for different use cases.

Since the performance effects of local hardware and software management are largely unknown, these potentially valuable features are often disabled in high-performance environments. And unfortunately, while we assume that disabling these features will have positive consequences, Merton teaches us that relegating performance behavior to chance is just as likely to result in negative consequences. For example, there is mounting evidence that when processors are fixed at the highest frequency (i.e., with dynamic frequency scaling disabled), performance can worsen.

In this presentation, I will revisit the conventional wisdom that "faster is always better" for processor speeds in high-performance environments. In essence, through exhaustive experimentation, we can demonstrate quantitatively that lowering the CPU frequency can speed up performance by as much as 50% for some I/O-intensive applications. For the first time, we have identified the root cause of slowdowns at higher frequencies. I will describe how the LUC runtime system Limits the Unintended Consequences of processor speed in high-performance I/O applications. Our work also motivates the need to reject chance as an explanation of performance and revisit first principles so we can design systems that truly offer the highest performance.

Kirk W. Cameron is Professor and Associate Department Head of Computer Science in the College of Engineering at Virginia Tech. The central theme of his research is to improve power and performance efficiency in high performance computing (HPC) systems and applications. More than half a million people in more than 160 countries have used his power management software. In addition to his research, his NSF-funded, 256-node SeeMore kinetic sculpture of Raspberry Pis was featured at SIGGRAPH 2014 in Vancouver, B.C. and is scheduled for multiple exhibitions in Washington D.C. and New York in 2015.

March 31, 2015 - Keita Teranishi : Local Failure Local Recovery for large scale SPMD applications

As leadership-class computing systems increase in complexity and component feature sizes continue to decrease, the ability of an application code to treat the system as a reliable digital machine diminishes. In fact, there is a growing concern in the high performance computing community that applications will have to explicitly manage resilience issues beyond the current practice of checkpoint/restart (C/R). In particular, the current system reaction to the loss of a single MPI process is to terminate all remaining processes and restart the application from the most recent checkpoint. This is suboptimal at scale because the recovery cost is not proportional to the size of the failure. We address this scaling issue using an emerging resilient computing model called Local Failure, Local Recovery (LFLR) that attempts to provide application developers with the ability to recover locally and continue application execution when a process is lost. In this talk, I will present our two ongoing efforts to enable scalable on-line application recovery: general-purpose recovery heavily leveraging MPI-ULFM (a fault-tolerant MPI prototype), and recovery of a stencil-based code using Cray's uGNI.
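The LFLR idea can be illustrated with a toy simulation (a hedged sketch in plain Python, not the actual MPI-ULFM or uGNI implementation, and all names here are invented): each rank keeps a redundant copy of a neighbor's state, so when one rank is lost only that rank is restored, rather than restarting every process from a global checkpoint.

```python
# Toy illustration of Local Failure, Local Recovery (LFLR): each rank
# stores a redundant copy of its left neighbor's state, so losing one
# rank requires restoring only that rank, not a global restart.

def make_ranks(n):
    """Each rank holds its own state plus a copy of its left neighbor's."""
    return {r: {"data": r * 100, "neighbor_copy": ((r - 1) % n) * 100}
            for r in range(n)}

def fail_and_recover(state, failed, n):
    """Recover the failed rank locally from its right neighbor's copy."""
    del state[failed]                      # the process is lost
    donor = (failed + 1) % n               # right neighbor holds the copy
    restored = state[donor]["neighbor_copy"]
    state[failed] = {"data": restored,
                     "neighbor_copy": ((failed - 1) % n) * 100}
    return state

ranks = make_ranks(4)
ranks = fail_and_recover(ranks, failed=2, n=4)
print(ranks[2]["data"])  # the lost state is rebuilt locally: 200
```

The surviving ranks never roll back, which is the point of the model: recovery cost stays proportional to the failure, not to the machine.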

BIOGRAPHICAL INFORMATION: Keita Teranishi is a principal staff member of the Scalable Modeling and Analysis Systems group at Sandia National Laboratories in California. Before joining Sandia, he was involved in several dense and sparse matrix library development projects at Cray Inc. His broad research interests in HPC include application resilience, programming models, automatic performance tuning, and numerical linear algebra. He holds an M.S. degree from the University of Tennessee, Knoxville and a Ph.D. degree from Pennsylvania State University.

March 30, 2015 - Sarah Osborn : Solutions Strategies for Stochastic Galerkin Discretizations of PDEs with Random Data

When using partial differential equations (PDEs) to model physical problems, the exact values of coefficients are often unknown. To obtain more realistic models, the coefficients are typically treated as random variables in an attempt to quantify uncertainty in the underlying problem. Stochastic Galerkin methods are used to obtain numerical solutions for these types of problems. These methods couple the stochastic and deterministic degrees of freedom and yield a large system of equations that must be solved. A challenge in this method is solving the large system accurately and efficiently. Typically the system is solved iteratively, and preconditioning strategies dictate the performance of the iterative method. The goal of this work is to improve solver efficiency by investigating preconditioning techniques and solver implementation details. The model problem considered is the diffusion problem with uncertainties in the diffusion coefficient. An algebraic multigrid preconditioner based on smoothed aggregation is presented, with emphasis on the formulation of the model problem where the uncertain component has a nonlinear structure. Special consideration is given to the solution and proposed preconditioning strategy for improving performance on emerging architectures. Numerical results will be presented that illustrate the performance of the proposed preconditioner and implementation changes.
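The preconditioned iterative solve described above can be sketched in a few lines (a minimal numpy illustration using a Jacobi, i.e. diagonal, preconditioner on a toy 1D diffusion matrix; the actual work uses smoothed-aggregation algebraic multigrid, which is far more involved):

```python
import numpy as np

def pcg(A, b, M_inv_diag, tol=1e-10, max_iter=500):
    """Conjugate gradients with a diagonal (Jacobi) preconditioner."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv_diag * r          # apply M^{-1}: elementwise for Jacobi
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# SPD test system: 1D diffusion (tridiagonal) stiffness matrix
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = pcg(A, b, M_inv_diag=1.0 / np.diag(A))
print(np.linalg.norm(A @ x - b))  # residual near machine precision
```

A stronger preconditioner (multigrid instead of Jacobi) changes only what "apply M^{-1}" does, which is exactly why preconditioning strategy dictates the iterative method's performance.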

March 30, 2015 - Emil Alexov : Revealing the molecular mechanism of Snyder-Robinson Syndrome and rescuing it with small molecule binding

The Snyder-Robinson Syndrome (SRS) (OMIM 300105) is a rare mental retardation disorder caused by missense mutations in the spermine synthase gene (SpmSyn). SpmSyn encodes a protein, spermine synthase (SMS), of 529 amino acids, which becomes dysfunctional in SRS patients due to specific missense mutations. Here we investigate, in silico and in vitro, the molecular effect of these amino acid substitutions causing SRS and demonstrate that the mutations almost never directly affect the functional properties of SMS, but rather indirectly alter its wild-type characteristics. A particular feature of SMS, which is shown to affect its functionality, is the formation of the SMS homo-dimer. If the homo-dimer does not form, the activity of SMS is practically abolished. In this regard, we identify several disease-causing mutations that affect homo-dimerization of SMS and carry out in silico screening to identify small molecules whose binding to the destabilized homo-dimer can restore wild-type homo-dimer affinity. The investigation resulted in an extensive list of plausible stabilizers, from which we selected and tested 51 compounds experimentally for their capability to increase SMS mutant enzymatic activity. In silico analysis of the experimentally identified stabilizers suggested five distinctive chemical scaffolds. The identified chemical scaffolds are drug-like and can serve as starting points for the development of lead molecules to rescue the disease-causing effects of the Snyder-Robinson syndrome, for which no efficient treatment exists to date. Lab page URL:

BIOGRAPHICAL INFORMATION: Dr. Emil Alexov is a Professor in the Department of Physics and Astronomy at Clemson University. He received his Ph.D. in Radiophysics and Electronics and his M.S. in Plasma Physics from Sofia University. He is currently a member of the American Physical Society, the Biophysical Society, and the Protein Society. Dr. Alexov has been active in the National Institutes of Health and the National Science Foundation, among many other professional scientific activities.

March 9, 2015 - Mark Kim: GPU-enabled Particle Systems for Visualization

Particle systems have a rich history in scientific visualization because of their practicality and versatility. But although particles are a useful tool for visualization, one difficulty is particle advection on an arbitrary surface. One solution is to parameterize the surface, which can be difficult to construct and utilize. Another is to use a distance field and reproject particles onto the surface, which requires an iterative search. Unfortunately, this iterative search performs poorly on the GPU.

In this talk, I will discuss our research on particle advection on surfaces on the GPU. As GPUs have become more powerful and accessible for general-purpose use, new techniques are required to fully utilize their performance. I will begin with a discussion of some of the problems with particle systems on the GPU; in particular, I will discuss issues in adapting multimaterial mesh extraction to the GPU. To address these issues, a new surface representation was chosen: the closest point embedding, a simple grid-based representation for arbitrary surfaces. To demonstrate its effectiveness, I will present two visualization techniques accelerated on the GPU with the closest point embedding. First, the closest point embedding is used to speed up particle advection for multimaterial mesh extraction on the GPU. Second, unsteady flow visualization on arbitrary surfaces is simplified and sped up with the closest point embedding.
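The advect-then-reproject idea is easiest to see on a sphere, where the closest-point map is just normalization (a hedged numpy sketch; the closest point embedding in the talk stores precomputed closest points on a grid so the same non-iterative snap works for arbitrary surfaces):

```python
import numpy as np

def closest_point_sphere(p, center=np.zeros(3), radius=1.0):
    """Closest point on a sphere: project along the radial direction."""
    d = p - center
    return center + radius * d / np.linalg.norm(d)

def advect_on_sphere(p, velocity, dt):
    """Step the particle in the ambient space, then snap it back onto
    the surface with the closest-point map (no iterative search)."""
    return closest_point_sphere(p + dt * velocity)

p = np.array([1.0, 0.0, 0.0])
for _ in range(100):
    p = advect_on_sphere(p, velocity=np.array([0.0, 1.0, 0.0]), dt=0.01)
print(np.linalg.norm(p))  # the particle stays on the unit sphere
```

Because the reprojection is a direct evaluation rather than a search, every particle can do it in lockstep, which is what makes the approach GPU-friendly.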

March 6, 2015 - Sungahn Ko: Aided decision-making through visual analytics systems for big data

As technologies have advanced, various types of data are produced in science and industry, and extracting actionable information for making effective decisions has become increasingly difficult for analysts and decision makers. The main reasons for this difficulty are twofold: 1) the overwhelming amount of data prevents users from understanding the data during exploration, and 2) the complexity of multiple data characteristics (multivariate, spatial, temporal, and/or networked) requires an integrated data presentation for finding any pattern, trend, or anomaly relevant to decision-making. To overcome the analysts' information overload and enable effective visual presentation for efficient analysis and decision making, an interactive visual exploration and analysis environment is needed, since traditional machine learning and big data analytics alone are insufficient. In this talk, I present visual analytics approaches for solving the big data problem, with examples including spatiotemporal network data analysis, business intelligence, and steering of simulation pipelines.

February 3, 2015 - Sergiy Kalnaus: Predictive modeling for electrochemical energy storage

Electrochemical energy storage devices have gained popularity and market penetration as a means of providing energy and power for consumer electronics, hybrid and fully electric vehicles (EVs), and grid storage. Lithium-ion secondary batteries represent the most promising and commercially viable segment, although lithium, lithium-air, as well as intercalation systems based on other metals (sodium, aluminum), are being studied. Despite being adopted in many electrified powertrains (BMW ActiveE, Nissan Leaf, Ford C-MAX Energi, etc.), Li-ion batteries still suffer from high manufacturing cost, low cycle life, and safety issues. Modeling and simulation is a great tool for quantifying responses that otherwise cannot be assessed experimentally and for designing strategies for better management of such systems. This talk will discuss modeling approaches and results of computational studies of the performance and safety of Li-ion batteries. The newly released Virtual Integrated Battery Environment (VIBE) is an integral part of the Open Architecture Software framework designed within the CAEBAT (Computer Aided Engineering for Batteries) project. Coupled simulations and physics models within VIBE will be discussed.

January 29, 2015 - Deepak Majeti: Portable Programming Models for Heterogeneous Platforms

Heterogeneous architectures have become mainstream today and are found in a range of systems, from mobile devices to supercomputers. However, these architectures, with their diverse architectural features, pose several programmability challenges, including handling data coherence, managing computation and data communication, and mapping tasks and data distributions. Consequently, application programmers have to deal with new low-level programming languages that involve non-trivial learning and training. In my talk, I will present two programming models that tackle some of the aforementioned challenges. The first is the "Concord" programming model, which provides an interface similar to the widely used Intel Threading Building Blocks and targets integrated CPU+GPU architectures with semi-coherent caches. This model also supports a wide set of C++ language features. The second is "Heterogeneous Habanero-C (H2C)", an implementation of the Habanero execution model for modern heterogeneous architectures. The novel features of H2C include high-level language constructs that support automatic data layout, task mapping, and data distributions. I will conclude the talk with performance evaluations of Concord and H2C and propose future extensions to these models.

Deepak is a 5th-year graduate student at Rice University working with Prof. Vivek Sarkar. As part of his ongoing doctoral thesis, he is developing Heterogeneous Habanero-C (H2C). Deepak's areas of interest include programming models and compiler and runtime support for modern heterogeneous architectures. He was a major contributor to the Concord project as an intern at the Intel Programming Systems Lab. He also worked on porting the Chapel programming language onto the HSA + XTQ architecture as an intern at AMD Research. Apart from research, Deepak loves to play sports, including soccer, badminton, squash, and of course cricket.

January 6, 2015 - David M. Weiss: Industrial Strength Software Measurement

In an industrial environment where software development is a necessary part of product development, measuring the state of software development and the attributes of the software becomes a crucial issue. For a company to survive and to make progress against its competition, it must have answers to questions such as "What is my customers' perception of the quality of the software in my products?", "How long will it take me to complete a new product or a new release of an existing one?", "What are the major bottlenecks in software production?", and "How effective is a new technique or tool when introduced into the software development process?" The fate of the company, and of individuals within the company, may depend on accurate answers to these questions, so one must not only know how to obtain and analyze data to answer them, but also estimate how good one's answers are. In a large-scale industrial software development environment, software measurement must be meaningful, automatable, nonintrusive, and feasible. Sources of data are diffuse, nonuniform, and nonstandard. The data themselves are difficult to collect and interpret, and hard to compare across projects and organizations. Nonetheless, other industries perform such measurements as a matter of course, and software development organizations should as well. In this talk I will discuss the challenges of deciding what questions to ask, how to answer them, and what the impact of answering them is. I will illustrate with examples drawn from real projects, including an existing and ongoing project that details the state of software production in a large company, focusing on change data and how to use it to answer some of the questions posed above.

December 19, 2014 - Soumi Manna: Evaluating the Performance of the Community Atmosphere Model at High Resolutions


The Community Atmosphere Model (CAM5) is one of the multiple component models in the Community Earth System Model (CESM). Recently, efforts have been focused on increasing the resolution of CAM5 to produce more accurate predictions. Additionally, new developments have enabled the use of mesh refinement in CAM5 through the High-Order Method Modeling Environment (HOMME) dynamical core. These meshes allow for regions of extremely high resolution and pose a challenge to the current parallel domain decomposition algorithm.

In this project, we focused on analyzing the performance of HOMME at high and variable resolutions. We investigated the quality of domain decompositions produced by space-filling curve algorithms for refined and unrefined meshes. Additionally, we evaluated performance metrics of realistic simulations on these meshes using the automatic trace analysis tool Scalasca. By correlating performance bottlenecks with geometric mesh information, we identified sub-optimal properties of the domain decompositions and worked to address this behavior. Improving the quality of these decompositions will increase the scalability of simulations at these resolutions, enhancing their scientific impact.
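A space-filling-curve decomposition of the kind evaluated here can be sketched in a few lines (a Morton/Z-order curve on a uniform 2D grid; this is only an illustration of the principle, since HOMME's curves operate on cubed-sphere and refined meshes):

```python
def morton_index(i, j, bits=8):
    """Interleave the bits of (i, j) to get the Z-order curve index."""
    idx = 0
    for b in range(bits):
        idx |= ((i >> b) & 1) << (2 * b) | ((j >> b) & 1) << (2 * b + 1)
    return idx

def decompose(nx, ny, nparts):
    """Sort cells along the curve, then cut it into equal contiguous
    chunks; contiguity along the curve keeps each part spatially compact."""
    cells = sorted(((i, j) for i in range(nx) for j in range(ny)),
                   key=lambda c: morton_index(*c))
    size = len(cells) // nparts
    return [cells[k * size:(k + 1) * size] for k in range(nparts)]

parts = decompose(8, 8, 4)
print([len(p) for p in parts])  # perfectly balanced: [16, 16, 16, 16]
```

The decomposition balances cell counts exactly, but the spatial compactness of each chunk (and thus communication volume) degrades on refined meshes, which is the kind of sub-optimal property the project correlates with performance bottlenecks.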

December 12, 2014 - Jay Jay Billings: Eclipse ICE: ORNL's Modeling and Simulation User Environment

In the past several years, ORNL modeling and simulation projects have experienced an increased need for interactive, graphical user tools. The projects in question span advanced materials, batteries, nuclear fuels and reactors, nuclear fusion, quantum computing, and many others. They all require four tasks that are fundamental to modeling and simulation: creating input files, launching and monitoring jobs locally and remotely, visualizing and analyzing results, and managing data. This talk will present the Eclipse Integrated Computational Environment (ICE), a general-purpose open source platform that provides integrated tools and utilities for creating rich user environments. It will cover both the robust new infrastructure developed for modeling and simulation projects, such as new mesh editors and visualization tools, and the plugins for codes that are already supported by the platform and taking advantage of these features. The design philosophy of the project will be presented, as well as how the "n-body code problem" is solved by the platform. In addition to covering the services provided by the platform, this talk will also discuss ICE's place in the larger Eclipse ecosystem and how it became an Eclipse project. Finally, we will show how you can leverage ICE to accelerate your code deployment, use it to simplify your modeling and simulation project, or get involved in its development.

Bio: Jay Jay Billings is a member of the research staff in the Computer Science Research group and leader of the ICE team.

December 12, 2014 - Andrew Ross: Large scale Foundation Nurtured Collaboration

Software and data are crucial to almost all organizations. Open Source Software and Open Data are a vital part of this. This presentation provides a glimpse of why an open approach to software and data results in far more than just free software and data, as measured in terms of freedoms and acquisition price. Collaboration across groups within large organizations and between organizations is hard. The Eclipse Foundation is the NFL of open collaborations: it provides governance structure, technology infrastructure, and many services to facilitate collaboration. This presentation will briefly examine this and how working groups hosted by the Eclipse Foundation are enabling collaboration for domains such as scientific R&D, the Internet of Things (IoT), location-aware technologies, and more. The results are important.

From this presentation, audience members will get a brief taste of some of the collaboration opportunities, how to learn more, and how to get involved.

Bio: Andrew Ross is Director of Ecosystem Development at the Eclipse Foundation, a vendor neutral not-for-profit. He is responsible for Eclipse's collaborative working groups including the LocationTech and Science groups which collaboratively develop software for location-aware systems and scientific research respectively. Prior to the Eclipse Foundation, Andrew was Director of Engineering at Ingres where his team developed advanced spatial support features for the relational database and many applications. Before Ingres, Andrew developed highly available Telecom solutions based on open source technologies for Nortel.

December 10, 2014 - Beth Plale: The Research Data Alliance: Progress and Promise in Global Data Sharing

The Research Data Alliance is coming up on 1.5 years old along the road to realizing its vision of "researchers and innovators openly sharing data across technologies, disciplines, and countries to address the grand challenges of society." RDA has grown tremendously in that time, from a handful of committed individuals to an organization with 1600 members in 70 countries. As one who was part of the small group that got RDA off the ground and remains deeply engaged, I will introduce the Research Data Alliance, take stock of its impressive accomplishments to date, and highlight what I see as the opportunities it faces in realizing the grand goal RDA states so succinctly in its vision.

November 14, 2014 - Taisuke Boku: Tightly Coupled Accelerators: A very low latency communication system on GPU cluster and parallel programming

Accelerating devices such as GPUs, MICs, and FPGAs are among the most powerful computing resources, providing high performance-per-energy and performance-per-space ratios for a wide range of large-scale computational science. On the other hand, the complexity of programming with a combination of frameworks such as CUDA, OpenCL, OpenACC, OpenMP, and MPI is growing, and it seriously degrades programmability and productivity.

We have been developing the XcalableMP (XMP) parallel programming language for distributed-memory architectures, from PC clusters to MPPs, and enhancing its capability to cover accelerating devices for heterogeneous parallel processing systems. XMP is a PGAS-style language, and XMP-dev and XMP-ACC are its extensions for accelerating devices. We are also developing a new technology for inter-node direct GPU communication named the TCA (Tightly Coupled Accelerators) architecture, spanning everything from special-purpose network hardware to the applications built on this concept.

In this talk, I will introduce our on-going project which vertically integrates all these components toward the new generation of parallel accelerated computing.

Prof. Taisuke Boku received his Master's and PhD degrees from the Department of Electrical Engineering at Keio University. After his career as an assistant professor in the Department of Physics at Keio University, he joined the Center for Computational Sciences (formerly the Center for Computational Physics) at the University of Tsukuba, where he is currently the deputy director, the HPC division leader, and the system manager of supercomputing resources. He has worked there for more than 20 years on HPC system architecture, system software, and performance evaluation of various scientific applications. Over the years, he has played a central role in the development of CP-PACS (ranked number one in the TOP500 in 1996), FIRST (a hybrid cluster with a gravity accelerator), PACS-CS (a bandwidth-aware cluster), and HA-PACS (a high-density GPU cluster), all representative supercomputers in Japan. He also contributed to the system design of the K Computer as a member of the architecture design working group at RIKEN and is currently a member of the operation advisory board of AICS, RIKEN. He received the ACM Gordon Bell Prize in 2011. His recent research interests include accelerated HPC systems and direct communication hardware/software for accelerators in HPC systems based on FPGA technology.

November 13, 2014 - Eric Lingerfelt: Accelerating Scientific Discovery with the Bellerophon Software System

We present an overview of a software system, Bellerophon, built to support a production-level HPC application called CHIMERA, which simulates the temporal evolution of core-collapse supernovae. Developed over the last 5 years at ORNL, Bellerophon enables CHIMERA's geographically dispersed team of collaborators to perform job monitoring and real-time data analysis from multiple supercomputing resources, including platforms at OLCF, NERSC, and NICS. Its n-tier architecture provides an encapsulated, end-to-end software solution that enables the CHIMERA team to quickly and easily access highly customizable animated and static views of results from anywhere in the world via a web-deliverable, cross-platform desktop application. Bellerophon has quickly evolved into the CHIMERA team's de facto work environment for analysis, artifact management, regression testing, and other workflow tasks. We will also present plans to expand utilization and encourage adoption by generalizing the system for new HPC applications and domains.

Eric Lingerfelt is a technical staff member and software engineer in the ORNL Computer Science and Mathematics Division's Computer Science Research Group. Mr. Lingerfelt specializes in developing n-tier software systems with web-deliverable, highly-interactive client-side applications that allow users to generate, access, visualize, manipulate, and share complex sets of data from anywhere in the world. For over a decade, he has designed, developed, and successfully delivered multiple software systems to the US Department of Energy and other customers in the fields of nuclear astrophysics, Big Bang cosmology, core-collapse supernovae, isotope sales and distribution, environmental science, nuclear energy, theoretical nuclear science, and the oil and gas industry. He is a 2011 ORNL Computing and Computational Sciences Directorate Distinguished Contributor and the recipient of the 2013 CSMD Most Significant Technical Contribution Award. Mr. Lingerfelt received his B.S. in Mathematics and Physics from East Tennessee State University in 1998 and his M.S. in Physics from the University of Tennessee in 2002.

November 12, 2014 - John Springer: Discovery Advancements Through Data Analytics

The Purdue Discovery Advancements Through Analytics (D.A.T.A.) Laboratory seeks to address the computational challenges surrounding data analytics in the life, physical, and social sciences by focusing on the development and optimization of parallel codes that perform analytics. They complement these efforts by also examining the aspects of data analytics related to user adoption as well as the best practices pertaining to the management of associated metadata. In this seminar, the lead investigator in the D.A.T.A. Lab, Dr. John Springer, will discuss the lab's past and current efforts and will introduce the lab's planned activities.

John Springer is an Associate Professor in Computer and Information Technology at Purdue University and the Lead Scientist for High Performance Data Management Systems at the Bindley Bioscience Center at Discovery Park. Dr. Springer's discovery efforts focus on distributed and parallel computational approaches to data integration and analytics, and he serves as the leader of the Purdue Discovery Advancements Through Analytics (D.A.T.A.) Laboratory.

November 10, 2014 - Christopher Rodrigues: High-Level Accelerator-Style Programming of Clusters with Triolet

Container libraries are popular for parallel programming due to their simplicity. Programs invoke library operations on entire containers, relying on the library implementation to turn groups of operations into efficient parallel loops and communication. However, their suitability for parallel programming on clusters has been constrained by the limited repertoire of parallel algorithm implementations under the hood.

In this talk, I will present Triolet, a high-level functional language for using a cluster as a computational accelerator. Triolet improves upon the generality of prior distributed container library interfaces by separating concerns of parallelism, loop nesting, and data partitioning. I will discuss how this separation is used to efficiently decompose and communicate multidimensional array blocks, as well as to generate irregular loop nests from computations with variable-size temporary data. These loop-building algorithms are implemented as library code. Triolet's compiler inlines and specializes library calls to produce efficient parallel loops. The resulting code often performs comparably to handwritten C.

For several compute-intensive loops running on a 128-core cluster (with 8 nodes and 16 cores per node), Triolet performs significantly faster than sequential C code, with performance ranging from slightly faster to 4.3× slower than manually parallelized C code. Thus, Triolet demonstrates that a library of container traversal functions can deliver cluster-parallel performance comparable to manually parallelized C code without requiring programmers to manage parallelism. Triolet carries lessons for the design of runtimes, compilers, and libraries for parallel programming using container APIs.
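The container-library style of programming that Triolet targets looks, in spirit, like the sketch below (a Python analogy using a thread pool; Triolet itself is a compiled functional language on a cluster, and this sketch shares only the programming style, with all names invented here):

```python
from concurrent.futures import ThreadPoolExecutor

def par_map(f, xs, workers=4):
    """A whole-container operation: the 'library' decides how to split
    the container into chunks and run the chunks in parallel, so the
    program never writes explicit loops or communication."""
    chunk = max(1, len(xs) // workers)
    chunks = [xs[k:k + chunk] for k in range(0, len(xs), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda c: [f(x) for x in c], chunks)
    return [y for c in results for y in c]

# The program is written against containers, not loops or messages:
squares = par_map(lambda x: x * x, list(range(10)))
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Triolet's contribution is that the partitioning and loop-building logic lives in library code that the compiler inlines and specializes, rather than in a fixed runtime like the chunking policy hard-coded above.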

Christopher Rodrigues got his Ph.D. in Electrical Engineering at the University of Illinois. He is one of the developers of the Parboil GPU benchmark suite. A computer architect by training, he has chased parallelism up the software stack, having worked on alias and dependence analysis, parallel programming for GPUs, statically typed functional language compilation, and the design of parallel libraries. He is interested in reducing the pain of writing and maintaining high-performance parallel code.

November 3, 2014 - Benjamin Lee: Statistical Methods for Hardware-Software Co-Design

Abstract: To pursue energy-efficiency, computer architects specialize and coordinate design across the hardware/software interface. However, coordination is expensive, with high non-recurring engineering costs that arise from an intractable number of degrees of freedom. I present the case for statistical methods to infer regression models, which provide tractability for complex design questions. These models estimate performance and power as a function of hardware parameters and software characteristics to permit coordinated design. For example, I show how to coordinate the tuning of sparse linear algebra with the design of the cache and memory hierarchy. Finally, I describe on-going work in using logistic regression to understand the root causes of performance tails and outliers in warehouse-scale datacenters.
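The flavor of such inferred models can be shown with ordinary least squares (a toy numpy example on synthetic data; the talk's models involve many more hardware parameters, software characteristics, and nonlinear terms):

```python
import numpy as np

# Toy training data: predict runtime from two hardware "knobs"
# (cache size, core count) plus an intercept. Values are invented
# purely for illustration.
cache = np.array([1.0, 2.0, 4.0, 8.0, 4.0, 2.0])
cores = np.array([2.0, 2.0, 4.0, 8.0, 8.0, 4.0])
runtime = 10.0 - 0.5 * cache - 0.8 * cores  # synthetic ground truth

# Fit runtime ~ b0 + b1*cache + b2*cores by least squares
X = np.column_stack([np.ones_like(cache), cache, cores])
coef, *_ = np.linalg.lstsq(X, runtime, rcond=None)

# Estimate an unseen design point (cache=6, cores=6) without building it
pred = coef @ np.array([1.0, 6.0, 6.0])
print(round(pred, 3))  # 10 - 0.5*6 - 0.8*6 = 2.2
```

The value for co-design is exactly this: once the regression is fit from a sample of measured configurations, the cost of evaluating a new hardware/software design point drops from a simulation run to a dot product.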

BIO: Benjamin Lee is an assistant professor of Electrical and Computer Engineering at Duke University. His research focuses on scalable technologies, power-efficient architectures, and high-performance applications. He is also interested in the economics and public policy of computation. He has held visiting research positions at Microsoft Research, Intel Labs, and Lawrence Livermore National Lab. Dr. Lee received his B.S. in electrical engineering and computer science at the University of California, Berkeley and his Ph.D. in computer science at Harvard University. He did postdoctoral work in electrical engineering at Stanford University. He received an NSF Computing Innovation Fellowship and an NSF CAREER Award. His research has been honored as a Top Pick by IEEE Micro Magazine and has been honored twice as Research Highlights by Communications of the ACM.

October 28, 2014 - Robinson Pino: New Program Directions for Advanced Scientific Computing Research (ASCR)

October 24, 2014 - Qingang Xiong: Computational Fluid Dynamics Simulation of Biomass Fast Pyrolysis - From Particle Scale to Reactor Scale

Abstract: Fast pyrolysis, a prominent thermochemical conversion approach to produce bio-oil from biomass, has attracted increased interest. However, the fundamental mechanisms of biomass fast pyrolysis are still poorly understood and the design, operation and optimization of pyrolyzers are far from satisfactory because of the characteristics that complicated multiphase flows are coupled with complex devolatilization processes. Computational fluid dynamics (CFD) is a powerful tool to investigate the underlined mechanisms of biomass fast pyrolysis and help optimize efficient pyrolyzers. In this presentation, I will describe my postdoctoral work on CFD of biomass fast pyrolysis at both particle scale and reactor scale. For the particle-scale CFD, the lattice Boltzmann method is used to describe the flow and heat transfer processes. The intra-particle gas flow is modeled by the Darcy law. A lumped multi-step reaction kinetics is employed to model the biomass decomposition. Through the particle-scale CFD, detailed information on the evolution of a biomass particle is obtained. The velocity, temperature, and species mass fraction inside and surrounding the particles are presented. The evolutions of particle shape and density are monitored. For the reactor-scale CFD, we use the so-called multi-fluid model to simulate the multiphase hydrodynamics, in which all phases are treated as interpenetrating continua. Volume-fraction based mass, momentum, energy, and species conservation equations are employed to describe the density, velocity, temperature and mass fraction fields. Various submodels are used to close the conservation equations. Using this model, fluidized-bed and auger reactors are modeled. Parametric and sensitivity studies on the effects of operating conditions, devolatilization schemes, and submodel selections are investigated. 
It is expected that these multi-scale CFD simulations will contribute significantly to improving the accuracy of industrial reactor modeling for biomass fast pyrolysis. Finally, I will discuss some of my ideas about future directions in the multi-scale CFD simulation of biomass thermochemical conversion.
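
As background, a lumped multi-step devolatilization scheme of the kind mentioned above can be sketched as a small system of first-order ODEs. The scheme and rate constants below are illustrative assumptions (parallel biomass-to-gas/tar/char channels with Arrhenius rates), not the specific kinetics used in this work:

```python
import math

# Hypothetical lumped multi-step scheme: biomass -> gas, tar, char in
# parallel, each with a first-order Arrhenius rate k_i = A_i*exp(-E_i/(R*T)).
# (A [1/s], E [J/mol]) pairs below are illustrative, not the talk's values.
R = 8.314  # gas constant, J/(mol K)
CHANNELS = {"gas": (1.4e4, 8.86e4), "tar": (4.1e6, 1.127e5), "char": (7.4e3, 1.065e5)}

def rates(T):
    """Arrhenius rate constant for each channel at temperature T [K]."""
    return {name: A * math.exp(-E / (R * T)) for name, (A, E) in CHANNELS.items()}

def pyrolyze(T=773.0, dt=1e-3, t_end=5.0):
    """Integrate the lumped ODEs with forward Euler at constant temperature T.

    Returns mass fractions; forward Euler conserves total mass here exactly,
    since the loss of biomass equals the sum of the product gains each step.
    """
    y = {"biomass": 1.0, "gas": 0.0, "tar": 0.0, "char": 0.0}
    k = rates(T)
    ktot = sum(k.values())
    t = 0.0
    while t < t_end:
        db = -ktot * y["biomass"] * dt
        for name in CHANNELS:
            y[name] += k[name] * y["biomass"] * dt
        y["biomass"] += db
        t += dt
    return y
```

In a particle-scale CFD code this source-term update would be coupled, cell by cell, to the resolved temperature field rather than run at a fixed temperature.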

Biography: Dr. Qingang Xiong is a postdoctoral research associate in the Department of Mechanical Engineering, Iowa State University. Dr. Xiong obtained his Ph.D. in Chemical Engineering from the Institute of Process Engineering, Chinese Academy of Sciences in 2011. After graduation, Dr. Xiong spent half a year at the University of Heidelberg, Germany, as a software engineer conducting GPU-based high-performance computing for astrophysics. Dr. Xiong's research areas are computational fluid dynamics, CPU- and GPU-based parallel computing, heat and mass transfer, and biomass thermochemical conversion. Dr. Xiong has published more than 20 scientific papers and given more than 15 conference presentations. He serves on the editorial boards of several journals and as a chair of international conferences.

October 23, 2014 - Liang Zhou: Multivariate Transfer Function Design

Visualization and exploration of volumetric datasets have been an active area of research for over two decades. During this period, the volumetric datasets used by domain scientists have evolved from univariate to multivariate. Volume datasets are typically explored and classified via transfer function design and visualized using direct volume rendering. Multivariate transfer functions have emerged to improve classification results and to enable the exploration of multivariate volume datasets. In this talk, we describe our research on multivariate transfer function design. To improve the classification of univariate volumes, various one-dimensional (1D) and two-dimensional (2D) transfer function spaces have been proposed; however, each of these methods works well on only some datasets. We propose a novel transfer function method that provides better classifications by combining different transfer function spaces. Methods have also been proposed for exploring multivariate simulations; however, these approaches are not suitable for complex real-world datasets and may be unintuitive for domain users. To this end, we propose a method based on user-selected samples in the spatial domain that makes complex multivariate volume data visualization more accessible to domain users. This method still requires users to fine-tune transfer functions in parameter-space transfer function widgets, which may be unfamiliar to them. We therefore propose GuideME, a novel slice-guided semi-automatic multivariate volume exploration approach. GuideME provides the user an easy-to-use, slice-based interface that suggests feature boundaries and allows the user to select features via click and drag; an optimal transfer function is then generated automatically by optimizing a response function. Throughout the exploration process, the user does not need to interact with the parameter views at all.
Finally, real-world multivariate volume datasets are usually large, often exceeding the GPU memory and even the main memory of standard workstations. We propose a ray-guided, out-of-core, interactive volume rendering and efficient query method to support large and complex multivariate volumes on standard workstations.
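
For readers unfamiliar with the underlying machinery: a transfer function maps voxel values to optical properties, and the multivariate designs above generalize the classic 1D case. Below is a minimal sketch of a piecewise-linear 1D transfer function; the control points are made up for illustration:

```python
import bisect

def make_transfer_function(control_points):
    """Piecewise-linear 1D transfer function.

    control_points: sorted list of (scalar_value, (r, g, b, alpha)) tuples.
    Returns a callable mapping a scalar voxel value to an interpolated RGBA.
    """
    xs = [x for x, _ in control_points]
    cs = [c for _, c in control_points]

    def tf(v):
        if v <= xs[0]:
            return cs[0]
        if v >= xs[-1]:
            return cs[-1]
        i = bisect.bisect_right(xs, v) - 1          # segment containing v
        t = (v - xs[i]) / (xs[i + 1] - xs[i])       # position within segment
        return tuple(a + t * (b - a) for a, b in zip(cs[i], cs[i + 1]))

    return tf

# Example: low values transparent, a "feature" range highlighted in orange.
tf = make_transfer_function([
    (0.0, (0.0, 0.0, 0.0, 0.0)),
    (0.5, (1.0, 0.5, 0.0, 0.8)),
    (1.0, (1.0, 1.0, 1.0, 0.1)),
])
```

A multivariate transfer function replaces the single scalar lookup with a classification over several voxel attributes at once, which is what makes its design space so much harder to navigate.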

October 20, 2014 - John Schmisseur: New HORIzONS: A Vision for Future Aerospace Capabilities within the University of Tennessee

Recently, issues of national interest including the planned DoD Pivot to the Pacific and assured large payload access to space have renewed commitment to the development of high-speed aerospace systems. As a result, many agencies, including the Air Force, are exploring new technology systems to facilitate operation in the hypersonic flight regime. One facet of the Air Force strategy in this area has been a reemphasis of hypersonic testing capabilities at the Arnold Engineering Development Complex (AEDC) and the establishment of an Air Force Research Laboratory scientific research group co-located at the complex. These recent events provide an opportunity for the University of Tennessee to support the Air Force and other agencies in the realization of planned high-speed capabilities while simultaneously establishing a precedent for the integration of contributions across the UT system.

The HORIzON center (High-speed Original Research & InnovatiON) at the University of Tennessee Space Institute (UTSI) has been established to address the current, intermediate and strategic challenges faced by national agencies in the development of high-speed/hypersonic capabilities. Specifically, the center will foster the development of world-class basic research capabilities in the region surrounding AEDC, create a culture of discovery and innovation integrating elements from academia, government and small business, and take the lead in the development of a rational methodology for the integration of large scale empirical and numerical data sets within a digital environment.

Dr. Schmisseur's presentation will provide the background and motivation that has driven the establishment of the HORIzON center and highlight a few of the center's major research vectors. He will be visiting ORNL to explore how contributions from the DoE can be integrated within the HORIzON enterprise to support the achievement of our national goals in high-speed technology development.

October 14, 2014 - Krishna Chaitanya Gurijala: Shape-Based Analysis

Shape analysis plays a critical role in many fields, especially in medical analysis. Substantial research has been performed on shape analysis for manifolds. In contrast, shape-based analysis has received little attention for volumetric data. It is not feasible to directly extend successful manifold shape analysis methods, such as heat diffusion, to volumes because of the huge computational cost. The work presented herein addresses this problem with two approaches to shape analysis in volumes that not only capture shape information efficiently but also reduce the computational time drastically.

The first approach is a cumulative one, called Cumulative Heat Diffusion, in which heat diffusion is carried out by simultaneously considering all the voxels as sources. The cumulative heat diffusion is monitored by a novel operator called the Volume Gradient Operator, a combination of the well-known Laplace-Beltrami operator and a data-driven operator. Because the cumulative heat diffusion is computed by considering all the voxels, it is inherently dependent on the resolution of the data. We therefore propose a second, stochastic approach to shape analysis. In this approach the diffusion process is carried out using tiny massless particles termed shapetons. The shapetons are diffused in a Monte Carlo fashion across the voxels for a pre-defined distance (which serves as a single time step) to obtain the shape information, with their direction of propagation monitored by the volume gradient operator. Shapeton diffusion is a novel diffusion approach and is independent of the resolution of the data. Both approaches robustly extract features and objects based on shape.
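
The shapeton idea can be caricatured with a plain Monte Carlo random walk over a voxel mask. The sketch below omits the volume gradient operator that steers real shapetons and just takes uniform steps inside the object, so it is only a structural illustration:

```python
import random

def diffuse_particles(volume, seeds, steps, rng=random.Random(0)):
    """Random-walk 'particles' over a voxelized object (simplified sketch).

    volume: set of (x, y, z) voxels belonging to the object.
    seeds: starting voxels; steps: walk length (the pre-defined distance).
    Returns visit counts per voxel -- a crude shape descriptor: voxels in
    thin structures accumulate visits differently from voxels in blobs.
    """
    moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    visits = {}
    for seed in seeds:
        pos = seed
        for _ in range(steps):
            nbrs = [tuple(p + d for p, d in zip(pos, m)) for m in moves]
            nbrs = [n for n in nbrs if n in volume]  # stay inside the object
            if not nbrs:
                break
            pos = rng.choice(nbrs)
            visits[pos] = visits.get(pos, 0) + 1
    return visits
```

In the actual method, the step direction would be biased by the volume gradient operator rather than chosen uniformly, which is what lets the walks carry shape information.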

Both shape analysis approaches are used in several medical applications such as segmentation, feature extraction, registration, transfer function design, and tumor detection. This work focuses primarily on the diagnosis of colon cancer. Colorectal cancer is the second leading cause of cancer-related mortality in the United States. Virtual colonoscopy is a viable non-invasive screening method whereby a radiologist can explore a colon surface to locate precancerous polyps (protrusions/bumps on the colon wall). To facilitate efficient colon exploration, a robust and shape-preserving colon flattening algorithm is presented using the heat diffusion metric, which is insensitive to topological noise. The flattened colon surface enables effective colon exploration, navigation, polyp visualization, detection, and verification. In addition, the flattened colon surface is used to consistently register the supine and prone colon surfaces. Anatomical landmarks such as the taeniae coli, flexures, and surface feature points are used in the colon registration pipeline, and this work presents techniques using heat diffusion to automatically identify them.


September 30, 2014 - Stanley Osher: What Sparsity and l1 Optimization Can Do For You

Sparsity and compressive sensing have had a tremendous impact in science, technology, medicine, imaging, machine learning and now in solving multiscale problems in applied partial differential equations, such as developing sparse bases for elliptic eigenspaces. l1 and related optimization solvers are a key tool in this area. The special nature of this functional allows for very fast solvers: l1 actually forgives and forgets errors in Bregman iterative methods. I will describe simple, fast algorithms and new applications ranging from sparse dynamics for PDEs and new regularization paths for logistic regression and support vector machines to optimal data collection and hyperspectral image processing. (Credit: Stanley Osher, jointly with many others.)


Dr. Osher's awards and accomplishments are voluminous and exceptionally remarkable; a few highlights include:

The Gauss prize citation summarized Dr. Osher's many achievements by stating that, "Stanley Osher has made influential contributions in a broad variety of fields in applied mathematics. These include high resolution shock capturing methods for hyperbolic equations, level set methods, PDE based methods in computer vision and image processing, and optimization. His numerical analysis contributions, including the Engquist-Osher scheme, TVD schemes, entropy conditions, ENO and WENO schemes and numerical schemes for Hamilton-Jacobi type equations have revolutionized the field. His level set contributions include new level set calculus, novel numerical techniques, fluids and materials modeling, variational approaches, high co-dimension motion analysis, geometric optics, and the computation of discontinuous solutions to Hamilton-Jacobi equations; level set methods have been extremely influential in computer vision, image processing, and computer graphics. In addition, such new methods have motivated some of the most fundamental studies in the theory of PDEs in recent years, completing the picture of applied mathematics inspiring pure mathematics."

September 11, 2014 - Jeffrey Willert: Increased Efficiency and Functionality inside the Moment-Based Accelerated Thermal Radiation Transport Algorithm

Recent algorithm design efforts for thermal radiation transport (TRT) have included the application of "Moment-Based Acceleration" (MBA). These MBA algorithms achieve accurate solutions in a highly efficient manner by moving a large portion of the computational effort to a nonlinearly consistent low-order (reduced phase space) domain.

In this talk I will discuss recent improvements and advancements of the MBA-TRT algorithm. We explore the use of Anderson Acceleration to solve the nonlinear low-order system as a replacement for a more traditional Jacobian-Free Newton-Krylov solver. Additionally, the MBA-TRT algorithm has struggled when error from Monte Carlo calculations builds up over several time steps; this error often corrupts the low-order system and may prevent convergence of the nonlinear solver. We attempt to remedy this by implementing a "Residual" Monte Carlo algorithm in which the stochastic error is greatly reduced for the same or lower computational cost. We conclude with a discussion of areas of future work.
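
Anderson acceleration itself is easy to sketch in a generic setting. The version below is a standard least-squares formulation applied to a toy fixed-point problem, not the low-order TRT system: it keeps the last m residuals and combines previous iterates so that the linearized residual is minimized.

```python
import numpy as np

def anderson(g, x0, m=5, tol=1e-10, max_iter=100):
    """Anderson acceleration for the fixed point x = g(x).

    Stores the last m+1 iterates and their residuals f_k = g(x_k) - x_k,
    then picks weights alpha (summing to 1) minimizing ||sum alpha_i f_i||
    and sets x_{k+1} = sum alpha_i g(x_i).
    """
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    X, G = [], []                        # histories of iterates and g-values
    for k in range(max_iter):
        gx = np.atleast_1d(g(x))
        if np.linalg.norm(gx - x) < tol:
            return x, k
        X.append(x); G.append(gx)
        X, G = X[-(m + 1):], G[-(m + 1):]
        n = len(X)
        if n == 1:
            x = gx                       # plain fixed-point step to start
        else:
            F = np.column_stack([G[i] - X[i] for i in range(n)])
            # Enforce sum(alpha) = 1 by substituting the last weight.
            dF = F[:, :-1] - F[:, [-1]]
            theta, *_ = np.linalg.lstsq(dF, -F[:, -1], rcond=None)
            alpha = np.append(theta, 1.0 - theta.sum())
            x = sum(alpha[i] * G[i] for i in range(n))
    return x, max_iter
```

On the classic x = cos(x) test problem this converges in far fewer iterations than plain fixed-point iteration, which is the same effect the MBA algorithm exploits on its nonlinear low-order system.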

September 9, 2014 - Swen Boehm: STCI - A scalable approach for tools and runtimes

The ever-increasing complexity and scale of high-performance computing (HPC) systems and parallel scientific applications require the systems community to provide scalable and resilient communication substrates and run-time infrastructures. Two system research efforts will be presented, focusing on the adaptation and customization of HPC runtimes as well as the usability of such systems. The Scalable runTime Component Infrastructure (STCI) will be introduced, a modular library that enables the implementation of new scalable and resilient HPC run-time systems. Its unique modular architecture eases adaptation to a particular HPC system. Additionally, STCI is based on the concept of "agents", which allows run-time services to be further customized; for instance, STCI's customizability was recently used to implement an MPMD-style execution model on top of STCI. Finally, "librte" will be presented: a unified runtime abstraction API that aims at improving the usability of HPC systems by providing an abstraction over various run-time systems such as Cray ALPS, PMI, ORTE, and STCI. "librte" is used by the Universal Common Communication Substrate (UCCS) and provides a simple and well-defined interface to tool developers.

September 9, 2014 - Ewa Deelman: Science Automation with the Pegasus Workflow Management System

Abstract sent on behalf of the speaker:
Scientific workflows allow scientists to declaratively describe potentially complex applications that are composed of individual computational components. Workflows also include a description of the data and control dependencies between the components. This talk will describe example workflows in various science domains including astronomy, bioinformatics, earthquake science, gravitational-wave physics, and others. It will examine the challenges faced by workflow management systems when executing workflows in distributed and high-performance computing environments. In particular, the talk will describe the Pegasus Workflow Management System developed at USC/ISI. Pegasus bridges the scientific domain and the execution environment by automatically mapping high-level workflow descriptions onto distributed resources. It locates the input data and computational resources necessary for workflow execution. It also restructures the workflow for performance and reliability reasons. Pegasus can execute workflows on a laptop, a campus cluster, grids, and clouds. It can handle workflows with a single task or millions of tasks and has been used to manage workflows accessing and generating terabytes of data. The talk will describe the capabilities of Pegasus and how it manages heterogeneous computing environments.
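
The workflow-as-DAG idea at the heart of such systems can be illustrated in a few lines: given task dependencies, compute the order (and the parallel "waves") in which a scheduler could dispatch tasks. This is a generic sketch (Kahn-style topological levels) with a made-up toy workflow, not Pegasus's actual planner:

```python
def schedule(tasks):
    """Order workflow tasks so every task runs after its dependencies.

    tasks: dict mapping task name -> set of task names it depends on.
    Returns a list of 'waves'; tasks in the same wave are independent
    and could be dispatched to distributed resources in parallel.
    """
    deps = {t: set(d) for t, d in tasks.items()}
    waves = []
    while deps:
        ready = sorted(t for t, d in deps.items() if not d)  # no pending deps
        if not ready:
            raise ValueError("cycle detected in workflow")
        waves.append(ready)
        for t in ready:
            del deps[t]
        for d in deps.values():
            d.difference_update(ready)
    return waves

# Toy workflow: fetch two inputs, process each, then combine the results.
wf = {
    "fetch_a": set(), "fetch_b": set(),
    "project_a": {"fetch_a"}, "project_b": {"fetch_b"},
    "combine": {"project_a", "project_b"},
}
```

A real planner such as Pegasus additionally maps each ready task onto concrete compute and storage resources and restructures the DAG (clustering, retries) for performance and reliability.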

Ewa Deelman is a Research Associate Professor at the USC Computer Science Department and the Assistant Director of Science Automation Technologies at the USC Information Sciences Institute. Dr. Deelman's research interests include the design and exploration of collaborative, distributed scientific environments, with particular emphasis on workflow management as well as the management of large amounts of data and metadata. In 2007, Dr. Deelman edited a book, "Workflows in e-Science: Scientific Workflows for Grids", published by Springer. She is also the founder of the annual Workshop on Workflows in Support of Large-Scale Science, which is held in conjunction with the Supercomputing conference. Dr. Deelman received her PhD in Computer Science from Rensselaer Polytechnic Institute in 1997.

August 29, 2014 - C. David Levermore: Coarsening of Particle Systems

Each particle in a simulation of a system of particles usually represents a huge number of real particles. We present a framework for constructing the dynamics for a so-called coarsened system of simulated particles. We build an approximate solution to the Liouville equation for the original system from the solution of an equation for the phase-space density of a smaller system. We do this with a Markov approximation within a Mori-Zwanzig formalism based upon a reference density. We then identify the evolution equation for the reduced phase-space density as the forward Kolmogorov equation of a Markov process. The original system governed by deterministic dynamics is then simulated with the coarsened system governed by this Markov process. Both Monte Carlo (MC) and molecular dynamics (MD) simulations can be viewed within this framework. More generally, the reduced dynamics can have elements of both MC and MD.

August 21, 2014 - Quan Long: Laplace method for optimal Bayesian experimental design with applications in impedance tomography and seismic source inversion

Abstract sent on behalf of the speaker:
The Laplace method is a widely used technique for approximating integrals in statistics. We analyze this method in the context of optimal Bayesian experimental design and extend it from the classical scenario, where parameters can be completely determined by the experiment, to scenarios where an unidentifiable parametric manifold exists. We show that by carrying out this approximation, the estimation of the expected Kullback-Leibler divergence can be significantly accelerated. The developed methodology has been applied to the optimal experimental design of impedance tomography and seismic source inversion.
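
For readers unfamiliar with it, the Laplace method replaces the integrand of I(n) = ∫ exp(-n f(x)) dx by a Gaussian around the minimizer x* of f, giving I(n) ≈ exp(-n f(x*)) sqrt(2π / (n f''(x*))). A quick self-check against Stirling's formula (this example is generic background, not from the talk):

```python
import math

def laplace_approx(f, fpp, x_star, n):
    """Laplace approximation of I(n) = integral of exp(-n f(x)) dx around
    an interior minimizer x_star of f, with fpp = f'' evaluated there."""
    return math.exp(-n * f(x_star)) * math.sqrt(2 * math.pi / (n * fpp(x_star)))

# Check against Stirling: n! = int_0^inf e^{-t} t^n dt; substituting t = n*x
# gives n! = n^{n+1} * int_0^inf exp(-n f(x)) dx with f(x) = x - ln(x),
# minimized at x* = 1 where f(1) = 1 and f''(1) = 1.
def factorial_laplace(n):
    f = lambda x: x - math.log(x)
    fpp = lambda x: 1.0 / x**2
    return n ** (n + 1) * laplace_approx(f, fpp, 1.0, n)
```

The relative error decays like 1/(12n), which is exactly the kind of asymptotic accuracy the talk's analysis exploits when accelerating the expected-information computations.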

August 18, 2014 - Pierre Gremaud: Impedance boundary conditions for flows on networks

Abstract sent on behalf of the speaker:

From hemodynamics to engineering applications, many flow problems are solved on networks. For feasibility reasons, computational domains are often truncated and outflow conditions have to be prescribed at the end of the domain under consideration.

We will show how to efficiently compute the impedance of specific networks and how to use this information as an outflow boundary condition. The method is based on linearization arguments and Laplace transforms. The talk will focus on hemodynamics applications, but we will indicate how to generalize the approach.

July 23, 2014 - Frédérique Laurent-Negre: High order moment methods for the description of spray: mathematical modeling and adapted numerical methods

Abstract sent on behalf of the speaker:
We consider a two-phase flow consisting of a dispersed phase of liquid droplets (a spray) in a gas flow. This type of flow occurs in many applications, such as two-phase combustion and solid propulsion. The spray is characterized by its distribution in size and velocity, which satisfies a Boltzmann-type equation. As an alternative to the Lagrangian methods commonly used for numerical simulations, we have developed Eulerian models that can account for the polydisperse character of sprays. They use moments in size and velocity of the distribution on fixed intervals of droplet size. These moments represent the number, the mass or amount of surface area, the momentum, etc., of all droplets in a given size range. However, the space in which the moment vectors live becomes complex when high-order moments are considered; a key point of the numerical methods is then to ensure that the moment vector stays in this space. We study here some mathematical models derived from the kinetic model as well as high-order numerical methods specifically developed to preserve the moment space.

July 22, 2014 - Christos Kavouklis: Numerical Solution of the 3D Poisson Equation with the Method of Local Corrections

Abstract sent on behalf of the speaker:
We present a new version of the Method of Local Corrections, a low-communication algorithm for the numerical solution of the free-space Poisson equation on 3D structured grids. We assume a decomposition of the fine computational domain (which contains the global right-hand side, i.e., the charge) into a set of small disjoint cubic patches (e.g., of size 33^3). The Method of Local Corrections comprises three steps in which Mehrstellen discretizations of the Laplace operator are employed: (i) a loop over the fine disjoint patches and the computation of local potentials on sufficiently large extensions of them (downward pass); (ii) an inexpensive global Poisson solve on the associated coarse domain, with the right-hand side computed by applying the coarse-mesh Laplacian to the local potentials of step (i); and (iii) a correction of the local solutions computed in step (i) on the boundaries of the fine disjoint patches, based on interpolating the global coarse solution, with propagation of the corrections into the patch interiors via local Dirichlet solves (upward pass). Local solves in the downward pass and the global coarse solve are performed using the domain-doubling algorithm of Hockney; for the local solves in the upward pass we employ a standard DFT Dirichlet Poisson solver. In this new version of the Method of Local Corrections we take into consideration the local potentials induced by truncated Legendre expansions of degree P of the local charges (the original version corresponded to P=0). The result is an h-p scheme that is (P+1)th-order accurate and involves only local communication. Specifically, we only have to compute and communicate the coefficients of local Legendre expansions (for instance, 20 scalars per patch for expansions of degree P=3). Several numerical simulations are presented to illustrate the new method and demonstrate its convergence properties.
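
The Hockney domain-doubling solver invoked in the downward pass is compact enough to sketch: zero-pad the charge to twice the grid size and convolve with the free-space Green's function G(r) = 1/(4πr) via FFTs, so the periodic FFT convolution reproduces the free-space sum exactly. The sketch below is a simplified stand-in (direct kernel convolution rather than a Mehrstellen discretization), and the grid size and self-term regularization are arbitrary choices:

```python
import numpy as np

def hockney_poisson(rho, h):
    """Free-space Poisson potential phi = G * rho with G(r) = 1/(4*pi*r),
    computed exactly as a convolution on a doubled (zero-padded) grid.
    The kernel value at r = 0 is a crude regularization choice."""
    n = rho.shape[0]
    m = 2 * n                                   # domain doubling
    c = np.arange(m)
    c = np.minimum(c, m - c)                    # signed offsets on padded grid
    X, Y, Z = np.meshgrid(c, c, c, indexing="ij")
    r = h * np.sqrt(X**2 + Y**2 + Z**2)
    G = np.zeros_like(r)
    np.divide(1.0, 4 * np.pi * r, out=G, where=r > 0)
    G[0, 0, 0] = 1.0 / (4 * np.pi * (h / 2))    # arbitrary self-term value
    rho_pad = np.zeros((m, m, m))
    rho_pad[:n, :n, :n] = rho
    phi = np.fft.ifftn(np.fft.fftn(rho_pad) * np.fft.fftn(G)).real * h**3
    return phi[:n, :n, :n]
```

Because the padded grid is twice the original in each dimension, the circular convolution never wraps sources onto targets, which is the whole point of the domain-doubling trick.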

July 17, 2014 - Kody John Hoffman Law: Dimension-independent, likelihood-informed (DILI) MCMC (Markov chain Monte Carlo) sampling algorithms for Bayesian inverse problems

July 1, 2014 - Xubin (Ben) He: High Performance and Reliable Storage Support for Big Data

Abstract sent on behalf of the speaker:
Big data applications have imposed unprecedented challenges in data analysis, storage, organization, and understanding due to their heterogeneity, volume, complexity, and high velocity. These challenges confront both computer systems researchers, who investigate new storage and computational solutions to support fast and reliable access to large datasets, and application scientists in various disciplines, who exploit these datasets of vital scientific interest for knowledge discovery. In this talk, I will discuss my research in data storage and I/O systems, particularly in solid-state devices (SSDs) and erasure codes, to provide cost-effective solutions for big data management with high performance and reliability.
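
The simplest erasure code makes the idea concrete: a single XOR parity block (RAID-4/5 style) lets any one lost data block be rebuilt from the survivors. Production systems use stronger codes (e.g., Reed-Solomon) tolerating multiple failures, but the mechanics are the same; this toy version is generic background, not from the talk:

```python
def encode(blocks):
    """Single-parity erasure code: parity byte i is the XOR of byte i
    across all data blocks. blocks: list of equal-length bytes objects."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

def recover(surviving_blocks, parity):
    """Rebuild the single lost data block: XOR of survivors and parity,
    since XOR-ing a value in twice cancels it out."""
    return encode(list(surviving_blocks) + [parity])
```

The storage overhead is one extra block regardless of stripe width, which is why parity-based codes are so much cheaper than full replication for large datasets.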

Dr. Xubin He is a Professor and the Graduate Program Director of Electrical and Computer Engineering at Virginia Commonwealth University. He is also the Director of the Storage Technology and Architecture Research (STAR) lab. Dr. He received his PhD in Electrical and Computer Engineering from the University of Rhode Island, USA in 2002, and both his MS and BS degrees in Computer Science from Huazhong University of Science and Technology, China, in 1997 and 1995, respectively. His research interests include computer architecture, reliable and high-availability storage systems, and distributed computing. He has published more than 80 refereed articles in prestigious journals such as IEEE Transactions on Parallel and Distributed Systems (TPDS), Journal of Parallel and Distributed Computing (JPDC), ACM Transactions on Storage, and IEEE Transactions on Dependable and Secure Computing (TDSC), and at various international conferences, including USENIX FAST, USENIX ATC, Eurosys, IEEE/IFIP DSN, IEEE IPDPS, MSST, ICPP, MASCOTS, LCN, etc. He was the general co-chair for IEEE NAS'2009 and program co-chair for MSST'2010, IEEE NAS'2008, and SNAPI'2007. Dr. He has served as a proposal review panelist for NSF and in various chair and committee roles for many professional conferences in the field. Dr. He was a recipient of the ORAU Ralph E. Powe Junior Faculty Enhancement Award in 2004, the TTU Chapter Sigma Xi Research Award in 2010 and 2005, and the TTU ECE Most Outstanding Teaching Faculty Award in 2010. He holds one U.S. patent. He is a senior member of the IEEE and a member of the IEEE Computer Society and USENIX.

June 18, 2014 - Hari Krishnan: Enabling Collaborative Domain-Centric Visualization and Analysis in High Performance Computing Environments

Abstract and Bio sent on behalf of the speaker:
Multi-institutional interdisciplinary domain science teams are increasingly commonplace in modern high performance computing (HPC) environments. Visualization tools such as VisIt and ParaView have traditionally focused more on improving the scalability, performance, and efficiency of algorithms than on enabling ease of use and collaborative functionality that complements the power of HPC resources. In addition, visualization tools provide an algorithm-based infrastructure focused on a diverse set of readers, plots, and operations rather than a higher-level, domain-specific set of capabilities when providing solutions to the scientific community. This strategy yields a higher return on investment, but increases complexity for the user community.

As larger, more diverse teams of scientists become more commonplace, they require applications tuned to get the most out of heavily utilized, resource-constrained distributed HPC environments. Standard methods of visualization and data sharing pose significant challenges that detract from users' focus on scientific inquiry.

In this presentation I will highlight three new capabilities under development within VisIt to address these needs, which enable domain scientists to refocus their efforts on more productive endeavors. These features include tailored visualization using a new PySide/PyQt infrastructure, a new parallel analysis framework supporting Python and R scripting, and a collaboration suite that allows sharing and communicating among a variety of display media, from mobile devices to visualization clusters. The goal is to enhance the experience of domain scientists by streamlining their work environment, providing easy access to a complex set of resources, and enabling collaboration, sharing, and communication among a diverse team.

Hari Krishnan holds a Ph.D. in computer science and works as a computer systems engineer in the visualization and graphics group at Lawrence Berkeley National Laboratory. His research focuses on scientific visualization on HPC platforms and many-core architectures. He leads the development effort on several HPC-related projects, including research on new visualization methods, optimizing scaling and performance on Cray machines, working on data-model-optimized I/O libraries, and enabling remote workflow services. He is also an active developer on several major open source projects, including VisIt, NiCE, and H5hut, and has developed plugins for Fiji/ImageJ.


May 20, 2014 - Weiran Sun: A Spectral Method for Linear Half-Space Kinetic Equations

Abstract sent on behalf of the speaker:
Half-space equations naturally arise in the boundary layer analysis of kinetic equations. In this talk we will present a unified proof of the well-posedness of a class of linear half-space equations with general incoming data. We will also show a spectral method to numerically resolve these types of equations in a systematic way. Our main strategy in both the analysis and the numerics consists of three steps: adding damping terms to the original half-space equation, using an inf-sup argument and even-odd decomposition to establish the well-posedness of the damped equation, and then recovering solutions to the original half-space equation. The accuracy of the damped equation is shown to be quasi-optimal, and the numerical error of approximations to the original equation is controlled by that of the damped equation. Numerical examples are shown for the isotropic neutron transport equation and the linearized BGK equation. This is joint work with Qin Li and Jianfeng Lu.

May 14, 2014 - Michael Bauer: Programming Distributed Heterogeneous Architectures with Logical Regions

Abstract and Bio sent on behalf of the speaker:
Modern supercomputers now encompass both heterogeneous processors and deep, complex memory hierarchies. Programming these machines currently requires expertise in an eclectic collection of tools (MPI, OpenMP, CUDA, etc.) that primarily focus on describing parallelism while placing the burden of data movement on the programmer. Legion is an alternative approach that provides extensive support for describing the structure of program data through logical regions. Logical regions can be dynamically partitioned into sub-regions giving applications an explicit mechanism for directly conveying information about locality and independence to the Legion runtime. Using this information, Legion automatically extracts task parallelism and orchestrates data movement through the memory hierarchy. Time permitting, we will discuss results from several applications including a port of S3D, a production combustion simulation running on Titan, the Department of Energy's current flagship supercomputer.

Michael Bauer is a sixth year PhD student in computer science at Stanford University. His interests include the design and implementation of programming systems for supercomputers and distributed systems.

May 8, 2014 - Jerry McMahan: Bayesian Inverse Problems for Uncertainty Quantification: Prediction with Model Discrepancy and a Verification Framework

Abstract sent on behalf of the speaker:
Recent work in uncertainty quantification (UQ) has made it feasible to compute the statistical uncertainties for mathematical models in physics, biology, and engineering applications, offering added insight into how the model relates to the measurement data it represents. This talk focuses on two issues related to the reliability of UQ methods for model calibration in practice. The first issue concerns calibration of models having discrepancies with respect to the phenomena they model when these discrepancies violate commonly employed statistical assumptions used for simplifying computation. Using data from a vibrating beam as a case study, I will illustrate how these discrepancies can limit the accuracy of predictive simulation and discuss some approaches for reducing the impact of these limitations. The second issue concerns verifying the accurate implementation of computational algorithms for solving inverse problems in UQ. In this context, verification is particularly important as the nature of the computational results makes detection of subtle implementation errors unlikely. I will present a collaboratively developed computational framework for verification of statistical inverse problem solvers and present examples of its use to verify the Markov Chain Monte Carlo (MCMC) based routines in the QUESO C++ library.
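
To give a flavor of what is being verified: below is a minimal random-walk Metropolis sampler (a generic sketch, not QUESO's implementation) targeting a standard normal, followed by the kind of moment check a verification framework would automate. The step size and sample count are arbitrary:

```python
import math
import random

def metropolis(log_post, x0, n_samples, step=1.0, rng=random.Random(42)):
    """Random-walk Metropolis: propose x' = x + step*N(0,1) and accept
    with probability min(1, post(x')/post(x)), working in log space."""
    x, samples = x0, []
    lp = log_post(x)
    for _ in range(n_samples):
        x_new = x + step * rng.gauss(0.0, 1.0)
        lp_new = log_post(x_new)
        if math.log(rng.random()) < lp_new - lp:
            x, lp = x_new, lp_new
        samples.append(x)
    return samples

# Verification-style check: sample a standard normal, whose moments
# (mean 0, variance 1) are known exactly.
log_normal = lambda x: -0.5 * x * x
chain = metropolis(log_normal, 0.0, 50_000)
mean = sum(chain) / len(chain)
var = sum((s - mean) ** 2 for s in chain) / len(chain)
```

A subtle bug (say, forgetting to keep the old sample on rejection) would shift these moments in ways a casual eyeball test of the chain trace might miss, which is why systematic statistical verification matters.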

May 5, 2014 - Abhishek Kumar: Multiscale modeling of polycrystalline material for optimized property

May 1, 2014 - Eric Chung: Staggered Discontinuous Galerkin Methods

ABSTRACT FROM SPEAKER: In this talk, we will present staggered discontinuous Galerkin methods. These methods are based on piecewise polynomial approximation on staggered grids. The basis functions have to be carefully designed so that certain compatibility conditions are satisfied. Moreover, the use of staggered grids brings some advantages, such as optimal convergence and conservation. We will discuss the basic methodologies and applications to wave propagation and fluid flows.

April 7, 2014 - Tom Scogland: Runtime Adaptation for Autonomic Heterogeneous Computing

Heterogeneity is increasing at all levels of computing, certainly with the rise in general purpose computing with GPUs in everything from phones to supercomputers. More quietly it is increasing with the rise of NUMA systems, hierarchical caching, OS noise, and a myriad of other factors. As heterogeneity becomes a fact of life at every level of computing, efficiently managing heterogeneous compute resources is becoming a critical task. In order to make the problem tractable we must develop methods and systems to allow software to adapt to the hardware it finds within a given node at runtime. The goal is to make the complex functions of heterogeneous computing autonomic, handling load balancing, memory coherence and other performance critical factors in the runtime. This talk will discuss my research into this area, including the design of a work-sharing construct for CPU and GPU resources in OpenMP and automated memory reshaping/re-mapping for locality.

Dr. Scogland is a candidate for a postdoctoral position with the Computer Science Research Group.

April 4, 2014 - Alex McCaskey: Effects of Electron-Phonon Coupling in Single-Molecule Magnet Transport Junctions Using a Hybrid Density Functional Theory and Model Hamiltonian Approach

Recent experiments have shown that junctions consisting of individual single-molecule magnets (SMMs) bridged between two electrodes can be fabricated in three-terminal devices, and that the characteristic magnetic anisotropy of the SMMs can be affected by electrons tunneling through the molecule. Vibrational modes of the SMM can couple to electronic charge and spin degrees of freedom, and this coupling also influences the magnetic and transport properties of the SMM. The effect of electron-phonon coupling on transport has been extensively studied in small molecules, but not yet for junctions of SMMs. The goals of this talk are twofold: to present a novel approach for studying the effects of this electron-phonon coupling in transport through SMMs that utilizes both density functional theory calculations and model Hamiltonian construction and analysis, and to present a software framework based on this hybrid approach for the simulation of transport across user-defined SMMs. The results of these simulations indicate a characteristic suppression of the current at low energies that is strongly dependent on the overall electron-phonon coupling strength and the number of molecular vibrational modes considered.

Mr. McCaskey is a candidate for a graduate position in the Computer Science Research Group.

March 26, 2014 - Steven Wise: Convergence of a Mixed FEM for a Cahn-Hilliard-Stokes System

Abstract and Bio sent on behalf of the speaker:
Co-Authors: Amanda Diegel and Xiaobing Feng
Abstract: In this talk I will describe a mixed finite element method for a modified Cahn-Hilliard equation coupled with a non-steady Darcy-Stokes flow that models phase separation and coupled fluid flow in immiscible binary fluids and di-block copolymer melts. I will focus both on numerical implementation issues for the scheme and on the convergence analysis. The time discretization is based on a convex splitting of the energy of the equation. I will show that our scheme is unconditionally energy stable with respect to a spatially discrete analogue of the continuous free energy of the system and unconditionally uniquely solvable. We can show, in addition, that the phase variable is bounded in L^\infty(0,T;L^\infty) and the chemical potential is bounded in L^\infty(0,T;L^2), unconditionally in both two and three dimensions, for any finite final time T. In fact, the bounds in such estimates grow only (at most) linearly in T. I will prove that these variables converge with optimal rates in the appropriate energy norms in both two and three dimensions. Finally, I will discuss some extensions of the scheme to approximate solutions for diffuse interface flow models with large differences in density.

Steven Wise is an associate professor of mathematics at the University of Tennessee. He specializes in fast adaptive nonlinear algebraic solvers for numerical PDE, numerical analysis, and scientific computing more broadly. Before coming to the University of Tennessee, he was a postdoc and visiting assistant professor of mathematics and biomedical engineering at the University of California, Irvine. He earned a PhD in engineering physics from the University of Virginia in 2003.

March 18, 2014 - Zhiwen Zhang: A Dynamically Bi-Orthogonal Method for Time-Dependent Stochastic Partial Differential Equation

We propose a dynamically bi-orthogonal method (DyBO) to study time-dependent stochastic partial differential equations (SPDEs). The objective of our method is to exploit the intrinsic sparse structure of the stochastic solution by constructing its sparsest representation via a bi-orthogonal basis. It is well known that the Karhunen-Loeve (KL) expansion minimizes the total mean squared error and gives the sparsest representation of stochastic solutions. However, computing the KL expansion can be quite expensive, since we need to form a covariance matrix and solve a large-scale eigenvalue problem. In this talk, we derive an equivalent system that governs the evolution of the spatial and stochastic bases in the KL expansion. Unlike other reduced-order model methods, our method constructs the reduced basis on the fly, without the need to form the covariance matrix or compute its eigendecomposition. We further present an adaptive strategy to dynamically remove or add modes, perform a detailed complexity analysis, and discuss various generalizations of this approach. Several numerical experiments will be provided to demonstrate the effectiveness of the DyBO method.
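
To make the cost argument concrete, the sketch below (pure Python on synthetic data; all variable names are our own) forms the sample covariance matrix and extracts its dominant eigenpair by power iteration, i.e., the leading KL mode. The covariance assembly and the eigen-solve are precisely the steps DyBO avoids by evolving the bi-orthogonal basis directly:

```python
import math
import random

def leading_kl_mode(samples, iters=200):
    """Form the sample covariance matrix C and extract its dominant
    eigenpair by power iteration: the expensive step that the classical
    KL expansion requires."""
    n = len(samples)          # number of stochastic realizations
    m = len(samples[0])       # number of spatial grid points
    mean = [sum(s[j] for s in samples) / n for j in range(m)]
    dev = [[s[j] - mean[j] for j in range(m)] for s in samples]
    # C[i][j] = (1/n) * sum over realizations of dev[i] * dev[j]
    C = [[sum(d[i] * d[j] for d in dev) / n for j in range(m)]
         for i in range(m)]
    v, lam = [1.0] * m, 0.0
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(m)) for i in range(m)]
        lam = math.sqrt(sum(x * x for x in w))
        v = [x / lam for x in w]
    return lam, v

# Synthetic "stochastic solution": one dominant random mode plus small noise.
rng = random.Random(1)
m = 20
phi = [math.sin(math.pi * (j + 1) / (m + 1)) for j in range(m)]
samples = []
for _ in range(400):
    xi = rng.gauss(0, 1)      # one random coefficient per realization
    samples.append([3.0 * xi * phi[j] + 0.01 * rng.gauss(0, 1)
                    for j in range(m)])

lam, v = leading_kl_mode(samples)
# v should align with phi (up to sign and normalization).
```

Even in this toy, assembling C costs O(n m^2); for fine spatial grids that quadratic growth in m is what makes the on-the-fly DyBO construction attractive.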

Zhiwen Zhang is a postdoctoral scholar in the Department of Computing and Mathematical Sciences at the California Institute of Technology. He graduated from the Department of Mathematical Sciences, Tsinghua University, in 2011, where he was awarded a Ph.D. in Applied Mathematics. From 2008 to 2009, he studied at the University of Wisconsin-Madison as a visiting student. His research interests lie in the applied analysis and numerical computation of problems arising from quantum chemistry, wave propagation, porous media, cell evolution, Bayesian updating, stochastic fluid dynamics, and random heterogeneous media.

March 4, 2014 - David Seal: Beyond the Method of Lines Formulation: Building Spatial Derivatives into the Temporal Integrator

Abstract: High-order solvers for hyperbolic conservation laws often fall into two disparate categories. On one hand, the method of lines formulation starts by discretizing the spatial variables, after which a system of ODEs is solved with an appropriate time integrator. On the other hand, Lax-Wendroff discretizations immediately convert Taylor series in time into discrete spatial derivatives. In this talk, we present generalizations of these methods, including high-order discontinuous Galerkin (DG) methods based on multiderivative time integrators, as well as high-order finite difference weighted essentially non-oscillatory (WENO) methods based on the Picard Integral Formulation (PIF) of the conservation law. Multiderivative time integrators are extensions of Runge-Kutta and Taylor methods. They reduce the overall storage required for a Runge-Kutta method, and they introduce flexibility to Taylor series in time methods by allowing new coefficients to be used at the various stages. In the multiderivative DG method, "modified fluxes" are used to define high-order Riemann problems, similar to those defined in the generalized Riemann problem solvers incorporated in the Arbitrary DERivative (ADER) methods. The finite difference WENO method is based on a Picard Integral Formulation of the PDE, where we first integrate in time and then discretize the temporal integral. The present formulation is automatically mass conservative, and therefore it opens the possibility of modifying finite difference fluxes for purposes such as positivity preservation or reducing the number of expensive nonlinear WENO reconstructions. For now, we present results for a single-step version of the PIF-WENO method, which lends itself to incorporating adaptive mesh refinement technology.
Results for one- and two-dimensional conservation laws are presented, and they indicate that the new methods compete well with the current state of the art.
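
For reference, a minimal method-of-lines solver of the kind the talk generalizes can be written in a few lines: discretize u_t + a u_x = 0 in space with first-order upwind differences, then hand the resulting ODE system to classical RK4. This is an illustrative sketch, not the multiderivative or PIF schemes themselves:

```python
import math

def upwind_rhs(u, dx, a=1.0):
    """Spatial discretization: first-order upwind for u_t + a u_x = 0 on
    a periodic grid.  This is the method-of-lines step: after it, only a
    system of ODEs du/dt = L(u) remains."""
    n = len(u)
    return [-a * (u[i] - u[i - 1]) / dx for i in range(n)]

def rk4_step(u, dt, rhs):
    """Classical fourth-order Runge-Kutta for the semi-discrete system."""
    k1 = rhs(u)
    k2 = rhs([ui + 0.5 * dt * k for ui, k in zip(u, k1)])
    k3 = rhs([ui + 0.5 * dt * k for ui, k in zip(u, k2)])
    k4 = rhs([ui + dt * k for ui, k in zip(u, k3)])
    return [ui + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for ui, a, b, c, d in zip(u, k1, k2, k3, k4)]

n = 100
dx = 1.0 / n
u = [math.sin(2 * math.pi * i * dx) for i in range(n)]
dt = 0.5 * dx                       # CFL-limited time step
for _ in range(200):
    u = rk4_step(u, dt, lambda v: upwind_rhs(v, dx))
# The periodic upwind flux telescopes, so total mass is conserved.
```

The CFL-limited step size above is exactly the constraint that Lax-Wendroff-type and multiderivative formulations restructure: they build spatial derivative information into the temporal update instead of treating the two discretizations independently.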

February 21, 2014 - Zhou Li: Harnessing high-resolution mass spectrometry and high-performance supercomputing for quantitative characterization of a broad range of protein post-translational modifications in a natural microbial community

Microbial communities populate and shape diverse ecological niches within natural environments. The physiology of organisms in natural consortia has been studied with community proteomics. However, little is known about how free-living microorganisms regulate protein activities through post-translational modifications (PTMs). Here, we harnessed high-performance mass spectrometry and supercomputing for the identification and quantification of a broad range of PTMs (including hydroxylation, methylation, citrullination, acetylation, phosphorylation, methylthiolation, S-nitrosylation, and nitration) in microorganisms. Using an E. coli proteome as a benchmark, we identified more than 5,000 PTM events of diverse types and a large number of modified proteins that carried multiple types of PTMs. We then applied this approach to profiling PTMs in two growth stages of a natural microbial community growing in an acid mine drainage environment. We found that multi-type, multi-site protein modifications are highly prevalent in free-living microorganisms. A large number of proteins involved in various biological processes were dynamically modified during the community succession, indicating that dynamic protein modification might play an important role in organismal response to changing environmental conditions. Furthermore, we found that closely related but ecologically differentiated bacteria harbored remarkably divergent PTM patterns between their orthologous proteins, implying that PTM divergence could be a molecular mechanism underlying their phenotypic diversity. We also quantified fractional occupancy for thousands of PTM events. The findings of this study should help unravel the role of PTMs in microbial adaptation, evolution, and ecology.

February 14, 2014 - Celia E. Shiau: Probing fish-microbe interface for environmental assessment of clean energy

To preserve wildlife and natural resources for future generations, we face the grand challenge of effectively assessing and predicting the impact of current and future energy use. My overall goal is to probe the microbiome and host-microbe interface of fish populations, in order to evaluate environmental stress on aquatic life and resources. Current understanding of aquatic microbes in fresh and salt water is centered on free-living bacteria (independent of a host). I will discuss my work on the experimentally tractable fish model (Danio rerio) that can be applied to investigate the interaction between microbiota, host health, and environmental toxicants (such as mercury and other metalloids), and the aims of my Liane Russell fellowship research program. The findings will provide a framework for studies of other fish species, leveraging advanced imaging, metagenomics, bioinformatics, and neutron scattering. The proposed study promises to inform the potential use of fish microbes to solve energy and environmental challenges, thereby providing means for critical assessment of global energy impact.

February 6, 2014 - Susan Janiszewski: 3-connected, claw-free, generalized net-free graphs are hamiltonian

Given a family $\mathcal{F} = \{H_1, H_2, \dots, H_k\}$ of graphs, we say that a graph $G$ is $\mathcal{F}$-free if $G$ contains no subgraph isomorphic to any $H_i$, $i = 1,2,\dots, k$. The graphs in the set $\mathcal{F}$ are known as {\it forbidden subgraphs}. The main goal of this dissertation is to further classify pairs of forbidden subgraphs that imply a 3-connected graph is hamiltonian. First, the number of possible forbidden pairs is reduced by presenting families of graphs that are 3-connected and not hamiltonian. Of particular interest is the graph $K_{1,3}$, also known as the {\it claw}, as we show that it must be included in any forbidden pair. Secondly, we show that 3-connected, $\{K_{1,3}, N_{i,j,0}\}$-free graphs are hamiltonian for $i,j \ne 0, i+j \le 9$ and that 3-connected, $\{K_{1,3}, N_{3,3,3}\}$-free graphs are hamiltonian, where $N_{i,j,k}$, known as the {\it generalized net}, is the graph obtained by rooting vertex-disjoint paths of length $i$, $j$, and $k$ at the vertices of a triangle. These results, combined with previously known results, give a complete classification of generalized nets such that claw-free, net-free implies a 3-connected graph is hamiltonian.
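
For readers less familiar with forbidden subgraphs, an induced claw ($K_{1,3}$) can be detected by brute force: look for a vertex with three pairwise non-adjacent neighbours. The adjacency-dict representation below is an illustrative sketch of our own:

```python
from itertools import combinations

def has_induced_claw(adj):
    """Return True if the graph (adjacency dict: vertex -> set of
    neighbours) contains an induced K_{1,3}: a center vertex adjacent to
    three pairwise non-adjacent leaves."""
    for center, nbrs in adj.items():
        for a, b, c in combinations(sorted(nbrs), 3):
            if b not in adj[a] and c not in adj[a] and c not in adj[b]:
                return True
    return False

# K_{1,3} itself: center 0 joined to leaves 1, 2, 3.
claw = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
# K_4 is claw-free: any three neighbours of a vertex are mutually adjacent.
k4 = {v: {u for u in range(4) if u != v} for v in range(4)}

print(has_induced_claw(claw), has_induced_claw(k4))  # True False
```

The cubic scan over neighbour triples is fine for small examples; the dissertation's results are, of course, structural theorems rather than algorithmic checks.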

January 30, 2014 - Wei Guo: High order Semi-Lagrangian Methods for Transport Problems with Applications to Vlasov Simulations and Global Transport

Abstract and Bio sent on behalf of the speaker:
The semi-Lagrangian (SL) scheme for transport problems has gained increasing popularity in the computational science community due to its attractive properties. For example, compared with the Eulerian approach, the SL scheme allows extra-large time steps by incorporating a characteristics-tracing mechanism, hence achieving great computational efficiency. In this talk, we will introduce a family of dimensional-splitting high-order SL methods coupled with high-order finite difference weighted essentially non-oscillatory (WENO) procedures and finite element discontinuous Galerkin (DG) methods. By performing dimensional splitting, the multi-dimensional problem is decoupled into a sequence of 1-D problems, which are much easier to solve numerically in the SL setting. The proposed SL schemes are applied to the Vlasov model arising in plasma physics and to global transport problems based on the cubed-sphere geometry from an operational climate model. We further introduce the integral deferred correction (IDC) framework to reduce the dimensional-splitting errors. The proposed algorithms have been extensively tested and benchmarked with classical problems in plasma physics, such as Landau damping, the two-stream instability, and the Kelvin-Helmholtz instability, as well as global transport problems on the cubed sphere. This is joint work with Andrew Christlieb, Maureen Morton, Ram Nair and Jing-Mei Qiu.
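
The characteristics-tracing mechanism that frees SL schemes from CFL restrictions is easy to sketch for 1-D constant-coefficient advection: trace each grid node back along its characteristic and interpolate the old solution at the foot. This toy (linear interpolation, periodic grid) is far from the high-order WENO/DG machinery of the talk, but it runs stably at a CFL number of 4:

```python
import math

def semi_lagrangian_step(u, a, dt, dx):
    """One SL step for u_t + a u_x = 0 on a periodic grid: trace the
    characteristic back from each node and linearly interpolate u at its
    foot.  Note that dt is NOT restricted by a CFL condition."""
    n = len(u)
    shift = a * dt / dx                 # displacement in cells, any size
    new = []
    for i in range(n):
        x = i - shift                   # foot of the characteristic
        j = math.floor(x)
        w = x - j                       # linear interpolation weight
        new.append((1 - w) * u[j % n] + w * u[(j + 1) % n])
    return new

n, a = 64, 1.0
dx = 1.0 / n
u0 = [math.sin(2 * math.pi * i * dx) for i in range(n)]
u = u0[:]
dt = 4 * dx / a                # CFL number 4: far beyond the Eulerian limit
for _ in range(16):            # 16 steps x 4 cells = one full period
    u = semi_lagrangian_step(u, a, dt, dx)
# With an integer CFL number the interpolation is exact, so u returns to u0.
```

With a non-integer CFL number the linear interpolation introduces numerical diffusion, which is exactly why the talk couples characteristics tracing with high-order WENO and DG reconstructions.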

January 28, 2014 - Jeff Haack: Applications of computational kinetic theory

Abstract and Bio sent on behalf of the speaker:
Kinetic theory describes the evolution of a complex system of a large number of interacting particles. These models are used to describe systems in which the characteristic scale of interactions between particles is comparable to the characteristic length scale of the system itself. In this talk, I will discuss numerical computation for several applications of kinetic theory, including rarefied gas dynamics with applications to re-entry, kinetic models for plasmas, and a biological model for swarm behavior. Because kinetic models often involve a high-dimensional phase space as well as an integral operator modeling particle interactions, simulations have been impractical in many settings. However, recent advances in massively parallel computing are very well suited to solving kinetic models, and I will discuss how these resources are used in computing kinetic models and the new difficulties that arise when computing on these architectures.

January 24, 2014 - Roman Lysecky: Data-driven Design Methods and Optimization for Adaptable High-Performance Systems

Abstract and Bio sent on behalf of the speaker:
Research has demonstrated that runtime optimization and adaptation methods can achieve performance improvements over implementations optimized only at design time. Furthermore, modern computing applications require a large degree of configurability and adaptability to operate on a variety of data inputs whose characteristics may change over time. In this talk, we highlight two runtime optimization methods for adaptable computing systems. We first highlight the use of runtime profiling and system-level performance and power estimation methods for estimating the speedup and power consumption of dynamically reconfigurable systems. We evaluate the accuracy and fidelity of the online estimation framework for dynamic configuration of computational kernels, with the goals of both maximizing performance and minimizing system power consumption. We further present an overview of the design framework and runtime reconfiguration methods supporting data-adaptable reconfigurable systems. Data-adaptable reconfigurable systems enable a flexible runtime implementation in which a system can transition the execution of tasks between different execution modalities, e.g., hardware and software implementations, while continuing to process data during the transition.

Roman Lysecky is an Associate Professor of Electrical and Computer Engineering at the University of Arizona. He received his B.S., M.S., and Ph.D. in Computer Science from the University of California, Riverside in 1999, 2000, and 2005, respectively. His research interests focus on embedded systems, with emphasis on embedded system security, non-intrusive system observation methods for in-situ analysis of complex hardware and software behavior, runtime optimization methods, and design methods for precisely timed systems with applications in safety-critical and mobile health systems. He was awarded the Outstanding Ph.D. Dissertation Award from the European Design and Automation Association (EDAA) in 2006 for New Directions in Embedded Systems. He received a CAREER award from the National Science Foundation in 2009 and four Best Paper Awards from the ACM/IEEE International Conference on Hardware-Software Codesign and System Synthesis (CODES+ISSS), the ACM/IEEE Design Automation and Test in Europe Conference (DATE), the IEEE International Conference on Engineering of Computer-Based Systems (ECBS), and the International Conference on Mobile Ubiquitous Computing, Systems, Services (UBICOMM). He has coauthored five textbooks on VHDL, Verilog, C, C++, and Java programming. He is an inventor on one US patent. In 2008 and 2013, he received an award for Excellence at the Student Interface from the College of Engineering at the University of Arizona.

January 21, 2014 - Tuoc Van Phan: Some Aspects in Nonlinear Partial Differential Equations and Nonlinear Dynamics

This talk contains two parts:

Part I: We discuss the Shigesada-Kawasaki-Teramoto system of cross-diffusion equations of two competing species in population dynamics. We show that if there is self-diffusion in one species and no cross-diffusion in the other, then the system has a unique smooth solution for all time in bounded domains of any dimension. We obtain this result by deriving global W^{1,p} estimates of Calderón-Zygmund type for a class of nonlinear reaction-diffusion equations with self-diffusion. These estimates are achieved by employing the Caffarelli-Peral perturbation technique together with a new two-parameter scaling argument.

Part II: We study a class of nonlinear Schrödinger equations in one spatial dimension with a double-well symmetric potential. We derive and justify a normal form reduction of the nonlinear Schrödinger equation for a general pitchfork bifurcation of the symmetric bound state. We prove persistence of normal form dynamics for both supercritical and subcritical pitchfork bifurcations in the time-dependent solutions of the nonlinear Schrödinger equation over long but finite time intervals.

The talk is based on my joint work with Luan Hoang (Texas Tech University), Truyen Nguyen (University of Akron), and Dmitry Pelinovsky (McMaster University).

January 17, 2014 - John Dolbow: Recent advances in embedded finite element methods

This seminar will present recent advances in an emerging class of embedded finite element methods for evolving interface problems in mechanics. By embedded, we refer to methods that allow for the interface geometry to be arbitrarily located with respect to the finite element mesh. This relaxation between mesh and geometry obviates the need for remeshing strategies in many cases and greatly facilitates adaptivity in others. The approach shares features with finite-difference methods for embedded boundaries, but within a variational setting that facilitates error and stability analysis.

We focus attention on a weighted form of Nitsche's method that allows interfacial conditions to be robustly enforced. Classically, Nitsche's method provides a means to weakly impose boundary conditions for Galerkin-based formulations. With regard to embedded interface problems, some care is needed to ensure that the method remains well behaved in varied settings ranging from interfacial configurations resulting in arbitrarily small elements to problems exhibiting large contrast. We illustrate how the weighting of the interfacial terms can be selected to both guarantee stability and to guard against ill-conditioning. Various benchmark problems for the method are then presented.

January 16, 2014 - Aziz Takhirov: Numerical analysis of the flows in Pebble Bed Geometries

Flows in complex geometries intermediate between free flows and porous media flows occur in pebble bed reactors and other industrial processes. Brinkman models have consistently shown that, even in simplified settings, accurate prediction of essential flow features depends on the practically impossible problem of meshing the pores. We discuss a new model for understanding the flow and its properties in these geometries.

January 13, 2014 - Pablo Seleson: Bridging Scales in Materials with Mesoscopic Models

Complex systems are often characterized by processes occurring at different spatial and temporal scales. Accurate predictions of quantities of interest in such systems are many times only feasible through multiscale modeling. In this talk, I will discuss the use of mesoscopic models as a means to bridge disparate scales in materials. Examples of mesoscopic models include nonlocal continuum models, based on integro-differential equations, that generalize classical continuum models based on partial differential equations. Nonlocal models possess length scales, which can be controlled for multiscale modeling. I will present two nonlocal models, peridynamics and nonlocal diffusion, and demonstrate how the inherent length scales in these models make it possible to bridge scales in materials.
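
The idea of a controllable length scale can be made concrete with a 1-D nonlocal diffusion operator. With the illustrative kernel scaling below (one standard choice, used here purely for demonstration), the operator averages u(y) - u(x) over a horizon of radius delta and rescales by 3/delta^3; delta is the tunable scale, and for quadratic u the operator reproduces the classical second derivative u'' regardless of delta:

```python
def nonlocal_laplacian(u, x, delta, n_quad=200):
    """Midpoint-rule evaluation of the 1-D nonlocal diffusion operator
    L_delta u(x) = (3/delta^3) * integral over [x-delta, x+delta] of
    (u(y) - u(x)) dy, where delta is the horizon (length scale)."""
    h = 2 * delta / n_quad
    total = 0.0
    for k in range(n_quad):
        y = x - delta + (k + 0.5) * h   # midpoint of quadrature cell k
        total += u(y) - u(x)
    return 3.0 / delta**3 * total * h

u = lambda x: x * x                     # u'' = 2 everywhere
print(nonlocal_laplacian(u, 0.7, delta=0.1))  # ~ 2.0 for any horizon
```

Shrinking delta recovers the local (PDE) description while larger delta retains nonlocal interactions, which is the mechanism by which such models bridge scales.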

January 9, 2014 - Gung-Min Gie: Motion of fluids in the presence of a boundary

In most practical applications of fluid mechanics, it is the interaction of the fluid with the boundary that is most critical to understanding the behavior of the fluid. Physically important parameters, such as the lift and drag of a wing, are determined by the sharp transition the air makes from being at rest on the wing to flowing freely around the airplane near the wing. Mathematically, the behavior of such flows at small viscosity is modeled by the Navier-Stokes equations. In this talk, we discuss some recent results on the boundary layers of the Navier-Stokes equations under various boundary conditions.

January 6, 2014 - Christine Klymko: Central and Communicability Measures in Complex Networks: Analysis and Algorithms

Complex systems are ubiquitous throughout the world, both in nature and within man-made structures. Over the past decade, large amounts of network data have become available and, correspondingly, the analysis of complex networks has become increasingly important. One of the fundamental questions in this analysis is to determine the most important elements in a given network. Measures of node importance are usually referred to as node centrality and measures of how well two nodes are able to communicate with each other are referred to as the communicability between pairs of nodes. Many measures of node centrality and communicability have been proposed over the years. Here, we focus on the analysis and computation of centrality and communicability measures based on matrix functions. First, we examine a node centrality measure based on the notion of total communicability, defined in terms of the row sums of the exponential of the adjacency matrix of the network. We argue that this is a natural metric for ranking nodes in a network, and we point out that it can be computed very rapidly even in the case of large networks. Furthermore, we propose a measure of the total network communicability, based on the total sum of node communicabilities, as a useful measure of the connectivity of the network as a whole. Next, we compare various parameterized centrality rankings based on the matrix exponential and matrix resolvent with degree and eigenvector centrality. The centrality measures we consider are exponential and resolvent subgraph centrality (defined in terms of the diagonal entries of the matrix exponential and matrix resolvent, respectively), total communicability, and Katz centrality (defined in terms of the row sums of the matrix resolvent). 
We demonstrate an analytical relationship between these rankings and the degree and subgraph centrality rankings, which helps to explain the observed robustness of these rankings on many real-world networks, even though the scores produced by the centrality measures are not stable.
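
The matrix-function measures above are straightforward to state in code. The sketch below uses a truncated Taylor series for e^A, which is adequate only for toy-sized matrices (large networks require Krylov or quadrature methods); total communicability is then a row sum, and subgraph centrality a diagonal entry, of the matrix exponential:

```python
def mat_mult(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(A, terms=30):
    """Truncated Taylor series e^A = sum_k A^k / k!  (toy-sized only)."""
    n = len(A)
    E = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    P = [row[:] for row in E]
    fact = 1.0
    for k in range(1, terms):
        P = mat_mult(P, A)     # P = A^k
        fact *= k
        for i in range(n):
            for j in range(n):
                E[i][j] += P[i][j] / fact
    return E

# Path graph 1 - 2 - 3: the middle node (index 1) joins the most walks.
A = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
E = expm(A)
total_comm = [sum(row) for row in E]      # row sums: total communicability
subgraph_c = [E[i][i] for i in range(3)]  # diagonal: subgraph centrality
print(total_comm, subgraph_c)
```

On this path graph the middle node scores highest under both measures, matching the intuition that centrality scores built from e^A count weighted walks through each node.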

December 19, 2013 - Adam Larios: New Techniques for Large-Scale Parallel Turbulence Simulations at High Reynolds Numbers

Abstract sent on behalf of the speaker:
Two techniques have recently been developed to handle large-scale simulations of turbulent flows. The first is a nonlinear, LES-type viscosity, which is based on the numerical violation of the local energy balance of the Navier-Stokes equations. This technique enjoys a numerical dissipation which remains vanishingly small in regions where the solution is smooth, only damping the flow in regions of numerical shock, allowing for increased accuracy at reduced computational cost. The second is a direction-splitting technique for projection methods, which unlocks new parallelism previously unexploited in fluid flows, and enables very fast, large-scale turbulence simulations.

December 16, 2013 - Tuoc Van Phan: Some Aspects in Nonlinear Partial Differential Equations and Nonlinear Dynamics

Abstract sent on behalf of the speaker:
This talk contains two parts:

Part I: We discuss the Shigesada-Kawasaki-Teramoto system of cross-diffusion equations of two competing species in population dynamics. We show that if there is self-diffusion in one species and no cross-diffusion in the other, then the system has a unique smooth solution for all time in bounded domains of any dimension. We obtain this result by deriving global W^{1,p} estimates of Calderón-Zygmund type for a class of nonlinear reaction-diffusion equations with self-diffusion. These estimates are achieved by employing the Caffarelli-Peral perturbation technique together with a new two-parameter scaling argument.

Part II: We study a class of nonlinear Schrödinger equations in one spatial dimension with a double-well symmetric potential. We derive and justify a normal form reduction of the nonlinear Schrödinger equation for a general pitchfork bifurcation of the symmetric bound state. We prove persistence of normal form dynamics for both supercritical and subcritical pitchfork bifurcations in the time-dependent solutions of the nonlinear Schrödinger equation over long but finite time intervals.

The talk is based on my joint work with Luan Hoang (Texas Tech University), Truyen Nguyen (University of Akron), and Dmitry Pelinovsky (McMaster University).

December 13, 2013 - Rich Lehoucq: A Computational Spectral Graph Theory Tutorial

My presentation considers the research question of whether existing algorithms and software for the large-scale sparse eigenvalue problem can be applied to problems in spectral graph theory. I first provide an introduction to several problems involving spectral graph theory. I then provide a review of several different algorithms for the large-scale eigenvalue problem and briefly introduce the Anasazi package of eigensolvers.

December 10, 2013 - Jingwei Hu: Fast algorithms for quantum Boltzmann collision operators

The quantum Boltzmann equation describes the non-equilibrium dynamics of a quantum system consisting of bosons or fermions. The most prominent feature of the equation is a high-dimensional integral operator modeling particle collisions, whose nonlinear and nonlocal structure poses a great challenge for numerical simulation. I will introduce two fast algorithms for the quantum Boltzmann collision operator. The first is a quadrature-based solver specifically designed for the collision operator in reduced energy space. Compared to the cubic complexity of direct evaluation, our algorithm runs in only linear complexity (optimal up to a logarithmic factor). The second accelerates the computation of the full phase-space collision operator. It is a spectral algorithm based on a special low-rank decomposition of the collision kernel. Numerical examples, including an application to semiconductor device modeling, are presented to illustrate the efficiency and accuracy of the proposed algorithms.

December 6, 2013 - Jeongnim Kim: Analysis of QMC Applications on Petascale Computers

Continuum Quantum Monte Carlo (QMC) has proved to be an invaluable tool for predicting the properties of matter from fundamental principles. The multiple forms of parallelism afforded by QMC algorithms and their high compute-to-communication ratio make them ideal candidates for acceleration in the multi/many-core paradigm, as demonstrated by the performance of QMCPACK on various high-performance computing (HPC) platforms, including Titan (Cray XK7) and Mira (IBM Blue Gene/Q).

The changes expected on future architectures - orders of magnitude higher parallelism, hierarchical memory and communication, and heterogeneous nodes - pose great challenges to application developers but also present opportunities to transform them to tackle new classes of problems. This talk presents core QMC algorithms and their implementations in QMCPACK on the HPC systems of today. The speaker will discuss the performance of typical QMC workloads to elucidate the critical issues to be resolved for QMC to fully exploit increasing computing powers of forthcoming HPC systems.

December 3, 2013 - Terry Haut: Advances on an asymptotic parallel-in-time method for highly oscillatory PDEs

In this talk, I will first review a recent time-stepping algorithm for nonlinear PDEs that exhibit fast (highly oscillatory) time scales. PDEs of this form arise in many applications of interest, and in particular describe the dynamics of the ocean and atmosphere. The scheme combines asymptotic techniques (which are inexpensive but can have insufficient accuracy) with parallel-in-time methods (which, alone, can yield minimal speedup for equations that exhibit rapid temporal oscillations). Examples are presented on the (1D) rotating shallow water equations in a periodic domain, which demonstrate significant parallel speedup is achievable.

In order to implement this time-stepping method for general spatial domains (in 2D and 3D), a key component involves applying the exponential of skew-Hermitian operators. To this end, I will next present a new algorithm for doing so. This method can also be used for solving wave propagation problems, which is of independent interest. This scheme has several advantages over standard methods, including the absence of any stability constraints tied to the spatial discretization, and the ability to parallelize the computation in the time variable over as many characteristic wavelengths as resources permit (in addition to any spatial parallelization). I will also present examples on the linear 2D shallow water equations, as well as the 2D (variable-coefficient) wave equation. In these examples, the method (in serial) is 1-2 orders of magnitude faster than both RK4 and the use of Chebyshev polynomials.

December 3, 2013 - Galen Shipman: The Compute and Data Environment for Science (CADES)

In this talk I will discuss ORNL's Compute and Data Environment for Science (CADES), which provides researchers with a flexible and elastic compute and data infrastructure. The initial deployment consists of over 5 petabytes of high-performance storage, nearly half a petabyte of scalable NFS storage, and over 1,000 compute cores integrated into a high-performance Ethernet and InfiniBand network. This infrastructure, based on OpenStack, provides a customizable compute and data environment for a variety of use cases, including large-scale omics databases, data integration and analysis tools, data portals, and modeling/simulation frameworks. These services can be composed to provide end-to-end solutions for specific science domains.

Galen Shipman is the Data Systems Architect for the Computing and Computational Sciences Directorate and Director of the Compute and Data Environment for Science at Oak Ridge National Laboratory (ORNL). He is responsible for defining and maintaining an overarching strategy and infrastructure for data storage, data management, and data analysis spanning from research and development to integration, deployment and operations for high-performance and data-intensive computing initiatives at ORNL. His current work includes addressing many of the data challenges of major facilities such as those of the Spallation Neutron Source (Basic Energy Sciences) and major data centers focusing on Climate Science (Biological and Environmental Research).

December 2, 2013 - Wei Ding: Klonos: A Similarity Analysis-Based Tool for Software Porting in High-Performance Computing

Porting applications to a new system is a nontrivial job in the HPC field. It is a very time-consuming, labor-intensive process, and the quality of the results depends critically on the experience of the experts involved. In order to ease the porting process, a methodology is proposed to address an important aspect of software porting that receives little attention, namely, planning support. When a scientific application consisting of many subroutines is to be ported, the selection of key subroutines greatly impacts the productivity and overall porting strategy, because these subroutines may represent a significant feature of the code in terms of functionality, code structure, or performance. They may also serve as indicators of the difficulty and amount of effort involved in porting a code to a new platform. The proposed methodology is based on the idea that a set of similar subroutines can be ported with similar strategies, resulting in portings of similar quality. By viewing subroutines as data and operator sequences, analogous to DNA sequences, various bioinformatics techniques may be used to conduct the similarity analysis of subroutines while avoiding the NP-complete complexity of other approaches. Other code metrics and cost-model metrics have been adapted for similarity analysis to capture internal code characteristics. Based on these similarity analyses, "Klonos," a tool for software porting, has been created. Experiments show that Klonos is very effective at providing a systematic porting plan that guides users to reuse similar porting strategies for similar code regions during the porting process.
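The DNA-sequence analogy can be sketched with a standard sequence matcher. The subroutine names and operator streams below are invented for illustration and are not Klonos's actual representation; the point is only that pairwise alignment scores let similar subroutines cluster together for a shared porting strategy.

```python
from difflib import SequenceMatcher

# Hypothetical sketch: view each subroutine as a sequence of operators
# (analogous to a DNA sequence) and score pairwise similarity with a
# sequence-alignment-style ratio. All names/token streams are invented.
subroutines = {
    "saxpy": ["load", "load", "mul", "add", "store", "loop"],
    "daxpy": ["load", "load", "mul", "add", "store", "loop"],
    "dot":   ["load", "load", "mul", "add", "reduce"],
    "halo":  ["send", "recv", "wait", "copy"],
}

def similarity(a, b):
    # ratio() in [0, 1]; 1.0 means identical token sequences
    return SequenceMatcher(None, a, b).ratio()

pairs = [(x, y, similarity(subroutines[x], subroutines[y]))
         for x in subroutines for y in subroutines if x < y]
for x, y, s in sorted(pairs, key=lambda p: -p[2]):
    print(f"{x:6s} {y:6s} {s:.2f}")   # saxpy/daxpy score 1.00; halo scores low
```

High-scoring pairs would be candidates for reusing a single porting strategy, which is the planning signal the talk describes.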

November 20, 2013 - Chao Yang: Numerical Algorithms for Solving Nonlinear Eigenvalue Problems in Electronic Structure Calculation

The Kohn-Sham density functional theory (KSDFT) is the most widely used theory for studying electronic properties of molecules and solids. The main computational problem in KSDFT is a nonlinear eigenvalue problem in which the matrix Hamiltonian is a function of a number of eigenvectors associated with the smallest eigenvalues. The problem can also be formulated as a constrained energy minimization problem or a nonlinear equation in which the unknown ground state electron density satisfies a fixed point map. Significant progress has been made in the last few years on understanding the mathematical properties of this class of problems. Efficient and reliable numerical algorithms have been developed to accelerate the convergence of nonlinear solvers. New methods have also been developed to reduce the computational cost in each step of the iterative solver. We will review some of these developments and discuss additional challenges in large-scale electronic structure calculations.
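A minimal sketch of the fixed-point (self-consistent field) view of such a nonlinear eigenvalue problem, using an invented 1D model Hamiltonian with a weak density-dependent potential and simple linear mixing. Real KSDFT solvers use far more sophisticated acceleration (e.g., Anderson/Pulay mixing), but the loop structure is the same:

```python
import numpy as np

# Toy self-consistent field (SCF) iteration for a Kohn-Sham-like nonlinear
# eigenvalue problem: the Hamiltonian depends on the density rho, which is
# built from its own lowest eigenvectors. The model (discrete Laplacian plus
# a weak density-dependent potential with coupling g) is invented.
n, n_occ, g, alpha = 50, 4, 0.1, 0.5
T = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # discrete Laplacian

def hamiltonian(rho):
    return T + g * np.diag(rho)        # density feeds back into the potential

rho = np.full(n, n_occ / n)            # initial guess: uniform density
for it in range(200):
    w, V = np.linalg.eigh(hamiltonian(rho))
    rho_new = (V[:, :n_occ] ** 2).sum(axis=1)   # density of lowest n_occ states
    if np.linalg.norm(rho_new - rho) < 1e-10:
        break
    rho = (1 - alpha) * rho + alpha * rho_new   # linear mixing for stability
print("SCF converged after", it, "iterations")
```

The mixing parameter damps the fixed-point map; the convergence-acceleration methods mentioned in the abstract replace this naive damping with quasi-Newton updates of the same loop.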

November 15, 2013 - Christian Straube: Simulation of HPDC Infrastructure Attributes

High Performance Distributed Computing (HPDC) infrastructures use several data centers, High Performance Computing (HPC) and distributed systems, each built from manifold (often heterogeneous) compute, storage, interconnect, and other specialized subcomponents, to provide their capabilities, i.e., well-defined functionality that is exposed to a user or application. The quality of these capabilities can be described by attributes, e.g., performance, energy efficiency, or reliability. Hardware-related modifications, such as clock rate adaptation or interconnect throughput improvement, often induce two groups of effects on these attributes: the (by definition) positive intended effects and the mostly negative but unavoidable side effects. For instance, increasing a typical HPDC infrastructure's redundancy to address short-time breakdowns and to improve reliability (positive intended effect) simultaneously increases energy consumption and degrades performance due to redundancy overhead (negative side effects).

In this talk, I present Predictive Modification Effect Analysis (PMEA), which aims to avoid harmful executions and costly modification exploration by investigating in advance whether the (negative) side effects on attributes will outweigh the (positive) intended effects. The talk covers the fundamental concepts and basic ideas of PMEA and presents its underlying model. The model is straightforward and fosters fast development, even for complex HPDC infrastructures; it handles individual and open sets of attributes and their calculations, and it addresses effect cascading through the entire HPDC infrastructure. Additionally, I will present a prototype of a simulation tool and describe some selected features in detail.


Christian Straube has been a Computer Science Ph.D. student at Ludwig-Maximilians-University (LMU) in Munich, Germany, since January 2012. His research interests include HPDC infrastructure and data center analysis, in particular planning, modification justification, and effect outweighing and cascading. During his time as a Ph.D. student, he worked for several months at the Leibniz Supercomputing Center, which operates SuperMUC, a three-Petaflop/s system that uses warm-water cooling. Prior to joining LMU as a Ph.D. student, Christian worked for several years in industry and academia as a software engineer and project manager. He ran his own software engineering company for 10 years and was (co-)founder of several IT-related start-ups. He received a best paper award for a conference contribution to INFOCOMP 2012 and was subsequently invited to be a technical program member of INFOCOMP 2013. Christian holds a Diploma with Distinction in Computer Science from Ludwig-Maximilians-University in Munich, with a minor in Medicine.

November 12, 2013 - Surya R. Kalidindi: Data Science and Cyberinfrastructure Enabled Development of Advanced Materials

Materials with enhanced performance characteristics have served as critical enablers for the successful development of advanced technologies throughout human history, and have contributed immensely to the prosperity and well-being of various nations. Although the core connections between a material's internal structure (i.e., microstructure), its evolution through various manufacturing processes, and its macroscale properties (or performance characteristics) in service are widely acknowledged to exist, establishing this fundamental knowledge base has proven effort-intensive, slow, and very expensive for a number of candidate material systems being explored for advanced technology applications. The multi-functional performance characteristics of a material are likely to be controlled by a relatively small number of salient features in its microstructure. However, cost-effective validated protocols do not yet exist for fast identification of these salient features and establishment of the core knowledge needed for the accelerated design, manufacture, and deployment of new materials in advanced technologies. The main impediment arises from the lack of a broadly accepted framework for rigorous quantification of the material's microstructure, and for objective (automated) identification of the salient microstructure features that control the properties of interest.

Microstructure Informatics focuses on the development of data science algorithms and computationally efficient protocols capable of mining the essential linkages in large microstructure datasets (both experimental and modeling), and building robust knowledge systems that can be readily accessed, searched, and shared by the broader community. Given the nature of the challenges faced in the design and manufacture of new advanced materials, this emerging interdisciplinary field is ideally positioned to produce a major transformation in the current practices used by materials scientists and engineers. The novel data science tools produced by this emerging field promise to significantly accelerate the design and development of new advanced materials through their increased efficacy in gleaning and blending the disparate knowledge and insights hidden in "big data" gathered from multiple sources (including both experiments and simulations). This presentation outlines specific strategies for data-science-enabled development of advanced materials, and illustrates key components of the proposed overall strategy with examples.

November 11, 2013 - Hermann Härtig: A fast and fault tolerant microkernel-based system for exa-scale computing (FFMK)

FFMK is a recently started project funded by DFG's Exascale-Software program. It addresses three key scalability obstacles expected in future exa-scale systems: the vulnerability to system failures due to transient or permanent faults, the performance losses due to imbalances, and the noise due to unpredictable interactions between HPC applications and the operating system. To this end, we adapt and integrate several well-proven technologies.

FFMK will combine Linux running in a light-weight virtual machine with a special-purpose component for MPI, both running side by side on L4. The objective is to build a fluid self-organizing platform for applications that require scaling up to exa-scale performance. The talk will explain assumptions and overall architecture of FFMK and continue with presenting a number of design decisions the team is currently facing. FFMK is a cooperation between Hebrew University's MosiX team, the HPC centers of Berlin and Dresden (ZIB, ZIH) and TU Dresden's operating systems group.


After receiving his PhD from Karlsruhe University on an SMP-related topic, Hermann Härtig led a team at the German National Research Center (GMD) that built BirliX, a Unix lookalike designed to address high security requirements. He then moved to TU Dresden to lead the operating systems chair. His team was among the pioneers in building microkernels of the L4 family (Fiasco, Nova) and systems based on L4 (L4Re, DROPS, NIZZA). L4Re and Fiasco form the OS basis of the SIMKO 3 smart phone. Hermann Härtig is now PI for FFMK.

October 17, 2013 - Marta D'Elia: Fractional differential operators on bounded domains as special cases of nonlocal diffusion operators

We analyze a nonlocal diffusion operator having as special cases the fractional Laplacian and fractional differential operators that arise in several applications, e.g. jump processes. In our analysis, a nonlocal vector calculus is exploited to define a weak formulation of the nonlocal problem. We demonstrate that the solution of the nonlocal equation converges to the solution of the fractional Laplacian equation on bounded domains as the nonlocal interactions become infinite. We also introduce Galerkin finite element discretizations of the nonlocal weak formulation and we derive a priori error estimates. Through several numerical examples we illustrate the theoretical results and we show that by solving the nonlocal problem it is possible to obtain accurate approximations of the solutions of fractional differential equations circumventing the problem of treating infinite-volume constraints.
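For orientation, one common form of such a nonlocal diffusion operator (in the spirit of the nonlocal vector calculus the abstract mentions; the talk's exact kernel and scaling may differ) is

```latex
\mathcal{L}_\delta u(x) \;=\; 2 \int_{B_\delta(x)} \bigl( u(y) - u(x) \bigr)\, \gamma(x,y) \, dy , \qquad x \in \Omega ,
```

where $B_\delta(x)$ is the interaction ball of radius $\delta$ around $x$ and $\gamma$ is a nonnegative kernel. Choosing $\gamma(x,y) \propto |y - x|^{-(d+2s)}$ recovers, up to a constant, the fractional Laplacian $(-\Delta)^s$ in the limit of infinite nonlocal interactions, consistent with the convergence result described above.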

October 15, 2013 - Tommy Janjusic: Framework for Evaluating Dynamic Memory Allocators including a new Equivalence Class based Cache-Conscious Dynamic Memory Allocator

Software applications' performance is hindered by a variety of factors, but most notably by the well-known CPU-memory speed gap (often known as the memory wall). This results in the CPU sitting idle waiting for data to be brought from memory to processor caches. The addressing used by caches causes non-uniform accesses to the various cache sets. The non-uniformity is due to several reasons, including how different objects are accessed by the code and how the data objects are located in memory. Memory allocators determine where dynamically created objects are placed, thus defining addresses and their mapping to cache locations. It is important to evaluate how different allocators behave with respect to the localities of the created objects. Most allocators use a single attribute, the size of an object, in making allocation decisions. Additional attributes, such as the placement with respect to other objects or to a specific cache area, may lead to better use of cache memories. This talk discusses a framework that allows for the development and evaluation of new memory allocation techniques. At the root of the framework is a memory tracing tool called Gleipnir, which provides very detailed information about every memory access and relates it back to source-level objects. Using the traces from Gleipnir, we extended a commonly used cache simulator to generate detailed cache statistics: per function, per data object, and per cache line, and to identify specific data objects that conflict with each other. The utility of the framework is demonstrated with a new memory allocator known as an equivalence class allocator. The new allocator allows users to specify cache sets, in addition to object size, where the objects should be placed. We compare this new allocator with two well-known allocators, viz., the Doug Lea and Pool allocators.
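The placement idea rests on how set-associative caches map addresses to sets; a hypothetical sketch (the cache geometry below is invented for illustration):

```python
# Hypothetical sketch of why allocator placement matters for caches: a
# set-associative cache maps an address to a set as
#     set = (addr // line_size) % num_sets
# Objects whose addresses collide in the same set evict each other, so an
# equivalence-class-aware allocator can steer objects into chosen sets.
LINE, SETS = 64, 64                    # 64-byte lines, 64 sets (invented)

def cache_set(addr):
    return (addr // LINE) % SETS

# Two objects allocated LINE*SETS bytes apart land in the SAME set ...
a = 0x10000
b = a + LINE * SETS
print(cache_set(a), cache_set(b))      # same set -> potential conflict misses

# ... while offsetting one of them by a single cache line separates them.
print(cache_set(a), cache_set(b + LINE))
```

An equivalence class in this sense is the set of addresses sharing a cache set; letting the allocator pick the class, not just the size, is what distinguishes the proposed allocator.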

October 8, 2013 - Sophie Blondel: NAT++: An analysis software for the NEMO experiment

The NEMO 3 detector aims to prove that the neutrino is a Majorana particle (i.e., identical to the antineutrino). It is mainly composed of a calorimeter and a wire chamber, the former measuring the time and energy of a particle, and the latter reconstructing its track. NEMO 3 took data for 5 effective years with an event trigger rate of ~5 Hz, resulting in a total of 10e8 events to analyze. A C++-based software package, called NAT++, was created to calibrate and analyze these events. The analysis is mainly based on a time-of-flight calculation, which will be the focus of this presentation. Supplementing this classic analysis, a new tool named gamma-tracking has been developed to improve the reconstruction of gamma energy deposits in the detector. The addition of this tool to the analysis pipeline leads to a 30% increase in statistics in certain channels of interest.

September 30, 2013 - Eric Barton: Fast Forward Storage and Input/Output (I/O)

Conflicting pressures drive the requirements for I/O and storage at Exascale. On the one hand, an explosion is anticipated, not only in the size of scientific data models but also in their complexity and in the volume of their attendant metadata. These models require workflows that integrate analysis and visualization, and new object-oriented I/O Application Programming Interfaces (APIs) to make application development tractable and allow compute to be moved to the data, or data to the compute, as appropriate. On the other hand, the economic realities driving the architecture and reliability of the underlying hardware will push the limits of horizontal scale, introduce unavoidable jitter, and make failure the norm. The I/O system will have to handle these as transparently as possible while providing efficient, sustained, and predictable performance. This talk will describe the research underway in the Department of Energy (DOE) Fast Forward Project to prototype a complete Exascale I/O stack, including, at the top level, an object-oriented I/O API based on HDF5; in the middle, a burst buffer and data layout optimizer based on PLFS (A Checkpoint Filesystem for Parallel Applications); and at the bottom, DAOS (Distributed Application Object Storage), transactional object storage based on Lustre.

September 25, 2013 - James Beyer: OpenMP vs. OpenACC

A brief introduction to two accelerator programming directive sets with a common heritage: OpenACC 2.0 and OpenMP 4.0. After introducing the two directive sets, a side-by-side comparison of available features, along with code examples, will be presented to help developers understand their options as they begin programming for both NVIDIA- and Intel-accelerated machines.

September 25, 2013 - Michael Wolfe: OpenACC 2.x and Beyond

The OpenACC API is designed to support high-level, performance portable, programming across a range of host+accelerator target systems. This presentation will start with a short discussion of that range, which provides a context for the features and limitations of the specification. Some important additions that were included in OpenACC 2.0 will be highlighted. New features currently under discussion for future versions of the OpenACC API and a summary of the expected timeline will be presented.

September 23, 2013 - Jun Jia: Accelerating time integration using spectral deferred correction

In this talk, we illustrate how to use spectral deferred correction (SDC) to improve time integration for scientific simulations. The SDC method combines a Picard integral formulation of the error equation, spectral integration, and a user-chosen low-order time-marching method to form stable methods with arbitrarily high formal order of accuracy in time. The method can be either explicit or implicit, and it also allows operator splitting while maintaining high formal order. At the end of the talk, we will show some applications of this technique.
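A minimal SDC sketch for the scalar test equation u' = λu on a single time step, using explicit-Euler correction sweeps and spectral integration on Chebyshev-Lobatto nodes (all parameters invented). Each sweep raises the formal order by one, up to the accuracy of the underlying collocation:

```python
import numpy as np

# Minimal spectral deferred correction (SDC) sketch for u' = lam*u on one
# time step [0, dt]. Each sweep applies a cheap explicit-Euler correction
# plus a spectral quadrature of the previous iterate.

def lagrange_int_matrix(t):
    # S[m, j] = integral over [t_m, t_{m+1}] of the j-th Lagrange basis poly
    M = len(t)
    S = np.zeros((M - 1, M))
    for j in range(M):
        others = np.delete(t, j)
        c = np.poly(others) / np.prod(t[j] - others)   # basis polynomial
        C = np.polyint(c)
        for m in range(M - 1):
            S[m, j] = np.polyval(C, t[m + 1]) - np.polyval(C, t[m])
    return S

lam, dt, M, K = -1.0, 0.5, 5, 8
f = lambda u: lam * u
# Chebyshev-Lobatto nodes on [0, dt] for the spectral quadrature
t = 0.5 * dt * (1.0 - np.cos(np.pi * np.arange(M) / (M - 1)))
S = lagrange_int_matrix(t)

u = np.ones(M)                         # crude initial iterate at the nodes
for k in range(K):                     # K correction sweeps
    unew = np.ones(M)
    for m in range(M - 1):
        dtm = t[m + 1] - t[m]
        unew[m + 1] = (unew[m] + dtm * (f(unew[m]) - f(u[m]))
                       + S[m] @ f(u))  # Euler correction + spectral integral
    u = unew

print(abs(u[-1] - np.exp(lam * dt)))   # error vs. exact solution at t = dt
```

Swapping the explicit-Euler substep for backward Euler gives the implicit variant; the abstract's operator-splitting flexibility comes from choosing different low-order methods for different terms of f.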

September 19, 2013 - Kenny Gross: Energy Aware Data Center (EADC) Innovations: Save Energy, Boost Performance

The global electricity consumption of enterprise and high-performance computing data centers continues to grow much faster than Moore's Law, as data centers push into emerging markets and as developed countries see explosive growth in computing demand as well as supraexponential growth in demand for exabyte (and now zettabyte) storage systems. The USDOE reported that data centers now consume 38 gigawatts of electricity worldwide, a number that is growing exponentially even during times of global economic slowdown. Oracle has developed a suite of novel algorithmic innovations that can be applied nonintrusively to any IT servers and substantially reduce energy usage and thermal dissipation for the IT assets (saving additional energy for the data center HVAC systems), while significantly boosting performance (and hence return on assets), thereby avoiding additional server purchases (that would consume more energy). The key enabler for this suite of algorithmic innovations is Oracle's Intelligent Power Monitoring (IPM) telemetry harness, implemented without hardware modifications anywhere in the data center. IPM, when coupled with advanced pattern recognition, identifies and quantifies three significant nonlinear (heretofore 'invisible') energy-wastage mechanisms that are present in all enterprise and HPC computing assets today, including in low-PUE high-efficiency data centers: 1) leakage power in the CPUs (grows exponentially with CPU temperature), 2) aggregate fan-motor power inside the servers (grows with the cube of fan RPM), and 3) substantial degradation of server energy efficiency by low-level ambient vibrations in the data center racks.
This presentation shows how continuous system-internal telemetry, coupled with advanced pattern recognition technology developed for nuclear reactor applications by the presenter and his team at Argonne National Laboratory in the 1990s, is significantly cutting energy utilization while boosting performance for enterprise and HPC computing assets.
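The first two wastage mechanisms follow simple scaling laws that are easy to illustrate; the functional forms match the abstract's description, but every constant below is invented, not Oracle's measured values:

```python
import numpy as np

# Illustrative-only model of two of the "invisible" wastage mechanisms named
# in the talk; all constants are invented for demonstration.
def leakage_power(temp_c, p0=10.0, t0=60.0, scale=10.0):
    # Leakage power grows exponentially with CPU temperature.
    return p0 * np.exp((temp_c - t0) / scale)

def fan_power(rpm, p_ref=12.0, rpm_ref=5000.0):
    # Fan-motor power grows with the CUBE of fan speed.
    return p_ref * (rpm / rpm_ref) ** 3

# The tension: cooling harder cuts leakage but inflates fan power.
# A 20% fan speedup costs ~73% more fan power, since 1.2**3 ~= 1.73.
print(leakage_power(80.0))   # watts at 80 C (toy numbers)
print(fan_power(6000.0))     # watts at 6000 RPM (toy numbers)
```

Balancing these two opposing nonlinearities per server is exactly the kind of optimization that requires the fine-grained telemetry the talk describes.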

Speaker Bio Info:
Kenny Gross is a Distinguished Engineer for Oracle and team leader for the System Dynamics Characterization and Control team in Oracle's Physical Sciences Research Center in San Diego. Kenny specializes in advanced pattern recognition, continuous system telemetry, and dynamic system characterization for improving the reliability, availability, and energy efficiency of enterprise computing systems and for the datacenters in which the systems are deployed. Kenny has 220 US patents issued and others pending, 180 scientific publications, and was awarded a 1998 R&D 100 Award for one of the top 100 technological innovations of that year, for an advanced statistical pattern recognition technique that was originally developed for nuclear plant applications and is now being used for a variety of applications to improve the quality-of-service, availability, and optimal energy efficiency for enterprise and HPC computer servers. Kenny earned his Ph.D. in nuclear engineering from the U. of Cincinnati in 1977.

September 17, 2013 - Damien Lebrun-Grandie: Simulation of thermo-mechanical contact between fuel pellets and cladding in UO2 nuclear fuel rods

As the fission process heats up the fuel rods, UO2 pellets stacked on top of each other swell both radially and axially, while the surrounding Zircaloy cladding creeps down, so that cladding and pellets eventually come into contact. This exacerbates chemical degradation of the protective cladding, and contact stresses may enable rapid propagation of cracks, threatening the integrity of the cladding. Along these lines, pellet-cladding interaction establishes itself as a major concern in fuel rod design and reactor core operation in light water reactors. Accurately modeling fuel behavior is challenging because the mechanical contact problem strongly depends on the temperature distribution, while the coupled pellet-cladding heat transfer problem, in turn, is affected by changes in geometry induced by the deformation of the bodies and the stresses generated at the contact interface.

Our work focuses on active set strategies to determine the actual contact area in high-fidelity coupled-physics fuel performance codes. The approach consists of two steps. In the first, we determine the boundary region on conventional finite element meshes where the contact conditions shall be enforced to prevent objects from occupying the same space. For this purpose, we developed and implemented an efficient parallel search algorithm for detecting mesh inter-penetration and vertex/mesh overlap. The second step deals with solving the mechanical equilibrium problem, factoring in the contact conditions computed in the first step. To do so, we developed a modified version of the multi-point constraint (MPC) strategy. While the original algorithm was restricted to the Jacobi-preconditioned conjugate gradient method, our MPC algorithm works with any other Krylov solver (thus liberating us from symmetry requirements). Furthermore, it does not place any restriction on the preconditioner used.

The multibody thermo-mechanical contact problem is tackled using modern numerics, with higher-order finite elements and a Newton-based monolithic strategy to handle both the nonlinearities (arising from the contact conditions as well as from, for instance, the temperature dependence of the fuel thermal conductivity) and the coupling between the various physics components (gap conductance sensitive to the clad-pellet distance, thermal expansion coefficient or Young's modulus affected by temperature changes, etc.).

We will provide several numerical examples of single-body and multi-body contact problems to demonstrate how the method performs.

September 5, 2013 - Jared Saia: How to Build a Reliable System Out of Unreliable Components

The first part of this talk will survey several decades of work on designing distributed algorithms that boost reliability. These algorithms boost reliability in the sense that they enable the creation of a reliable system from unreliable components. We will discuss practical successes of these algorithms, along with drawbacks. A key drawback is scalability: significant redundancy of resources is required in order to tolerate even one node fault. The second part of the talk will introduce a new class of distributed algorithms for boosting reliability. These algorithms are self-healing in the sense that they dynamically adapt to failures, requiring additional resources only when faults occur.

We will discuss two such self-healing algorithms. The first enables self-healing in an overlay network, even when an omniscient adversary repeatedly removes carefully chosen nodes. Specifically, the algorithm ensures that the shortest path between any pair of nodes never increases by more than a logarithmic factor, and that the degree of any node never increases by more than a factor of 3. The second algorithm enables self-healing with Byzantine faults, where an adversary can control t < n/8 of the n total nodes in the network. This algorithm enables point-to-point communication with an expected number of message corruptions that is O(t(log* n)^2). Empirical results show that this algorithm reduces bandwidth and computation costs by up to a factor of 70 when compared to previous work.

August 21, 2013 - Hank Childs: Hybrid Parallelism for Visualization and Analysis

Many of today's parallel visualization and analysis programs are designed for distributed-memory parallelism, but not for the shared-memory parallelism available on GPUs or multi-core CPUs. However, architectural trends in supercomputers put more and more cores on each node, whether through the presence of GPUs or through more cores per CPU. To make the best use of such hardware, we must evaluate the benefits of hybrid parallelism - parallelism that blends distributed- and shared-memory approaches - for the data-intensive workloads of visualization and analysis. In this talk, Hank explores the fundamental challenges and opportunities of hybrid parallelism for visualization and analysis, and discusses recent results that measure its benefit.

Speaker Bio:
Hank Childs is an assistant professor at the University of Oregon and a computer systems engineer at Lawrence Berkeley National Laboratory. His research focuses on scientific visualization, high-performance computing, and the intersection of the two. He received the Department of Energy Career award in 2012 to research explorative visualization use cases on exascale machines. Additionally, Hank is one of the founding members of the team that developed the VisIt visualization and analysis software. He received his Ph.D. from UC Davis in 2006.


August 13, 2013 - Rodney O. Fox: Quadrature-Based Moment Methods for Kinetics-Based Flow Models

Kinetic theory is a useful theoretical framework for developing multiphase flow models that account for complex physics (e.g., particle trajectory crossings, particle size distributions, etc.) [1]. For most applications, direct solution of the kinetic equation is intractable due to the high dimensionality of the phase space. Thus a key challenge is to reduce the dimensionality of the problem without losing the underlying physics. At the same time, the reduced description must be numerically tractable and possess the favorable attributes of the original kinetic equation (e.g., hyperbolicity, conservation of mass/momentum, etc.).

Starting from the seminal work of McGraw [2] on the quadrature method of moments (QMOM), we have developed a general closure approximation referred to as quadrature-based moment methods [3; 4; 5]. The basic idea behind these methods is to use the local (in space and time) values of the moments to reconstruct a well-defined local distribution function (i.e., non-negative, with compact support, etc.). The reconstructed distribution function is then used to close the moment transport equations (e.g., spatial fluxes, nonlinear source terms, etc.).
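The moment-inversion step at the heart of QMOM can be sketched with a Wheeler-style algorithm: the moments define a three-term recurrence whose symmetric Jacobi matrix yields quadrature nodes (eigenvalues) and weights (squared first eigenvector components). A minimal sketch, checked against the moments of a standard Gaussian (this is an illustrative implementation, not the authors' production code):

```python
import numpy as np

def wheeler(m):
    """n-node Gaussian quadrature from the 2n raw moments m_0..m_{2n-1}."""
    n = len(m) // 2
    a, b = np.zeros(n), np.zeros(n)
    sig = np.zeros((n + 1, 2 * n))     # sig[k+1] holds the sigma_k row
    sig[1] = m
    a[0] = m[1] / m[0]
    for k in range(1, n):
        for l in range(k, 2 * n - k):
            sig[k + 1, l] = (sig[k, l + 1] - a[k - 1] * sig[k, l]
                             - b[k - 1] * sig[k - 1, l])
        a[k] = sig[k + 1, k + 1] / sig[k + 1, k] - sig[k, k] / sig[k, k - 1]
        b[k] = sig[k + 1, k] / sig[k, k - 1]
    # Jacobi matrix of the recurrence: eigenvalues are the quadrature nodes,
    # weights come from the first components of the eigenvectors.
    J = np.diag(a) + np.diag(np.sqrt(b[1:]), 1) + np.diag(np.sqrt(b[1:]), -1)
    nodes, V = np.linalg.eigh(J)
    return nodes, m[0] * V[0] ** 2

# Moments m_0..m_5 of a standard Gaussian: 1, 0, 1, 0, 3, 0
nodes, weights = wheeler([1.0, 0.0, 1.0, 0.0, 3.0, 0.0])
print(nodes)    # -> approximately -sqrt(3), 0, +sqrt(3)
print(weights)  # -> approximately 1/6, 2/3, 1/6
```

The recovered nodes and weights reproduce all six input moments exactly, which is the closure property the moment transport equations rely on; realizability (positive `b` coefficients) is exactly what the high-order flux reconstructions discussed below must preserve.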

In this seminar, I will present the underlying theoretical and numerical issues associated with quadrature-based reconstructions. The transport of moments in real space, and its numerical representation in terms of fluxes, plays a critical role in determining whether a moment set is realizable. Using selected examples, I will introduce recent work on realizable high-order flux reconstructions developed specifically for finite-volume schemes [6].

[1] MARCHISIO, D. L. & FOX, R. O. 2013 Computational Models for Polydisperse Particulate and Multiphase Systems, Cambridge University Press.
[2] MCGRAW, R. 1997 Description of aerosol dynamics by the quadrature method of moments. Aerosol Science and Technology 27, 255–265.
[3] DESJARDINS, O., FOX, R. O. & VILLEDIEU, P. 2008 A quadrature-based moment method for dilute fluid-particle flows. Journal of Computational Physics 227, 2514–2539.
[4] YUAN, C. & FOX, R. O. 2011 Conditional quadrature method of moments for kinetic equations. Journal of Computational Physics 230, 8216–8246.
[5] YUAN, C., LAURENT, F. & FOX, R. O. 2012 An extended quadrature method of moments for population balance equations. Journal of Aerosol Science 51, 1–23.
[6] VIKAS, V., WANG, Z. J., PASSALACQUA, A. & FOX, R. O. 2011 Realizable high-order finite-volume schemes for quadrature-based moment methods. Journal of Computational Physics 230, 5328–5352.

August 12, 2013 - Lucy Nowell: ASCR: Funding / Data / Computer Science

Dr. Lucy Nowell is a Computer Scientist and Program Manager for the Advanced Scientific Computing Research (ASCR) program office in the Department of Energy's (DOE) Office of Science. While her primary focus is on scientific data management, analysis and visualization, her portfolio spans the spectrum of ASCR computer science interests, including supercomputer architecture, programming models, operating and runtime systems, and file systems and input/output research. Before moving to DOE in 2009, Dr. Nowell was a Chief Scientist in the Information Analytics Group at Pacific Northwest National Laboratory (PNNL). On detail from PNNL, she held a two-year assignment as a Program Director for the National Science Foundation's Office of Cyberinfrastructure, where her program responsibilities included Sustainable Digital Data Preservation and Access Network Partners (DataNet), Community-based Data Interoperability Networks (INTEROP), Software Development for Cyberinfrastructure (SDCI) and Strategic Technologies for Cyberinfrastructure (STCI). At PNNL, her research centered on applying her knowledge of visual design, perceptual psychology, human-computer interaction, and information storage and retrieval to problems of understanding and navigating in very large information spaces, including digital libraries. She holds several patents in information visualization technologies.

Dr. Nowell joined PNNL in August 1998 after a career as a professor at Lynchburg College in Virginia, where she taught a wide variety of courses in Computer Science and Theatre. She also headed the Theatre program and later chaired the Computer Science Department. While pursuing her Master of Science and Doctor of Philosophy degrees in Computer Science at Virginia, she worked as a Research Scientist in the Digital Libraries Research Laboratory and also interned with the Information Access team at IBM's T. J. Watson Research Center in Hawthorne, NY. She also has a Master of Fine Arts degree in Drama from the University of New Orleans, and Master of Arts and Bachelor of Arts degrees in Theatre from the University of Alabama.

August 8, 2013 - Carlos Maltzahn: Programmable Storage Systems

With the advent of open source parallel file systems, a new usage pattern emerges: users isolate subsystems of parallel file systems and put them in contexts not foreseen by the original designers; e.g., an object-based storage back end gets a new RESTful front end to become an Amazon Web Services S3-compliant key-value store, or a data placement function becomes a placement function for customer accounts. This trend shows a desire for the ability to use existing file system services and compose them to implement new services. We call this ability "programmable storage systems".

In this talk I will argue that designing programmability into storage systems has the following benefits: (1) we achieve greater separation of storage performance engineering from storage reliability engineering, making it possible to optimize storage systems in a wide variety of ways without risking years of investment in code hardening; (2) we create an environment that encourages people to create a new stack of storage system abstractions, both domain-specific and across domains, including sophisticated optimizers that rely on machine learning techniques; (3) we can inform commercial parallel file system vendors on the design of low-level APIs for their products, so that they match the versatility of open source storage systems without having to release their entire code into open source; and (4) we can use this historical opportunity to leverage the tension between the versatility of open source storage systems and the reliability of proprietary systems to lead the community of storage system designers.

I will illustrate programmable storage with an overview of programming abstractions that we have found useful so far and, if time permits, talk about "scriptable storage systems" and the interesting new possibilities they enable for truly data-centered software engineering.
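The composition idea described above can be sketched in a few lines. The classes and the placement function below are hypothetical stand-ins for illustration, not any real storage API: an existing object-based back end is reused, unmodified, behind a new key-value front end.

```python
class ObjectStore:
    """Stand-in for a parallel file system's object-based storage back end."""
    def __init__(self):
        self._objects = {}

    def put_object(self, oid, data):
        self._objects[oid] = data

    def get_object(self, oid):
        return self._objects[oid]


class KeyValueFrontEnd:
    """An S3-like key-value front end layered on the unmodified back end."""
    def __init__(self, backend, placement):
        self.backend = backend
        self.placement = placement  # an existing placement function, reused

    def put(self, key, value):
        self.backend.put_object(self.placement(key), value)

    def get(self, key):
        return self.backend.get_object(self.placement(key))


# Reuse a placement function in a context its designers did not foresee
# (here: hashing customer accounts onto 64 object IDs).
store = KeyValueFrontEnd(ObjectStore(), placement=lambda k: hash(k) % 64)
store.put("account/42", b"balance=100")
```

The point of the sketch is that neither class had to change to be composed; the front end only depends on the back end's existing interface.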

Bio: Carlos Maltzahn is an Associate Adjunct Professor at the Computer Science Department of the Jack Baskin School of Engineering, Director of the UCSC Systems Research Lab and Director of the UCSC/Los Alamos Institute for Scalable Scientific Data Management at the University of California at Santa Cruz. Carlos Maltzahn's current research interests include scalable file system data and metadata management, storage QoS, data management games, network intermediaries, information retrieval, and cooperation dynamics.

Carlos Maltzahn joined UC Santa Cruz in December 2004 after five years at Network Appliance. He received his Ph.D. in Computer Science from the University of Colorado at Boulder in 1999, his M.S. in Computer Science in 1997, and his Univ. Diplom Informatik from the University of Passau, Germany in 1991.

August 7, 2013 - Tiffany M. Mintz: Toward Abstracting the Communication Intent in Applications to Improve Portability and Productivity

Programming with communication libraries such as the Message Passing Interface (MPI) obscures the high-level intent of the communication in an application and makes static communication analysis difficult. Compilers are unaware of communication libraries' specifics, leading to the exclusion of communication patterns from any automated analysis and optimization. To overcome this, communication patterns can be expressed at higher levels of abstraction and incrementally added to existing MPI applications. In this paper, we propose the use of directives to clearly express the communication intent of an application in a way that is not specific to a given communication library. Our communication directives allow programmers to express communication among processes in a portable way, giving hints to the compiler on regions of computation that can be overlapped with communication and relaxing constraints on the ordering, completion and synchronization of the communication imposed by specific libraries such as MPI. The directives can then be translated by the compiler into message passing calls that efficiently implement the intended pattern, and can be targeted to multiple communication libraries. Thus far, we have used the directives to express point-to-point communication patterns in C, C++ and Fortran applications, and have translated them to MPI and SHMEM.
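The lowering step described above can be sketched as a toy translator. The directive tuple and the emitted call strings below are illustrative inventions (not the authors' actual directive syntax); the sketch only shows how one library-neutral intent can target two backends.

```python
# Hypothetical directive: (kind, source rank, destination rank, buffer name).
def lower(directive, backend):
    """Translate a library-neutral point-to-point directive into the call
    sequence for a specific communication backend."""
    kind, src, dst, buf = directive
    assert kind == "send"
    if backend == "mpi":
        # Non-blocking calls let the compiler overlap computation with
        # communication between the posts and the wait.
        return [f"MPI_Isend({buf}, dest={dst})",
                f"MPI_Irecv({buf}, source={src})",
                "MPI_Waitall()"]
    if backend == "shmem":
        # One-sided put followed by a completion fence.
        return [f"shmem_putmem({buf}, pe={dst})",
                "shmem_quiet()"]
    raise ValueError(f"unknown backend: {backend}")
```

Because the directive carries intent rather than library calls, the same source line can be retargeted from MPI to SHMEM without touching the application.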

August 2, 2013 - Alberto Salvadori: Multi-scale and multi-physics modeling of Li-ion batteries: a computational homogenization approach

There is great interest in developing the next generation of lithium-ion batteries, with higher capacity and longer cycle life, to meet the increasingly demanding energy storage requirements of humanity's existing and future inventories of power-generation and energy-management systems. Industry and academia are looking for alternative materials, and silicon is one of the most promising candidates for the active material because it has the highest theoretical specific energy capacity. It has emerged that the very large mechanical stresses associated with huge volume changes during Li intercalation/deintercalation are responsible for poor cyclic behavior and quick fading of electrical performance. The present contribution aims to provide scientific contributions in this vibrant context.

The computational homogenization scheme is here tailored to model the coupling between the electrochemical and mechanical phenomena that coexist during battery charging and discharging cycles. At the macro-scale, diffusion-advection equations model the electrochemistry of the whole cell, whereas the micro-scale models the multi-component porous electrode, the diffusion and intercalation of lithium in the active particles, and the swelling and fracturing of the latter. The scale transitions are formulated by tailoring the well-established first-order computational homogenization scheme for mechanical and thermal problems.

August 2, 2013 - Michela Taufer: The effectiveness of application-aware self-management for scientific discovery in volunteer computing systems


July 24, 2013 - Catalin Trenchea: Improving time-stepping numerics for weakly dissipative systems

In this talk I will address the stability and accuracy of the CNLF time-stepping scheme, and propose a modification of the Robert-Asselin time filter for numerical models of weakly diffusive evolution systems. This is motivated by a vast number of applications, e.g., the meteorological equations and coupled systems with dominating skew-symmetric coupling (ground-water/surface-water).

In contemporary numerical simulations of the atmosphere, evidence suggests that time-stepping errors may be a significant component of total model error, on both weather and climate time-scales. After a brief review, I will suggest a simple but effective method for substantially improving the time-stepping numerics at no extra computational expense.

The most common time-stepping method is the leapfrog scheme combined with the Robert-Asselin (RA) filter. This method is used in many atmospheric models: ECHAM, MAECHAM, MM5, CAM, MESO-NH, HIRLAM, KMCM, LIMA, SPEEDY, IGCM, PUMA, COSMO, FSU-GSM, FSU-NRSM, NCEP-GFS, NCEP-RSM, NSEAM, NOGAPS, RAMS, and CCSR/NIES-AGCM. Although the RA filter controls the time-splitting instability in these models (successfully suppresses the spurious computational mode associated with the leapfrog time stepping scheme), it also weakly suppresses the physical mode, introduces non-physical damping, and reduces the accuracy.

This presentation proposes a simple modification to the RA filter (mRA) [Y. Li, CT 2013].

The modification is analyzed and compared with the RAW filter (Williams 2009, 2011).

The mRA increases the numerical accuracy to O(Δt^4) amplitude error and at least O(Δt^2) phase-speed error for the physical mode. The mRA filter requires the same storage factors as RAW, and one more than the RA filter does. When used in conjunction with the leapfrog scheme, the RAW filter eliminates the non-physical damping and increases the amplitude accuracy by two orders, yielding third-order accuracy, while the phase accuracy remains second-order. The mRA and RAW filters can easily be incorporated into existing models, typically via the insertion of just a single line of code. Better simulations are obtained at no extra computational expense.
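The leapfrog/RA/RAW machinery above can be illustrated on the simple oscillation equation dy/dt = iωy, whose exact solution has constant amplitude. This is a minimal sketch with illustrative parameter values (the mRA filter itself is not reproduced here; alpha = 1 recovers the classical RA filter, and alpha ≈ 0.53 is a typical RAW choice).

```python
import cmath

def leapfrog_ra(omega=1.0, dt=0.01, steps=1000, nu=0.1, alpha=1.0):
    """Integrate dy/dt = i*omega*y by leapfrog with the RA(W) time filter."""
    f = lambda y: 1j * omega * y
    y_old = cmath.exp(0j)                # exact solution at t = 0
    y_now = cmath.exp(1j * omega * dt)   # exact solution at t = dt
    for _ in range(steps):
        y_new = y_old + 2.0 * dt * f(y_now)           # leapfrog step
        d = y_old - 2.0 * y_now + y_new               # filter displacement
        y_now = y_now + nu * alpha * d / 2.0          # damps computational mode
        y_new = y_new - nu * (1.0 - alpha) * d / 2.0  # RAW correction term
        y_old, y_now = y_now, y_new
    return y_now

# The exact solution has |y| = 1; plain RA (alpha=1) weakly damps the physical
# mode, while the RAW-style split largely removes that non-physical damping.
```

Running both settings shows the effect the abstract describes: the RA amplitude drifts below one, while the RAW amplitude stays much closer to it.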


June 28, 2013 - Yuri Melnikov: A surprising connection between Green's functions and the infinite product representation of elementary functions

Some standard as well as innovative approaches will be reviewed for the construction of Green's functions for elliptic PDEs. Based on these, a surprising technique is proposed for obtaining infinite product representations of some trigonometric, hyperbolic, and special functions. The technique compares different alternative expressions of Green's functions constructed by different methods. This allows us not only to obtain the classical Euler formulas but also to come up with a number of new representations.
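One of the classical Euler formulas referred to above can be checked numerically. The sketch below assumes the familiar product sin(x) = x · ∏_{n≥1} (1 − x²/(n²π²)) and truncates it after a finite number of factors.

```python
import math

def sin_product(x, terms=100000):
    """Approximate sin(x) via Euler's infinite product
    sin(x) = x * prod_{n>=1} (1 - x**2 / (n**2 * pi**2))."""
    p = x
    for n in range(1, terms + 1):
        p *= 1.0 - x * x / (n * n * math.pi * math.pi)
    return p
```

The truncated product converges slowly (the relative error of the tail is roughly x²/(π²N) after N factors), which is why so many terms are used here; note also that the factor for n = 1 vanishes at x = π, so the product correctly reproduces that zero of the sine.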

June 27, 2013 - Kimmy Mu: Performance, accuracy and power tradeoff for scientific processes using workflow in high performance computing

Power is becoming more important in high performance computing than ever before as we move toward exascale. A transition is necessary from the old approach, which considers only performance and accuracy, to a new one that accounts for performance, accuracy, and power together. In high performance computing, a workflow is composed of a large number of tasks, such as simulation, analysis, and visualization. However, there is no guidance to help users understand which allocations and placements of tasks onto nodes and clusters are good for performance or power under an accuracy requirement. In this presentation, I will talk about power optimization for reconfigurable embedded systems that dynamically choose kernels to run on hardware co-processors in response to dynamic application behavior at runtime. Since these systems have much in common with HPC, we will explore applying the method to high performance computing for dynamic workflows of task placement under performance, power, and accuracy constraints.

June 26, 2013 - Matthew Causley: A fast implicit Maxwell field solver for plasma simulations


June 25, 2013 - Jeff Haack: Conservative Spectral Method for Solving the Boltzmann Equation

We present a conservative spectral scheme for Boltzmann collision operators. This formulation is derived from the weak form of the Boltzmann equation, which can represent the collisional term as a weighted convolution in Fourier space. The weights contain all of the information of the collision mechanics and can be precomputed. I will present some results for isotropic (in angle) interactions, such as hard spheres and Maxwell molecules. We have recently extended the method to take into account anisotropic scattering mechanisms arising from potential interactions between particles, and we use this method to compute the Boltzmann equation with screened Coulomb potentials. In particular, we study the rate of convergence of the Fourier transform for the Boltzmann collision operator in the grazing collisions limit to the Fourier transform for the limiting Landau collision operator. We show that the decay rate to equilibrium depends on the parameters associated with the collision cross section, and specifically study the differences between the classical Rutherford scattering angular cross section, which has logarithmic error, and an artificial one with a linear error. I will also present recent work extending this method to multispecies gases and gases with internal degrees of freedom, which introduces new challenges for conservation and introduces inelastic collisions to the system.
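The "weighted convolution in Fourier space" idea can be demonstrated with a naive DFT. This is a generic convolution-theorem sketch (not the authors' conservative scheme): the weights are transformed once, and applying them is then a pointwise product.

```python
import cmath

def dft(a, sign=-1):
    """Naive DFT; sign=-1 is forward, sign=+1 is inverse (scaled by 1/N)."""
    N = len(a)
    out = [sum(a[n] * cmath.exp(sign * 2j * cmath.pi * k * n / N)
               for n in range(N)) for k in range(N)]
    return [v / N for v in out] if sign == +1 else out

def circular_convolution_spectral(f, w):
    """Apply precomputed weights w to f as a pointwise product in Fourier
    space; by the convolution theorem this equals the circular convolution."""
    F, W = dft(f), dft(w)
    return [v.real for v in dft([a * b for a, b in zip(F, W)], sign=+1)]
```

In a real solver the transforms would use FFTs and the transformed weights would be precomputed and reused across time steps, which is where the efficiency of the spectral formulation comes from.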

June 17, 2013 - Megan Cason: Analytic Utility Of Novel Threading Models In Distributed Graph Algorithms

Current analytic methods for judging distributed algorithms rely on communication abstractions that characterize performance assuming purely passive data movement and access. This assumption complicates the analysis of certain algorithms, such as graph analytics, which have behavior that is very dependent on data movement and modifying shared variables. This presentation will discuss an alternative model for analyzing theoretic scalability of distributed algorithms written with the possibility of active data movement and access. The mobile subjective model presented here confines all communication to 1) shared memory access and 2) executing thread state which can be relocated between processes, i.e., thread migration. Doing so enables a new type of scalability analysis, which calculates the number of thread relocations required, and whether that communication is balanced across all processes in the system. This analysis also includes a model for contended shared data accesses, which is used to identify serialization points in an algorithm. This presentation will show the analysis for a common distributed graph algorithm, and illustrate how this model could be applied to a real world distributed runtime software stack.

June 14, 2013 - Jeff Carver: Applying Software Engineering Principles to Computational Science

The increase in the importance of Computational Science software motivates the need to identify and understand which software engineering (SE) practices are appropriate. Because of the uniqueness of the computational science domain, existing SE tools and techniques developed for the business/IT community are often not efficient or effective. Appropriate SE solutions must account for the salient characteristics of the computational science development environment. To identify these solutions, members of the SE community must interact with members of the computational science community. This presentation will discuss the findings from a series of case studies of CSE projects and the results of an ongoing workshop series. First, a series of case studies of computational science projects were conducted as part of the DARPA High Productivity Computing Systems (HPCS) project. The main goal of these studies was to understand how SE principles were and were not being applied in computational science, along with some of the reasons why. The studies resulted in nine lessons learned about computational science software that are important to consider moving forward. Second, the Software Engineering for Computational Science and Engineering workshop brings together software engineers and computational scientists. The outcomes of this workshop series provide interesting insight into potential future trends.

June 12, 2013 - Hans-Werner van Wyk: Multilevel Quadrature Methods

Stochastic sampling methods are arguably the most direct and least intrusive means of incorporating parametric uncertainty into numerical simulations of partial differential equations with random inputs. However, to achieve an overall error that is within a desired tolerance, a large number of sample simulations may be required (to control the sampling error), each of which may need to be run at high levels of spatial fidelity (to control the spatial error). Multilevel methods aim to achieve the same accuracy as traditional sampling methods, but at a reduced computational cost, through the use of a hierarchy of spatial discretization models. Multilevel algorithms coordinate the number of samples needed at each discretization level by minimizing the computational cost, subject to a given error tolerance. They can be applied to a variety of sampling schemes, exploit nesting when available, can be implemented in parallel, and can be used to inform adaptive spatial refinement strategies. We present an introduction to multilevel quadrature in the context of stochastic collocation methods, and demonstrate its effectiveness theoretically and by means of numerical examples.
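The sample-allocation step described above, minimizing cost subject to an error tolerance, has a simple closed form in the multilevel Monte Carlo setting. The sketch below assumes the per-level variances V_l and costs C_l are known (in practice they are estimated on the fly), and splits the error budget so the sampling error gets half of tol².

```python
import math

def mlmc_sample_allocation(variances, costs, tol):
    """Per-level sample counts N_l minimizing total cost sum(N_l * C_l)
    subject to the sampling-error constraint sum(V_l / N_l) <= tol**2 / 2
    (the other half of the budget is left for discretization bias)."""
    s = sum(math.sqrt(v * c) for v, c in zip(variances, costs))
    return [max(1, math.ceil(2.0 / tol ** 2 * math.sqrt(v / c) * s))
            for v, c in zip(variances, costs)]
```

On a typical hierarchy the level variances decay while the costs grow, so the formula automatically places most samples on the cheap coarse levels, which is the source of the multilevel cost savings.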

June 7, 2013 - Xuechen Zhang: Scibox: Cloud Facility for Sharing On-Line Data

Collaborative science demands global sharing of scientific data, but it cannot leverage universally accessible cloud-based infrastructures like DropBox, as those offer limited interfaces and inadequate access bandwidth. In this talk, I will present Scibox, a cloud facility for sharing scientific data online. It uses standard cloud storage solutions, but offers a usage model in which high-end codes can write/read data to/from the cloud via the same ADIOS APIs they already use for their I/O actions, thereby naturally coupling data generation with subsequent data analytics. Extending current ADIOS I/O methods, Scibox controls data upload/download volumes via Data Reduction (DR) functions stated by end users and applied at the data source, before data is moved, with further gains in efficiency obtained by combining DR functions to move exactly what is needed by current data consumers.
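The DR-function idea can be sketched as plain function composition applied before any data leaves the source. The example reduction functions and field names below are hypothetical, not part of the ADIOS or Scibox APIs.

```python
def compose_dr(*dr_functions):
    """Combine user-stated Data Reduction functions so that only the data
    current consumers need is moved to the cloud."""
    def apply(data):
        for dr in dr_functions:
            data = dr(data)
        return data
    return apply

# Hypothetical DR functions: keep every 4th timestep, then keep one variable.
subsample = lambda rows: rows[::4]
project_temperature = lambda rows: [{"t": r["t"], "temp": r["temp"]} for r in rows]

upload = compose_dr(subsample, project_temperature)
rows = [{"t": t, "temp": 20.0 + t, "pressure": 1.0} for t in range(8)]
reduced = upload(rows)  # 2 rows with 2 fields each leave the source
```

The composition is what makes the volume control composable: each consumer's needs can be expressed as another DR function chained onto the pipeline.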

June 6, 2013 - Yuan Tian: Taming Scientific Big Data with Flexible Organizations for Exascale Computing

Fast-growing high performance computing systems enable scientists to simulate scientific processes of great complexity and, consequently, often produce complex data that is also exponentially increasing in size. However, growth within the computing infrastructure is significantly imbalanced: dramatically increasing computing power is accompanied by slowly improving storage systems. Such discordant progress among computing power, storage, and data has led to a severe input/output (I/O) bottleneck that requires novel techniques to address big data challenges in the scientific domain.

This talk will identify the prevalent characteristics of scientific data and the storage system as a whole, and explore opportunities to drive I/O performance for petascale computing and prepare it for the exascale. To this end, a set of flexible data organization and management techniques are introduced and evaluated to address the aforementioned concerns. Four key techniques are designed to exploit the capability of the back-end storage system for processing and storing scientific big data with fast and scalable I/O performance and visualization: space-filling-curve-based data reorganization, system-aware chunking, spatial and temporal aggregation, and in-node staging with compression. The experimental results demonstrated more than a 60x speedup for a mission-critical climate application during data post-processing.
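The space-filling-curve reorganization mentioned above can be illustrated with a Z-order (Morton) index. This 2-D sketch is generic, not the talk's implementation: the bits of the coordinates are interleaved so that points near each other in space tend to land near each other in the 1-D ordering used on disk.

```python
def morton2d(x, y, bits=16):
    """Interleave the bits of (x, y) into a Z-order (Morton) index; spatial
    locality in 2-D is largely preserved in the resulting 1-D ordering,
    which is what makes the reordering friendly to chunked storage."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

# Reorganize a small grid of points into Z-order before writing them out.
points = [(x, y) for y in range(4) for x in range(4)]
zordered = sorted(points, key=lambda p: morton2d(*p))
```

After the sort, each consecutive run of the output corresponds to a compact 2-D block, so a reader requesting a spatial subregion touches far fewer, larger contiguous ranges.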

May 31, 2013 - Pablo Seleson: Multiscale Material Modeling with Peridynamics

Multiscale modeling has been recognized in recent years as an important research field to achieve feasible and accurate predictions of complex systems. Peridynamics, a nonlocal reformulation of continuum mechanics based on integral equations, is able to resolve microscale phenomena at the continuum level. As a nonlocal model, peridynamics possesses a length scale which can be controlled for multiscale modeling. For instance, classical elasticity has been presented as a limiting case of a peridynamic model. In this talk, I will introduce the peridynamics theory and show analytical and numerical connections of peridynamics to molecular dynamics and classical elasticity. I will also present multiscale methods to concurrently couple peridynamics and classical elasticity, demonstrating the capabilities of peridynamics towards multiscale material modeling.
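The nonlocal force integral at the heart of bond-based peridynamics can be sketched in 1-D. The discretization and the half-weight correction below are a textbook-style illustration (not the speaker's code): the force at a node sums pairwise bond contributions over a finite horizon, and with the bond constant c = 2E/δ² the nonlocal operator recovers the classical elastic term E·u″ as the length scale shrinks.

```python
def peridynamic_force(u, i, dx, m, c):
    """1-D bond-based peridynamic internal force density at node i.
    Horizon = m*dx, bond constant c; the outermost bonds get half weight
    (a simple partial-volume correction)."""
    f = 0.0
    for j in range(1, m + 1):
        w = 0.5 if j == m else 1.0
        for s in (-1, 1):
            xi = s * j * dx                                # bond vector
            f += w * c * (u[i + s * j] - u[i]) / abs(xi) * dx
    return f

# For a smooth displacement field and c = 2*E / horizon**2, the nonlocal
# force approximates E*u''(x), illustrating the classical elasticity limit.
```

Applying it to the quadratic field u(x) = x², whose second derivative is 2, reproduces the classical value E·u″ = 2E at the interior node.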

Dr. Seleson is a Postdoctoral Fellow in the Institute for Computational Engineering and Sciences at The University of Texas at Austin. He obtained his Ph.D. in Computational Science from Florida State University in 2010. He holds an M.S. degree in Physics from the Hebrew University of Jerusalem (2006) and a double B.S. degree in Physics and Philosophy, also from the Hebrew University of Jerusalem (2002).

May 29, 2013 - Ryan McMahan: The Effects of System Fidelity for Virtual Reality Applications

Virtual reality (VR) has developed from Ivan Sutherland's inception of an "ultimate display" to a realized field of advanced technologies. Despite evidence supporting the use of VR for various benefits, the level of system fidelity required for such benefits is often unknown. Modern VR systems range from high-fidelity simulators that incorporate many technologies to lower-fidelity, desktop-based virtual environments. In order to identify the level of system fidelity required for certain beneficial uses, research has been conducted to better understand the effects of system fidelity on the user. In this talk, a series of experiments evaluating the effects of interaction fidelity and display fidelity will be presented. Future directions of system fidelity research will also be discussed.

Dr. Ryan P. McMahan is an Assistant Professor of Computer Science at the University of Texas at Dallas, where his research focuses on the effects of system fidelity for virtual reality (VR) applications. Using an immersive VR system comprised of a wireless head-mounted display (HMD), a real-time motion tracking system, and Wii Remotes as 3D input devices, his research determines the effects of system fidelity by varying components such as stereoscopy, field of view, and degrees of freedom for interactions. Currently, he is using this methodology to investigate the effects of fidelity on learning for VR training applications. Dr. McMahan received his Ph.D. in Computer Science in 2011 from Virginia Tech, where he also received his B.S. and M.S. in Computer Science in 2004 and 2007.


May 28, 2013 - Adrian Sandu: Data Assimilation and the Adaptive Solution of Inverse Problems

The task of providing an optimal analysis of the state of the atmosphere requires the development of novel computational tools that facilitate an efficient integration of observational data into models. In this talk, we will introduce variational and statistical estimation approaches to data assimilation. We will discuss important computational aspects including the construction of efficient models for background errors, the construction and analysis of discrete adjoint models, new approaches to estimate the information content of observations, and hybrid variational-ensemble approaches to assimilation. We will also present some recent results on the solution of inverse problems using space and time adaptivity, and a priori and a posteriori error estimates for the optimal solution.
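The variational estimation approach mentioned above reduces, in the scalar case, to a closed-form blend of a background forecast with an observation. This is a minimal illustration (not the speaker's system): B and R are the background- and observation-error variances, and H is the observation operator.

```python
def threedvar_scalar(xb, B, y, R, H=1.0):
    """Scalar 3D-Var analysis: minimize the cost function
    J(x) = (x - xb)**2 / B + (y - H*x)**2 / R, which has a closed form."""
    K = B * H / (H * H * B + R)    # gain: how much to trust the innovation
    return xb + K * (y - H * xb)

# Equal error variances -> the analysis sits halfway between the background
# and the observation; a very accurate background pulls it toward xb.
```

In an operational system x is a huge state vector, B and R are covariance matrices, and the minimization is done iteratively with adjoint models, but the weighting logic is exactly this one-line gain.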

May 24, 2013 - Satoshi Matsuoka: The Futures of Tsubame Supercomputer and the Japanese HPCI Towards Exascale

HPCI is the Japanese High Performance Computing Infrastructure, which encompasses the national operations of major supercomputers, such as the K supercomputer and Tsubame2.0, much like XSEDE in the United States and PRACE in Europe. Recently it was announced that the Japanese Ministry of Education, Culture, Sports, Science and Technology intends to initiate a project towards an exascale supercomputer to be deployed around 2020. However, the workshop report that recommended the project also calls for a comprehensive infrastructure in which a flagship machine will be supplemented with leadership machines that complement its abilities. Although it is still early, I will attempt to discuss the current status of Tsubame2.0's evolution to 2.5 and 3.0 in this context, as well as the activities in Japan to initiate an exascale effort, with collaborative elements with the US Department of Energy partners in system software development.

May 17, 2013 - Jon Mietling and Tony McCrary: Bling3D: a new game development toolset from l33t Labs

Bling3D is a forthcoming game development toolset from l33t labs.

A fusion of Eclipse 4 and game development technologies, Bling allows both programmers and designers to create compelling interactive experiences from within one powerful tool.

In this talk, you will be introduced to some of Bling's exciting features.

Jon Mietling and Tony McCrary are representatives of l33t labs LLC, a technology startup from the Detroit, Michigan region.

May 10, 2013 - Xiao Chen: A Modular Uncertainty Quantification Framework for Multi-physics Systems

This talk presents a modular uncertainty quantification (UQ) methodology for multi-physics applications in which each physics module can be independently embedded with its internal UQ method (intrusive or non-intrusive). This methodology offers the advantage of "plug-and-play" flexibility (i.e., UQ enhancements to one module do not require updates to the other modules) without losing the "global" uncertainty propagation property. (This means that, by performing UQ in this modular manner, all inter-module uncertainty and sensitivity information is preserved.) In addition, using this methodology one can also track the evolution of global uncertainties and sensitivities at the grid point level, which may be useful for model improvement. We demonstrate the utility of such a framework for error management and Bayesian inference on a practical application involving a multi-species flow and reactive transport in randomly heterogeneous porous media.

May 2, 2013 - Kenley Pelzer: Quantum Biology: Elucidating Design Principles from Photosynthesis

Recent experiments suggest that quantum mechanical effects may play a role in the efficiency of photosynthetic light harvesting. However, much controversy exists about the interpretation of these experiments, in which light harvesting complexes are excited by a femtosecond laser pulse. The coherence in such laser pulses raises the important question of whether these quantum mechanical effects are significant in biological systems excited by incoherent light from the sun. In our work, we apply frequency-domain Green's function analysis to model a light-harvesting complex excited by incoherent light. By modeling incoherent excitation, we demonstrate that the evidence of long-lived quantum mechanical effects is not purely an artifact of peculiarities of the spectroscopy. These results provide a new perspective on the role of noisy biological environments in promoting or destroying quantum transport in photosynthesis.

April 23, 2013 - Kirk W. Cameron: Power-Performance Modeling, Analyses and Challenges

The power consumption of supercomputers ultimately limits their performance. The current challenge is not whether we can build an exaflop system by 2018, but whether we can do it in less than 20 megawatts. The SCAPE Laboratory at Virginia Tech has been studying the tradeoffs between performance and power for over a decade. We've developed an extensive tool chain for monitoring and managing power and performance in supercomputers. We will discuss our power-performance modeling efforts and the implications of our findings for exascale systems, as well as some research directions ripe for innovation.

April 23, 2013 - Jordan Deyton: Tor Bridge Distribution Powered by Threshold RSA

Since its inception, Tor has offered anonymity for internet users around the world. Tor now offers bridges to help users evade internet censorship, but the primary distribution schemes that provide bridges to users in need have come under attack. This talk explores how threshold RSA can help strengthen Tor's infrastructure while also enabling more powerful bridge distribution schemes. We implement a basic threshold RSA signature system for the bridge authority and a reputation-based social network design for bridge distribution. Experimental results are obtained showing the possibility of quick responses to requests from honest users while maintaining both the secrecy and the anonymity of registered clients and bridges.
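The threshold-RSA signing idea in the abstract can be sketched with a toy 2-of-2 additive sharing of the private exponent. This is an illustration only (tiny insecure primes, no padding or hashing; Tor's actual bridge-authority design differs): each share holder signs independently, and the partial signatures combine into one ordinary RSA signature verifiable with the single public key.

```python
import random

# Toy parameters: real deployments would use >= 2048-bit moduli.
p, q = 1009, 1013
n, phi = p * q, (p - 1) * (q - 1)
e = 65537
d = pow(e, -1, phi)          # the authority's private exponent

d1 = random.randrange(1, d)  # share held by signer 1
d2 = d - d1                  # share held by signer 2 (so d = d1 + d2)

def partial_sign(m, share):
    """Each holder signs with only its share of the key."""
    return pow(m, share, n)

def combine(s1, s2):
    """Partial signatures multiply into an ordinary RSA signature,
    since m**d1 * m**d2 = m**(d1 + d2) = m**d (mod n)."""
    return (s1 * s2) % n

m = 4242                     # message representative (hashing omitted)
signature = combine(partial_sign(m, d1), partial_sign(m, d2))
valid = pow(signature, e, n) == m   # verify against the single public key
```

The appeal for an infrastructure like Tor's is that no single machine ever holds the full exponent d, yet clients verify signatures exactly as before.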

April 19, 2013 - Maria Avramova and Kostadin Ivanov: OECD LWR UAM and PSBT/BFBT benchmarks and their relation to Advanced LWR Simulations

From 1987 to 1995, the Nuclear Power Engineering Corporation (NUPEC) in Japan performed a series of void measurement tests using full-size mock-ups for both BWRs and PWRs. Void fraction measurements and departure from nucleate boiling (DNB) tests were performed at NUPEC under steady-state and transient conditions. The workshop will provide an overview of the OECD/NEA/NRC PWR Subchannel and Bundle Tests (PSBT) and OECD/NEA/NRC BWR Full-size Fine-mesh Bundle Tests (BFBT) benchmarks based on the NUPEC data. The benchmarks were designed to provide a data set for evaluating the abilities of existing subchannel, system, and computational fluid dynamics (CFD) thermal-hydraulics codes to predict void distribution and DNB in LWRs under steady-state and transient conditions. The first part of the seminar summarizes the PSBT and BFBT benchmark databases, specifications, and definitions of the benchmark exercises, presents a comparative analysis of the obtained results, and makes the case for how these benchmarks can be used for verification, validation, and uncertainty quantification of thermal-hydraulic tools developed for advanced LWR simulations.

The second part of the seminar will provide an overview of the OECD/NEA benchmark for LWR Uncertainty Analysis in Modeling (UAM), with emphasis on the Exercises of Phase I and Phase II of the benchmark and discussion of Phase III, which is directly related to coupled multi-physics advanced LWR simulations. A series of well-defined problems with complete sets of input specifications and reference experimental data will be introduced, with the objective of determining the uncertainty in LWR calculations at all stages of a coupled reactor physics/thermal-hydraulics calculation. The full chain of uncertainty propagation will be discussed, starting from basic data and engineering uncertainties, across different scales (multi-scale) and physics phenomena (multi-physics), as well as how this propagation is tested on a number of benchmark exercises. The input, output, and assumptions for each Exercise will be given, and the procedures to calculate the output and propagated uncertainties in each step will be described, supplemented by results from benchmark participants.

Bio of Dr. Maria Avramova
Dr. Maria Avramova is an Assistant Professor in the Mechanical and Nuclear Engineering Department at the Pennsylvania State University. She is currently the Director of the Reactor Dynamics and Fuel Management Group (RDFMG). Her expertise and experience are in the area of developing methods and computer codes for multi-dimensional reactor core analysis. Her background includes development, verification, and validation of thermal-hydraulics sub-channel, porous media, and CFD models and codes for reactor core design, transient, and safety computational analysis. She has led and coordinated the OECD/NRC BFBT and PSBT benchmarks and currently is coordinating Phase II of the OECD LWR UAM benchmark. Her latest research efforts have been focused on high-fidelity multi-physics simulations (involving coupling of reactor physics, thermal-hydraulics and fuel performance models) as well as on uncertainty and sensitivity analysis of reactor design and safety calculations. Dr. Avramova has published over 15 refereed journal papers and over 40 refereed conference proceedings articles.

Bio of Dr. Kostadin Ivanov
Dr. Kostadin Ivanov is a Distinguished Professor in the Mechanical and Nuclear Engineering Department at the Pennsylvania State University. He is currently the Graduate Coordinator of the Nuclear Engineering Program. His research developments include computational methods, numerical algorithms and iterative techniques, nuclear fuel management and reloading optimization techniques, reactor kinetics and core dynamics methods, cross-section generation and modeling algorithms for multi-dimensional steady-state and transient reactor calculations, and the coupling of three-dimensional (3-D) kinetics models with thermal-hydraulic codes. He has also led the development of multi-dimensional neutronics, in-core fuel management, and coupled 3-D kinetics/thermal-hydraulic computer code benchmarks; multi-dimensional reactor transient and safety analysis methodologies; integrated analysis of safety-related parameters; system transient modeling of power plants; and in-core fuel management analyses.
Examples of such benchmarks are the OECD/NRC PWR MSLB benchmark, the OECD/NRC BWR TT benchmark, and the OECD/DOE/CEA VVER-1000 CT benchmark. He is currently the chair and coordinator of the Scientific Board and Technical Program Committee of the OECD LWR UAM benchmark.

April 18, 2013 - Sparsh Mittal: MASTER: A Technique for Improving Energy Efficiency of Caches in Multicore Processors

The large power consumption of modern processors has been identified as the most severe constraint on scaling their performance. Further, in recent CMOS technology generations, leakage energy has increased dramatically; hence, the leakage energy consumption of large last-level caches (LLCs) has become a significant source of processor power consumption.

This talk first highlights the need for power management in the LLCs of modern multi-core processors and then presents MASTER, a micro-architectural cache leakage energy saving technique based on dynamic cache reconfiguration. MASTER uses dynamic profiling of the LLC to predict the energy consumption of running programs at multiple LLC sizes. Using these estimates, suitable cache quotas are allocated to different programs through a cache-coloring scheme, and the unused LLC space is turned off to save energy. The implementation overhead of MASTER is small; even for 4-core systems, its overhead is only 0.8% of the L2 cache size. Simulations have been performed using an out-of-order x86-64 simulator with 2-core and 4-core multi-programmed workloads from the SPEC2006 suite. Further, MASTER has been compared with two other energy saving techniques, namely decay cache and way-adaptable cache. The results show that MASTER gives the highest energy saving and does not harm performance or cause unfairness.
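The quota-allocation step described above can be illustrated with a deliberately simplified sketch. This is our hypothetical illustration, not MASTER's actual algorithm: given profiled per-program energy predictions at several LLC color counts, pick the count minimizing predicted energy, shrink proportionally if the total exceeds the cache, and report the colors left over to power-gate. The program names and energy numbers are invented.

```python
# Hypothetical sketch (not MASTER's actual algorithm) of energy-driven
# cache-quota allocation over LLC colors.

def allocate_quotas(energy_estimates, total_colors):
    # Pick, per program, the color count with the lowest predicted energy.
    quotas = {p: min(est, key=est.get) for p, est in energy_estimates.items()}
    total = sum(quotas.values())
    if total > total_colors:
        # Proportionally shrink quotas to fit the cache (at least one color each).
        quotas = {p: max(1, q * total_colors // total) for p, q in quotas.items()}
    unused = total_colors - sum(quotas.values())
    return quotas, unused   # 'unused' colors would be turned off to save leakage

# Invented predicted-energy tables: {program: {num_colors: predicted_energy}}.
est = {
    "mcf":   {2: 5.0, 4: 3.2, 8: 3.0},
    "bzip2": {2: 1.1, 4: 1.0, 8: 1.4},
}
quotas, off = allocate_quotas(est, total_colors=16)
```

With these invented numbers, "mcf" gets 8 colors, "bzip2" gets 4, and the remaining 4 colors are candidates for power-gating.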

Finally, this talk briefly shows an extension of MASTER for multicore QoS systems. Simulation results confirm that a large amount of energy is saved while meeting the QoS requirement of most of the workloads.

April 17, 2013 - Okwan Kwon: Automatic Scaling of OpenMP Applications Beyond Shared Memory

We present the first fully automated compiler-runtime system that successfully translates and executes OpenMP shared-address-space programs on laboratory-size clusters, for the complete set of regular, repetitive applications in the NAS Parallel Benchmarks. We introduce a hybrid compiler-runtime translation scheme. This scheme features a novel runtime data flow analysis and compiler techniques for improving data affinity and reducing communication costs. We present and discuss the performance of our translated programs, and compare them with the performance of the MPI, HPF, and UPC versions of the benchmarks. The results show that our translated programs achieve, on average, 75% of the performance of the hand-coded MPI programs.

April 17, 2013 - Michael S. Murillo: Molecular Dynamics Simulations of Charged Particle Transport in High Energy-Density Matter

High energy-density matter is now routinely produced at large laser facilities. Producing fusion energy at such facilities challenges our ability to model collisional plasma processes that transport energy among the plasma species and across spatial scales. While the most accurate computational method for describing collisional processes is molecular dynamics, there are numerous challenges associated with using molecular dynamics to model very hot plasmas. However, recent advances in high performance computing have allowed us to develop methods for simulating a wide variety of processes in hot, dense plasmas. I will review these developments and describe our recent results that involve simulating fast particle stopping in dense plasmas. Using the simulation results, implications for theoretical modeling of charged-particle stopping will be given.

April 12, 2013 - Vivek K. Pallipuram: Exploring Multiple Levels Of Performance Modeling For Heterogeneous Systems

One of the major challenges faced by the High-Performance Computing (HPC) community today is user-friendly and accurate heterogeneous performance modeling. Although performance prediction models exist to fine-tune applications, they are seldom easy to use and do not address multiple levels of design space abstraction. Our research aims to bridge the gap between reliable performance model selection and user-friendly analysis. We propose a straightforward and accurate multi-level performance modeling suite for multi-GPGPU systems that addresses multiple levels of design space abstraction. The multi-level performance modeling suite primarily targets synchronous iterative algorithms (SIAs) using our synchronous iterative GPGPU execution (SIGE) model and addresses two levels of design space abstraction: (1) a low level, where partial details of the implementation are present along with system specifications, and (2) a high level, where implementation details are minimal and only high-level system specifications are known. The low-level abstraction of the modeling suite employs statistical techniques for runtime prediction, whereas the high-level abstraction utilizes existing analytical and quantitative modeling tools to predict the application runtime. Our initial validation efforts for the low-level abstraction yield high runtime prediction accuracy, with less than 10% error for several tested GPGPU cluster configurations and case studies. The development of high-level abstraction models is underway. The end goal of our research is to offer the scientific community a reliable and user-friendly performance prediction framework that allows it to optimally select a performance prediction strategy for given design goals and system architecture characteristics.
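The low-level, statistics-based runtime prediction can be caricatured in a few lines: fit a model to a handful of profiled runs, then extrapolate to an untested configuration. This is our toy sketch, not the SIGE suite itself; the single feature ("work per GPU") and the sample numbers are hypothetical.

```python
# Toy sketch of statistical runtime prediction (not the actual SIGE suite):
# an ordinary least-squares line fit through a few profiled runs.

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

# Hypothetical profiled (work-per-GPU, runtime-in-seconds) pairs.
samples_x = [1.0, 2.0, 4.0]
samples_y = [1.5, 2.5, 4.5]
a, b = fit_line(samples_x, samples_y)
pred = a + b * 8.0   # predicted runtime for an untested configuration
```

Real models would use many features (kernel counts, transfer volumes, node counts) and cross-validate, but the extrapolation step has this shape.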

April 11, 2013 - Jeff Young: Commodity Global Address Spaces - How Can We Scale Out Accelerator and Memory Performance for Tomorrow's Clusters?

Current Top 500 systems like Titan, Stampede, and Tianhe-1A have started to embrace the use of off-chip accelerators, such as GPUs and x86 coprocessors, to dramatically improve their overall performance and efficiency numbers. At the same time, these systems also make very specific assumptions about the availability of highly optimized interconnects and software stacks that are used to mitigate the effects of running large applications across multiple nodes and their accelerators. This talk focuses on the gap in networking between high-performance computing clusters and data centers and proposes that future clusters should be built around commodity-based networks and managed global address spaces to improve the performance of data movement between host memory and accelerator memory. This thesis is supported by previous research into converged commodity interconnects and ongoing research on the Oncilla managed GAS runtime to support aggregated memory for data warehousing applications. In addition, we will speculate on how commodity-based networks and memory management for clusters of accelerators might be affected by the advent of 3D stacking and fused CPU/GPU architectures.

April 9, 2013 - Cong Liu: Towards Efficient Real-Time Multicore Computing Systems

Current trends in multicore computing are towards building more powerful, intelligent, yet space- and power-efficient systems. A key requirement in correctly building such intelligent systems is to ensure real-time performance, i.e., "make the right move at the right time in a predictable manner." Current research on real-time multicore computing has been limited to simple systems for which complex application runtime behaviors are ignored; this limits the practical applicability of such research. In practice, complex but realistic application runtime behaviors often exist, such as I/O operations, data communications, parallel execution segments, critical sections etc. Such runtime behaviors are currently dealt with by over-provisioning systems, which is an economically wasteful practice. I will present predictable real-time multicore computing system design, analysis, and implementation methods that can efficiently support common types of application runtime behaviors. I will show that the proposed methods are able to avoid over-provisioning systems and to reduce the number of needed hardware components to the extent possible while providing timing correctness guarantees.

In the second part of the talk, I will present energy-efficient workload mapping techniques for heterogeneous multicore CPU/GPU systems. Through both algorithmic analysis and prototype system implementation, I will show that the proposed techniques are able to achieve better energy efficiency while guaranteeing response time performance.

April 9, 2013 - Frank Mueller: On Determining a Viable Path to Resilience at Exascale

Exascale computing is projected to feature billion-core parallelism. At such large processor counts, faults will become more commonplace. Current techniques to tolerate faults focus on reactive recovery schemes and generally rely on a simple checkpoint/restart mechanism. Yet they have a number of shortcomings: (1) they do not scale and require complete job restarts; (2) projections indicate that the mean time between failures is approaching the overhead required for checkpointing; and (3) existing approaches are application-centric, which increases the burden on application programmers and reduces portability.

We discuss a number of techniques, and their levels of maturity (or lack thereof), for addressing these problems. These include (a) scalable network overlays, (b) on-the-fly process recovery, (c) proactive process-level fault tolerance, (d) redundant execution, (e) the effect of silent data corruption (SDC) on IEEE floating-point arithmetic, and (f) resilience modeling. In combination, these methods aim to pave the path to exascale computing.

April 5, 2013 - Sarat Sreepathi: Optimus: A Parallel Metaheuristic Optimization Framework With Environmental Engineering Applications

Optimus (Optimization Methods for Universal Simulators) is a parallel optimization framework for coupling computational intelligence methods with a target scientific application. Optimus includes a parallel middleware component, PRIME (Parallel Reconfigurable Iterative Middleware Engine), for scalable deployment on emergent supercomputing architectures. PRIME provides a lightweight communication layer to facilitate periodic inter-optimizer data exchanges. A parallel search method, COMSO (Cooperative Multi-Swarm Optimization), was designed and tested on various high-dimensional mathematical benchmark problems. Additionally, this work presents a novel technique, TAPSO (Topology-Aware Particle Swarm Optimization), for network-based optimization problems. Empirical studies demonstrate that TAPSO achieves better convergence than standard PSO for Water Distribution Systems (WDS) applications. Scalability analysis of Optimus was performed on the Cray XK6 supercomputer (Jaguar) at the Oak Ridge Leadership Computing Facility for the leak-detection problem in WDS. For a weak scaling scenario, we achieved 84.82% of baseline performance at 200,000 cores relative to performance at 1,000 cores.
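Since COMSO and TAPSO both build on particle swarm optimization, a minimal standard PSO loop may help fix ideas. This is our generic sketch with conventional inertia and acceleration coefficients, not the speaker's cooperative or topology-aware variants; the test function is the usual sphere benchmark.

```python
import random

# Minimal standard PSO (our generic sketch; COMSO/TAPSO add cooperation and
# topology awareness on top of a loop like this one).

def pso(f, dim, n_particles=20, iters=200, seed=1):
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal bests
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + cognitive pull + social pull
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            v = f(pos[i])
            if v < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], v
                if v < gbest_val:
                    gbest, gbest_val = pos[i][:], v
    return gbest, gbest_val

best, val = pso(lambda x: sum(c * c for c in x), dim=3)
```

On the 3-D sphere function the swarm converges to near the origin within a few hundred iterations.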

March 20, 2013 - J.W. Banks: Stable Partitioned Solvers for Compressible Fluid-Structure Interaction Problems

In this talk, we discuss recent work concerning the development and analysis of stable, partitioned solvers for fluid-structure interaction (FSI) problems. In a partitioned approach, the solvers for each fluid or solid domain are isolated from each other and coupled only through the interface. This is in contrast to fully coupled monolithic schemes, where the entire system is advanced by a single unified solver, typically an implicit method. Added-mass instabilities, common to partitioned schemes, are addressed through the use of a newly developed interface projection technique. The overall approach is based on imposing the exact solution of local fluid-solid Riemann problems directly in the numerical method. Stability of the FSI coupling is discussed using normal-mode stability theory, and the new scheme is shown to be stable for a wide range of material parameters. For the rigid body case, the approach is shown to be stable even for bodies with no mass or rotational inertia. This difficult limiting case exposes interesting subtleties concerning the notion of added mass in fluid-structure problems at the continuous level.

March 13, 2013 - Travis Thompson: Navier-Stokes equations to Describe the Motion of Fluid Substances

The Navier-Stokes equations describe the motion of fluid substances; they are widely used to model many physical phenomena, such as weather patterns, ocean currents, turbulent fluid flow, and magnetohydrodynamics. Despite their wide utilization, a comprehensive theoretical understanding remains an open question; the equations offer a venue for challenges at the forefront of both theoretical and computational knowledge. My work at Texas A&M has focused primarily on two topics: aspects of hyperbolic conservation laws, specifically mass conservation for incompressible Navier-Stokes, and computational investigation of an LES model based on a new eddy viscosity; both rely on highly parallel scientific computing, albeit in differing ways.

With respect to hyperbolic conservation laws: on the computational side I have implemented a one-step artificial compression term in a numerical code which counteracts an entropy-viscosity regularization term. This is an innovative approach; canonical methods for interface tracking are two-step or adaptive procedures. In addition the implementation utilizes a splitting approach, originally designed for use in a highly-parallel momentum equation variant, as an approximation operator in the time-stepping scheme; this approach imbues the algorithm with additional parallelism. On the theoretical side a distinct approach towards the analysis of dispersion error, utilizing a commutator expression, has been investigated for particular finite element spaces; the approach offers a computational segue into investigating consistency error and moves away from the canonical, tedious, expansion-based methodology of analysis.

With respect to large eddy simulations (LES): computational investigations of an eddy-viscosity model based on the entropy-viscosity of Guermond & Popov have been underway for the last six months; in collaboration with Dr. Larios, a post-doc here at Texas A&M, an analysis of the qualitative and statistical attributes of high-Reynolds-number turbulent flow is being conducted. We will compare our results to the Smagorinsky-Lilly turbulence model and attempt to verify basic tenets of isotropic turbulence theory, namely the Kolmogorov -5/3 law and predictions regarding the uncorrelated nature of velocity structure functions.

March 1, 2013 - Bob Salko: Development, Improvement, and Validation of Reactor Thermal-Hydraulic Analysis Tools

As a result of the need for continual development, qualification, and application of computational tools relating to the modeling of nuclear systems, the Reactor Dynamics and Fuel Management Group (RDFMG) at the Pennsylvania State University has maintained an active involvement in this area. This presentation will highlight recent RDFMG work relating to thermal-hydraulic modeling tools. One such tool is the COolant Boiling in Rod Arrays - Two Fluids (COBRA-TF) computer code, capable of modeling the independent behavior of continuous liquid, vapor, and droplets using the sub-channel methodology. Work has been done to expand the modeling capabilities from the in-vessel region only, for which COBRA-TF was developed, to the coolant-line region by developing a dedicated coolant-line-analysis package that serves as an add-on to COBRA-TF. Additional COBRA-TF work includes development of a pre-processing tool for faster, more user-friendly creation of COBRA-TF input decks; implementation of post-processing capabilities for visualization of simulation results; and optimization of the source code for significant improvements in simulation speed and memory management. Of equal importance to these development activities is the validation of the resulting tools for their intended applications. The code's capability to capture rod-bundle thermal-hydraulic behavior during prototypical PWR operating conditions will be demonstrated through comparison of predicted and experimental results for the New Experimental Studies of Thermal-Hydraulics of Rod Bundles (NESTOR) tests. Due to the growing usage of Computational Fluid Dynamics (CFD) tools in this area, modeling results predicted by the STAR-CCM+ CFD tool will also be presented for these tests.

February 23, 2013 - Thomas L. Lewis: Finite Difference and Discontinuous Galerkin Numerical Methods for Fully Nonlinear Second Order PDEs with Applications to Stochastic Optimal Control

In this talk I will discuss a convergence framework for directly approximating the viscosity solutions of fully nonlinear second order PDE problems. The main focus will be the introduction of a set of sufficient conditions for constructing convergent finite difference (FD) methods. The conditions given are meant to be easier to realize and implement than those found in the current literature. The given FD methodology will then be shown to generalize to a class of discontinuous Galerkin (DG) methods. The proposed DG methods are high order and allow for increased flexibility when choosing a computational mesh. Numerical experiments will be presented to gauge the performance of the proposed DG methods. An overview of the PDE theory of viscosity solutions will also be given. The presented ideas are part of a larger project concerned with efficiently and accurately approximating the Hamilton-Jacobi-Bellman equation from stochastic optimal control.

February 22, 2013 - Charles K. Garrett: Numerical Integration of Matrix Riccati Differential Equations with Solution Singularities

A matrix Riccati differential equation (MRDE) is a quadratic ODE of the form

X' = A21 + A22 X - X A11 - X A12 X.

It is well known that MRDEs may have singularities in their solution. In this presentation, both the theory and practice of numerically integrating MRDEs past solution singularities will be analyzed. In particular, it will be shown how to create a black box numerical MRDE solver, which accurately solves an MRDE with or without singularities.
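One standard device for integrating past a Riccati blow-up, which a "black box" solver can apply automatically, is to switch to the inverse variable, which satisfies a Riccati equation of its own and passes smoothly through the singularity. The scalar sketch below is ours, not the speaker's solver: for x' = 1 + x^2 (solution tan(t), singular at t = pi/2), the inverse w = 1/x obeys w' = -(1 + w^2) and stays finite through the blow-up.

```python
import math

# Our sketch of singularity traversal for the scalar Riccati x' = 1 + x**2:
# flip to w = 1/x near blow-up, integrate w' = -(1 + w**2), flip back.

def rk4_step(f, y, h):
    k1 = f(y); k2 = f(y + 0.5 * h * k1)
    k3 = f(y + 0.5 * h * k2); k4 = f(y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def integrate(t_end, h=1e-4):
    t, x, inverted = 0.0, 0.0, False
    while t < t_end - 1e-12:
        if not inverted and abs(x) > 10.0:    # nearing blow-up: flip variable
            x, inverted = 1.0 / x, True
        elif inverted and abs(x) > 0.5:       # safely away again: flip back
            x, inverted = 1.0 / x, False
        f = (lambda w: -(1 + w * w)) if inverted else (lambda v: 1 + v * v)
        x = rk4_step(f, x, h)
        t += h
    return 1.0 / x if inverted else x

x = integrate(2.0)   # exact answer is tan(2.0), past the pi/2 singularity
```

A production matrix solver would do the analogous change of variables on X (or a Möbius/fundamental-matrix formulation) together with adaptive step control, but the flip-and-continue idea is the same.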

February 21, 2013 - Giacomo Dimarco: Asymptotic Preserving Implicit-Explicit Runge-Kutta Methods For Non-Linear Kinetic Equations

In this talk, we will discuss Implicit-Explicit (IMEX) Runge-Kutta methods that are particularly adapted to stiff kinetic equations of Boltzmann type. We will consider both the case of easily invertible collision operators and the challenging case of Boltzmann collision operators. We give sufficient conditions for such methods to be asymptotic preserving and asymptotically accurate. Their monotonicity properties are also studied. In the case of the Boltzmann operator, the methods are based on the introduction of a penalization technique for the collision integral. This reformulation of the collision operator permits the construction of penalized IMEX schemes that work uniformly for a wide range of relaxation times, avoiding the expensive implicit solution of the collision operator. Finally, we show some numerical results that confirm the theoretical analysis.
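The asymptotic-preserving idea is already visible in a first-order IMEX Euler step on a toy relaxation equation; the talk's schemes are higher-order Runge-Kutta methods with penalized Boltzmann operators, so the sketch below (ours, with an invented model problem) shows only the skeleton: the non-stiff term is explicit, the stiff relaxation is implicit, and the step remains stable and consistent even when the time step far exceeds the relaxation time.

```python
# Our toy IMEX Euler step for u' = -u + (1/eps)*(b - u): the non-stiff term
# -u is explicit, the stiff relaxation (b - u)/eps is implicit (and here
# linear, so the implicit solve is a division).

def imex_euler(u, h, eps, b=2.0):
    # u_{n+1} = u_n + h*(-u_n) + (h/eps)*(b - u_{n+1})
    return (u + h * (-u) + h * b / eps) / (1.0 + h / eps)

u, h, eps = 0.0, 0.1, 1e-8       # time step vastly larger than eps
for _ in range(50):
    u = imex_euler(u, h, eps)
# As eps -> 0 the scheme drives u onto the local equilibrium u = b without
# resolving the eps time scale: the asymptotic-preserving property.
```

An explicit scheme would need h on the order of eps here; the IMEX step takes h = 0.1 with eps = 1e-8 and still lands on the correct limit.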

February 20, 2013 - Tom Berlijn: Effects of Disorder on the Electronic Structure of Functional Materials

Doping is one of the most powerful ways to tune the properties of functional materials such as thermoelectrics, photovoltaics and superconductors. Besides carriers and chemical pressure, the dopants insert disorder into the materials. In this talk I will present two case studies of doped Fe based superconductors: Fe vacancies in KxFeySe2 [1] and Ru substitutions in Ba(Fe1-xRux)2As2 [2]. With the use of a recently developed first principles method [3], non-trivial disorder effects are found that are not only interesting scientifically, but also have potential implications for materials technology. Open questions for further research will be discussed.

[1] TB, P. J. Hirschfeld, W. Ku, PRL 109 (2012)
[2] L. Wang, TB, C.-H. Lin, Y. Wang, P. J. Hirschfeld, W. Ku, PRL 110 (2013)
[3] TB, D. Volja, W. Ku, PRL 106 (2011)

February 19, 2013 - Joshua D. Carmichael: Seismic Monitoring of the Western Greenland Ice Sheet: Response to Early Lake Drainage

In 2006, the drainage of a supraglacial lake through hydrofracture on the Greenland Ice Sheet was directly observed for the first time. This event demonstrated that surface-to-bed hydrological connections can be established through 1 km of cold ice and thereby allow surficial forcing of a developed subglacial drainage system by surface meltwater. In a changing climate, supraglacial lakes on the Western Greenland Ice Sheet are expected to drain earlier each summer and to form new lakes at higher elevations. The ice sheet's response to these earlier drainages in the near future is of glaciological concern. We address the response of the Western Greenland Ice Sheet to an observed early lake drainage using a synthesis of seismic and GPS monitoring near an actively draining lake. This experiment demonstrates that (1) seismic activity precedes the drainage event by several days and is likely coincident with crack coalescence, (2) seismic multiplet locations are coincident with the uplift of the ice during drainage, and (3) a diurnal seismic response of the ice sheet follows after the ice surface settles to its pre-drainage elevation a week later. These observations are consistent with a model in which the subglacial drainage system is distributed, highly pressurized, and of low hydraulic conductivity at drainage initiation. They also demonstrate that an early lake drainage likely reduces basal normal stress on time scales of order a week by storing water subglacially. We conclude with recommendations for future long-range lake-drainage detection.

February 18, 2013 - Mili Shah: Calculating a Symmetry Preserving Singular Value Decomposition

The symmetry preserving singular value decomposition (SPSVD) produces the best symmetric (low rank) approximation to a set of data. These symmetric approximations are characterized via an invariance under the action of a symmetry group on the set of data. The symmetry groups of interest consist of all the non-spherical symmetry groups in three dimensions. This set includes the rotational, reflectional, dihedral, and inversion symmetry groups. In order to calculate the best symmetric (low rank) approximation, the symmetry of the data set must be determined. Therefore, matrix representations for each of the non-spherical symmetry groups have been formulated. These new matrix representations lead directly to a novel reweighting iterative method to determine the symmetry of a given data set by solving a series of minimization problems. Once the symmetry of the data set is found, the best symmetric (low rank) approximation can be established by using the SPSVD. Applications of the SPSVD to protein dynamics problems as well as facial recognition will be presented.
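The group-invariance idea at the heart of a symmetric approximation can be sketched for the simplest case, a single reflection. The toy below is ours, not the full SPSVD (which also determines the group via reweighting and truncates rank via an SVD): it projects a set of paired points onto the closest configuration that is exactly invariant under reflection about the y-axis, by averaging each point with the mirror image of its partner.

```python
# Our toy of the group-averaging step behind symmetry-preserving
# approximation: least-squares projection onto mirror-symmetric point pairs.

def reflect(p):
    """Action of the reflection group element (mirror about the y-axis)."""
    return (-p[0], p[1])

def symmetrize(pairs):
    """pairs: list of (left_point, right_point) assumed to mirror each other.
    Returns the closest (least-squares) exactly mirror-symmetric pairs."""
    out = []
    for a, b in pairs:
        # Average each point with its partner's mirror image.
        avg = tuple((ai + ri) / 2 for ai, ri in zip(a, reflect(b)))
        out.append((avg, reflect(avg)))
    return out

noisy = [((-1.0, 2.1), (1.2, 1.9))]   # a roughly mirror-symmetric pair
sym = symmetrize(noisy)               # exactly mirror-symmetric result
```

For richer groups (rotational, dihedral, inversion) the same averaging runs over all group elements, and the low-rank step then applies an SVD within the invariant subspace.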

February 14, 2013 - Zheng (Cynthia) Gu: Efficient and Robust Message Passing Schemes for Remote Direct Memory Access (RDMA)-Enabled Clusters

While significant effort has been made in improving Message Passing Interface (MPI) performance, existing work has mainly focused on eliminating software overhead in the library and delivering raw network performance to applications. Current MPI implementations such as MPICH2, MVAPICH2, and Open MPI still suffer from performance issues such as unnecessary synchronizations, communication progress problems, and lack of communication-computation overlap. The root cause of these problems is the mismatch between the communication protocols/algorithms and the communication scenarios. In my PhD research, I will develop efficient and robust message passing schemes for both point-to-point and collective communications for RDMA-enabled clusters. Unlike existing approaches for optimizing MPI performance, our approach will allow different communication protocols/algorithms for different communication scenarios. The idea is to use the most appropriate communication scheme for each communication so as to remove the mismatches, which will eliminate unnecessary synchronizations, improve communication progress, and maximize communication-computation overlap during a communication operation. This prospectus will describe the background of this research, present our preliminary research, and summarize the proposed future work.

February 8, 2013 - Taylor Patterson: Simulation of Complex Nonlinear Elastic Bodies Using Lattice Deformers

Lattice deformers are a popular option in computer graphics for modeling the behavior of elastic bodies as they avoid the need for conforming mesh generation, and their regular structure offers significant opportunities for performance optimizations. This talk will present work that expands the scope of current grid-based elastic deformers, adding support for a number of important simulation features. The approach to be described accommodates complex nonlinear, optionally anisotropic materials while using an economical one-point quadrature scheme. The formulation fully accommodates near-incompressibility by enforcing accurate nonlinear constraints, supports implicit integration for large time steps, and is not susceptible to locking or poor conditioning of the discrete equations. Additionally, this technique increases the solver accuracy by employing a novel high-order quadrature scheme on lattice cells overlapping with the embedded model boundary, which are treated at sub-cell precision. This accurate boundary treatment can be implemented at a minimal computational premium over the cost of a voxel-accurate discretization. Finally, this talk will present part of the expanding feature set of this approach that is currently under development.

February 6, 2013 - Makhan Virdi: Modeling High-resolution Soil Moisture to Estimate Recharge Timing and Experiences with Geospatial Analyses

Estimating the time of groundwater recharge after a rainfall event is poorly understood because of its dependence on non-linear soil characteristics and variability in antecedent soil conditions. Movement of water in variably saturated soil can be described by Richards' equation, a non-linear partial differential equation without a closed-form analytical solution, which is difficult to approximate. To develop a simple recharge model using a minimum number of soil parameters, high-resolution soil moisture data from a soil column under controlled laboratory conditions were analysed to understand wetting-front propagation at a finer temporal scale. Findings from a series of simulations using an existing finite element model, varying soil properties and depth to water table, were used to propose a simple model that uses only the most significant representative soil properties and the antecedent soil matrix state. In other, separate geospatial analyses, satellite imagery was used to determine landslide risk cost and to develop an algorithm for safest and shortest route planning in hilly areas susceptible to landslides; the effects of decadal climate extremes on lake-groundwater exchanges were studied; and the effects of phosphate mining were studied on a regional scale using hydrological models and geospatial analysis of LiDAR-derived DEMs and watersheds.

February 5, 2013 - Roshan J. Vengazhiyil and C. F. Jeff Wu: Experimental Design, Model Calibration, and Uncertainty Quantification

We will start the talk with a newly developed space-filling design, called the minimum energy design (MED). The key ideas involved in constructing the MED are the visualization of each design point as a charged particle inside a box and the minimization of the total potential energy of these particles. It is shown, through theoretical arguments and simulations, that under regularity conditions and a proper choice of the charge function, the MED can asymptotically generate any arbitrary probability density function. This new design technique has important applications in Bayesian computation and uncertainty quantification. The second part of the talk will focus on model calibration. The commonly used Kennedy and O'Hagan (KO) approach treats the computer model as a black box; therefore, the statistically calibrated models lack physical interpretability. We propose a new framework that opens up the black box and introduces statistical models inside the computer model. This approach leads to simpler models that are physically more interpretable. Then, we will present some theoretical results concerning the convergence properties of calibration parameter estimation in the KO formulation of the model calibration problem. The KO calibration is shown to be asymptotically inconsistent. A new approach, called L2-distance calibration, is shown to be consistent and asymptotically efficient in estimating the calibration parameters.
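The charged-particle picture behind the MED can be made concrete with a toy: treat each design point as a unit charge in the unit square, and reduce the total potential energy sum over pairs of 1/d_ij by capped gradient descent. This is our illustration only; the actual MED ties the charge function to the target density (which is what lets it generate arbitrary densities), whereas all charges here are equal, giving a plain space-filling spread.

```python
import random

# Our toy of the minimum energy design intuition: equal charges in the unit
# square, repelling each other to lower the total potential energy.

def energy(pts):
    e = 0.0
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            d = ((pts[i][0] - pts[j][0]) ** 2
                 + (pts[i][1] - pts[j][1]) ** 2) ** 0.5
            e += 1.0 / max(d, 1e-9)      # pairwise potential 1/d
    return e

def spread(pts, iters=300, lr=0.01, max_step=0.05):
    pts = [p[:] for p in pts]
    for _ in range(iters):
        for i, p in enumerate(pts):
            gx = gy = 0.0
            for j, q in enumerate(pts):
                if i == j:
                    continue
                dx, dy = p[0] - q[0], p[1] - q[1]
                d3 = max((dx * dx + dy * dy) ** 1.5, 1e-9)
                gx += dx / d3            # repulsive force ~ 1/d^2
                gy += dy / d3
            norm = (gx * gx + gy * gy) ** 0.5
            s = lr if lr * norm < max_step else max_step / norm  # cap the step
            p[0] = min(1.0, max(0.0, p[0] + s * gx))  # stay inside the box
            p[1] = min(1.0, max(0.0, p[1] + s * gy))
    return pts

rng = random.Random(3)
start = [[rng.random(), rng.random()] for _ in range(8)]
final = spread(start)                    # lower-energy, better-spread design
```

The descent pushes clustered random points apart, which is exactly the space-filling behavior the design exploits.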

February 4, 2013 - Li-Shi Luo: Kinetic Methods for CFD

Computational fluid dynamics (CFD) is based on direct discretizations of the Navier-Stokes equations. The traditional approach of CFD is now being challenged as new multi-scale and multi-physics problems have begun to emerge in many fields -- in nanoscale systems, the scale-separation assumption does not hold; macroscopic theory is therefore inadequate, yet microscopic theory may be impractical because it requires computational capabilities far beyond our present reach. Methods based on mesoscopic theories, which connect the microscopic and macroscopic descriptions of the dynamics, provide a promising approach. Besides their connection to microscopic physics, kinetic methods also have certain numerical advantages due to the linearity of the advection term in the Boltzmann equation. Dr. Luo will discuss two mesoscopic methods: the lattice Boltzmann equation and the gas-kinetic scheme, covering their mathematical theory and their applications to simulating various complex flows. Examples include incompressible homogeneous isotropic turbulence, hypersonic flows, and micro-flows.
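The two-step structure of lattice Boltzmann methods, and the numerical advantage of a linear advection term, can be seen in a minimal sketch: streaming is an exact lattice shift, and all nonlinearity is local to each site in the collision step. The D1Q2 diffusion model below is our illustration; practical flow solvers use richer lattices such as D2Q9 or D3Q19.

```python
# Our minimal D1Q2 lattice Boltzmann sketch for 1-D diffusion on a periodic
# domain: local BGK collision followed by exact (linear) streaming.

def lbm_diffusion(rho0, steps, tau=1.0):
    # Two discrete velocities, +1 and -1; equilibrium splits density evenly.
    f_plus = [r / 2 for r in rho0]
    f_minus = [r / 2 for r in rho0]
    for _ in range(steps):
        # Collision (purely local): relax each population toward rho/2.
        rho = [fp + fm for fp, fm in zip(f_plus, f_minus)]
        f_plus = [fp + (r / 2 - fp) / tau for fp, r in zip(f_plus, rho)]
        f_minus = [fm + (r / 2 - fm) / tau for fm, r in zip(f_minus, rho)]
        # Streaming (exact, linear): shift along each velocity, periodic wrap.
        f_plus = [f_plus[-1]] + f_plus[:-1]
        f_minus = f_minus[1:] + [f_minus[0]]
    return [fp + fm for fp, fm in zip(f_plus, f_minus)]

rho = [0.0] * 16
rho[8] = 1.0                         # initial density pulse
out = lbm_diffusion(rho, steps=10)   # pulse diffuses; total mass conserved
```

Mass conservation is built into the collision step (it preserves rho at every site), which is one reason these schemes are attractive numerically.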

January 23, 2013 - Tarek Ali El Moselhy: New Tools for Uncertainty Quantification and Data Assimilation in Complex Systems

In this talk, Dr. Tarek Ali El Moselhy will present new tools for forward and inverse uncertainty quantification (UQ) and data assimilation.

In the context of forward UQ, Dr. Moselhy will briefly summarize a new scalable algorithm particularly suited for very high-dimensional stochastic elliptic and parabolic PDEs. The algorithm relies on computing a compact separated representation of the stochastic field of interest. The separated representation is computed iteratively and adaptively via a greedy optimization algorithm. The algorithm has been successfully applied to problems of flow and transport in stochastic porous media, handling “real world” levels of spatial complexity and providing orders of magnitude reduction in computational time compared to state of the art methods.

In the context of inverse UQ, Dr. Moselhy will present a new algorithm for the Bayesian solution of inverse problems. The algorithm explores the posterior distribution by finding a transport map from a reference measure to the posterior measure, and therefore does not require any Markov chain Monte Carlo sampling. The map from the reference to the posterior is approximated using a polynomial chaos expansion and is computed via stochastic optimization. Existence and uniqueness of the map are guaranteed by results from the optimal transport literature. The map approach is demonstrated on a variety of problems, ranging from inference of permeability fields in elliptic PDEs to benchmark high-dimensional spatial statistics problems such as inference in log-Gaussian Cox point processes.
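The idea can be illustrated in one dimension with a monotone affine map fitted by stochastic optimization of the KL divergence between the pushforward of the reference and the target (a toy sketch with illustrative names; the actual algorithm uses polynomial chaos expansions, not an affine map):

```python
import numpy as np

def fit_transport_map(log_post_grad, n=20000, lr=0.05, iters=500, seed=0):
    """Fit a monotone affine map T(x) = a + b*x pushing a standard-normal
    reference toward a 1D target density pi, by stochastically minimizing
    the KL divergence  E_ref[-log pi(T(x)) - log T'(x)].

    `log_post_grad(y)` returns d/dy log pi(y); all names are illustrative.
    """
    rng = np.random.default_rng(seed)
    a, b = 0.0, 1.0
    for _ in range(iters):
        x = rng.standard_normal(n)
        y = a + b * x
        g = log_post_grad(y)                # d log pi / dy at mapped points
        a += lr * g.mean()                  # ascent on E[log pi(T(x))]
        b += lr * ((g * x).mean() + 1 / b)  # + 1/b from the log-Jacobian
        b = max(b, 1e-6)                    # keep the map monotone
    return a, b
```

Once fitted, independent posterior samples come free: draw x from the reference and evaluate T(x), with no additional model evaluations, exactly the advantage highlighted below.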

In addition to its computational efficiency and parallelizability, the map approach offers several advantages: clear convergence criteria and error measures; analytical expressions for posterior moments; evaluation of the marginal likelihood/evidence at no additional computational cost (thus enabling model selection); the ability to generate independent, uniformly weighted posterior samples without additional model evaluations; and the ability to efficiently propagate posterior information to subsequent computational modules (thus enabling stochastic control).

In the context of data assimilation, Dr. Moselhy will present an optimal map algorithm for filtering of nonlinear chaotic dynamical systems. Such an algorithm is suited for a wide variety of applications including prediction of weather and climate. The main advantage of the algorithm is that it inherently avoids issues of sample impoverishment common to particle filters, since it explicitly represents the posterior as the push forward of a reference measure rather than with a set of samples.

December 13, 2012 - Russell Carden: Automating and Stabilizing the Discrete Empirical Interpolation Method for Nonlinear Model Reduction

The Discrete Empirical Interpolation Method (DEIM) is a technique for model reduction of nonlinear dynamical systems. It is based upon a modification of proper orthogonal decomposition that is designed to reduce the computational complexity of evaluating the reduced-order nonlinear term. The DEIM approach is based upon an interpolatory projection and requires evaluation of only a few selected components of the original nonlinear term. Thus, implementation of the reduced-order nonlinear term requires a new code to be derived from the original code for evaluating the nonlinearity. Dr. Carden will describe a methodology for automatically deriving a code for the reduced-order nonlinearity directly from the original nonlinear code. Although DEIM has been effective on some very difficult problems, it can under certain conditions introduce instabilities in the reduced model. Dr. Carden will present a problem that has proved helpful in developing a method for stabilizing DEIM reduced models.
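The classic DEIM index-selection step, which picks the few components at which the nonlinearity is actually evaluated, can be sketched as follows (illustrative implementation; `deim_indices` is our name):

```python
import numpy as np

def deim_indices(U):
    """Standard DEIM point selection: given a basis U (n x m) for the
    nonlinear-term snapshots, greedily pick m interpolation indices.

    Each new index is taken where the current basis vector's interpolation
    residual is largest in magnitude; the residual vanishes at previously
    chosen points, so indices never repeat.
    """
    n, m = U.shape
    p = [int(np.argmax(np.abs(U[:, 0])))]
    for j in range(1, m):
        # coefficients interpolating column j at the points chosen so far
        c = np.linalg.solve(U[np.ix_(p, range(j))], U[p, j])
        r = U[:, j] - U[:, :j] @ c          # interpolation residual
        p.append(int(np.argmax(np.abs(r))))
    return np.array(p)
```

The resulting interpolatory projection only ever queries the original nonlinearity at these indices, which is precisely why a specialized code for those components must be derived from the original code.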

December 12, 2012 - Charlotte Kotas: Bringing Real-Time Array Signal Processing to the NVIDIA Tesla

Underwater acoustic detection of hostile targets at range requires increasingly computationally advanced algorithms as adversaries become quieter. This seminar will discuss the mathematics behind one such algorithm and some of the challenges associated with modifying it to work in a real-time networked environment. The algorithm was ported from a sequential MATLAB formulation to a parallel CUDA Fortran formulation designed to run on an NVIDIA Tesla C2050 processor. Speedups of greater than 50× were observed over comparable computational sections.

December 6, 2012 - Shuaiwen "Leon" Song: Power, Performance and Energy Models and Systems for Emergent Architectures

Massive parallelism combined with complex memory hierarchies and heterogeneity in high-performance computing (HPC) systems form a barrier to efficient application and architecture design. The performance achievements of the past must continue over the next decade to address the needs of scientific simulations. However, building an exascale system by 2022 that uses less than 20 megawatts will require significant innovations in power and performance efficiency. Prior to this work, the fundamental relationships between power and performance were not well understood. Our analytical modeling approach allows users to quantify the relationship between power and performance at scale by enabling study of the effects of machine and application dependent characteristics on system energy efficiency. Our model helps users isolate root causes of energy or performance inefficiencies and develop strategies for scaling systems to maintain or improve efficiency. I will also show how this methodology can be extended and applied to model power and performance in heterogeneous GPU-based architectures.

Shuaiwen "Leon" Song is a PhD candidate in the Computer Science department of Virginia Tech. His primary research interests fall broadly within the area of High Performance Computing (HPC) with a focus on power and performance analysis and modeling for large scale homogeneous and heterogeneous parallel architectures and runtime systems. He is a recipient of the 2011 Paul E. Torgersen Award for Graduate Student Research Excellence and in 2011 was an Institute for Scientific Computing Research (ISCR) Scholar at Lawrence Livermore National Laboratory. His work has been published in conferences and journals including IPDPS, IEEE Cluster, PACT, MASCOTS, IEEE TPDS, and IJHPCA.

December 6, 2012 - Miroslav Stoyanov: Gradient Based Dimension Reduction Approach for Stochastic Partial Differential Equations

A dimension reduction approach to uncertainty quantification is considered, in which gradient information is used to partition the uncertainty domain into “active” and “passive” subspaces; the “passive” subspace is characterized by near-zero variance of the quantity of interest. We present a way to project the model onto the low-dimensional “active” subspace and solve the resulting problem using conventional techniques. We derive rigorous error bounds for the projection algorithm and show convergence in the $L^1$ norm.
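A common way to realize such a gradient-based split is to eigendecompose the average outer product of sampled gradients; the sketch below (our illustrative names, not necessarily Dr. Stoyanov's exact algorithm) shows the idea:

```python
import numpy as np

def active_subspace(grads, var_tol=0.99):
    """Split parameter directions into 'active' and 'passive' subspaces.

    grads: (N, d) array of gradient samples of the quantity of interest.
    Forms C = E[grad f grad f^T], eigendecomposes it, and keeps the leading
    eigenvectors until `var_tol` of the gradient variability is captured;
    the remaining directions form the near-zero-variance 'passive' subspace.
    """
    C = grads.T @ grads / grads.shape[0]
    vals, vecs = np.linalg.eigh(C)
    order = np.argsort(vals)[::-1]          # descending eigenvalues
    vals, vecs = vals[order], vecs[:, order]
    k = int(np.searchsorted(np.cumsum(vals) / vals.sum(), var_tol)) + 1
    return vecs[:, :k], vecs[:, k:]         # active, passive directions
```

The model is then projected onto the active columns and solved with conventional techniques in the reduced coordinates.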

December 5, 2012 - Barbara Chapman: Enabling Exascale Programming: The Intranode Challenge

As we continue to debate the best way to program emerging generations of leadership-class hardware, it is imperative that we do not ignore the more traditional paths. Dr. Chapman's presentation considers some of the ways in which today's intranode programming models may help us migrate legacy application code.

December 5, 2012 - Andrew Christlieb: An Implicit Maxwell Solver Based on Method of Lines Transpose

Fast summation methods have been successfully used in a range of plasma applications. However, in the case of moving point charges, direct application of fast summation methods in the time domain requires the use of retarded potentials. In practice, this means that every time a point charge moves in a simulation, it leaves behind an image charge that remains a source term for all time. Hence, the number of points in the simulation grows at each time step with the number of particles being simulated.

In this talk, Dr. Christlieb will present a new approach to Maxwell's equations based on the method of lines transpose. The method starts by expressing Maxwell’s equations in second-order form, and then the time operator is discretized. The resulting implicit system is then solved using integral methods; this process is known as the method of lines transpose. The approach pushes the time history into a volume integral, which does not grow in complexity with time. To solve the boundary integral efficiently, Dr. Christlieb will describe an ADI method combined with an $O(N)$ solver for the 1D boundary integrals that is competitive with explicit time-stepping methods. Because the new method is implicit, it is not subject to a CFL constraint. Further, because the approach is based on an integral formulation, the new method easily accommodates complex geometry with no special modification. Dr. Christlieb will present preliminary results of this method applied to wave propagation and some basic Maxwell examples.

November 27, 2012 - Charles Jackson: Metrics for Climate Model Validation

A “valid” model is a model that has been tested for its intended purpose. In the Bayesian formulation, the “log-likelihood” is a test statistic for selecting, weeding, or weighting climate model ensembles with observational data. This statistic has the potential to synthesize the physical and data constraints on quantities of interest. One of the thorny issues in formulating the log-likelihood is how one should account for biases, because not all biases affect predictions of quantities of interest. Dr. Jackson makes use of a 165-member ensemble of CAM3.1/slab-ocean climate models with different parameter settings to think through the issues involved in predicting each model’s sensitivity to greenhouse gas forcing given what can be observed from the base state. In particular, Dr. Jackson uses multivariate empirical orthogonal functions to decompose the differences that exist among this ensemble to discover what fields and regions matter to the model’s sensitivity. What is found is that the differences that matter can be a small fraction of the total discrepancy. Moreover, weighting members of the ensemble using this knowledge does a relatively poor job of adjusting the ensemble mean toward the known answer. Dr. Jackson will discuss the implications of this result.
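An EOF decomposition of inter-model differences of the kind described can be sketched via an SVD of the ensemble anomaly matrix (illustrative code, not Dr. Jackson's analysis):

```python
import numpy as np

def eofs(ensemble, n_modes=3):
    """EOF decomposition sketch for an ensemble (members x grid points).

    Removes the ensemble mean, then takes the leading singular vectors of
    the anomaly matrix: `patterns` are the spatial EOFs, `pcs` the per-member
    principal components, and `explained` the fraction of variance per mode.
    """
    anomalies = ensemble - ensemble.mean(axis=0)
    U, s, Vt = np.linalg.svd(anomalies, full_matrices=False)
    pcs = U[:, :n_modes] * s[:n_modes]   # principal components per member
    patterns = Vt[:n_modes]              # spatial EOF patterns
    explained = s[:n_modes]**2 / (s**2).sum()
    return pcs, patterns, explained
```

Projecting each member's sensitivity-relevant differences onto a few leading modes is what makes it possible to ask which fields and regions actually matter.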

November 15, 2012 - Erich Foster: Finite Elements for the Quasi-Geostrophic Equations of the Ocean

Erich Foster will present a conforming finite element (FE) discretization of the pure stream function form of the quasi-geostrophic equations (QGE), a commonly used model for the large-scale wind-driven ocean circulation. The pure stream function form of the QGE is a fourth-order PDE and therefore requires a C^1 FE discretization to be conforming. Thus, the Argyris element, a C^1 FE with 21 degrees of freedom, was chosen for the discretization. Optimal error estimates for the pure stream function form of the QGE will be presented. Although the QGE is a simplified model of the ocean, it can still be computationally expensive to resolve all scales; numerical methods such as the two-level method are therefore indispensable for time-sensitive projects. A two-level method, together with an optimal error estimate for it applied to the conforming FE discretization of the pure stream function form of the QGE, will be presented, and computational efficiency will be demonstrated.

October 25, 2012 - Shi Jin: Asymptotic-Preserving Schemes for the Boltzmann Equation and Related Problems with Stiff Sources

Dr. Shi Jin will propose a general framework for designing asymptotic-preserving schemes for the Boltzmann kinetic equation and related equations. Numerically solving these equations is challenging due to the nonlinear stiff collision (source) terms induced by small mean free path or relaxation time. Dr. Jin will propose to penalize the nonlinear collision term by a BGK-type relaxation term, which can be solved explicitly even when discretized implicitly in time. Moreover, the BGK-type relaxation operator helps to drive the density distribution toward the local Maxwellian, thus naturally yielding an asymptotic-preserving scheme in the Euler limit. The scheme so designed does not need any nonlinear iterative solver or the use of Wild sums. It is uniformly stable in terms of the (possibly small) Knudsen number, and can capture the macroscopic fluid dynamic (Euler) limit even when the small scale determined by the Knudsen number is not numerically resolved. Dr. Jin will show how this idea can be applied to other collision operators, such as the Landau-Fokker-Planck operator, the Uehling-Uhlenbeck model, and the kinetic-fluid model of disperse multiphase flows.
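The key trick, an implicit BGK-type term that can nevertheless be solved in closed form because the Maxwellian depends only on moments the collision conserves, can be illustrated in a space-homogeneous toy setting (our sketch, not the full penalized Boltzmann scheme):

```python
import numpy as np

def implicit_bgk_step(f, v, dt, eps):
    """One implicit step of df/dt = (M[f] - f)/eps on a 1D velocity grid.

    Although the update is implicit, it is solvable in closed form: M[f]
    is built from the density, momentum, and energy of f, which the BGK
    operator conserves, so M[f^{n+1}] = M[f^n].  Space-homogeneous toy.
    """
    dv = v[1] - v[0]
    rho = (f * dv).sum()                    # conserved moments of f
    u = (f * v * dv).sum() / rho
    T = (f * (v - u)**2 * dv).sum() / rho
    M = rho / np.sqrt(2 * np.pi * T) * np.exp(-(v - u)**2 / (2 * T))
    # (f^{n+1} - f^n)/dt = (M - f^{n+1})/eps, solved explicitly:
    return (f + (dt / eps) * M) / (1 + dt / eps)
```

The update stays stable for dt far larger than eps and drives f toward the local Maxwellian, which is the mechanism behind the asymptotic-preserving property in the Euler limit.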

October 24, 2012 - Shi Jin: Semiclassical Computation of High Frequency Waves in Heterogeneous Media

Dr. Shi Jin will introduce semiclassical Eulerian methods that are efficient in computing high-frequency waves through heterogeneous media. The method is based on the classical Liouville equation in phase space, with discontinuous Hamiltonians due to barriers or material interfaces. Dr. Jin will provide physically relevant interface conditions consistent with the correct transmissions and reflections, and then build the interface conditions into the numerical fluxes. This method allows the resolution of high-frequency waves without numerically resolving the small wavelengths, and captures the correct transmissions and reflections at the interface. The method can also be extended to deal with diffraction and quantum barriers. Dr. Jin will also discuss an Eulerian Gaussian beam formulation that can compute caustics more accurately.

October 09, 2012 - Christian Ringhofer: Charged Particle Transport in Narrow Geometries under Strong Confinement

Kinetic transport in narrow tubes and thin plates, involving scattering of particles with a background, is modeled by classical and quantum mechanical sub-band type macroscopic equations for the density of particles (ions). The result, on large time scales, is a diffusion equation with the projection of the (asymptotically conserved) energy tensor onto the confined directions as an additional free variable. Classical transport of ions through protein channels and quantum transport in thin films are discussed as examples of the application of this methodology.

October 05, 2012 - Amilcare Porporato: Stochastic soil moisture dynamics: from soil-plant biogeochemistry and land-atmosphere interactions to sustainable use of soil and water

The soil-plant-atmosphere system is characterized by a large number of interacting processes with a high degree of unpredictability and nonlinearity. These elements of complexity, while making a full modeling effort extremely daunting, are also responsible for the emergence of characteristic behaviors. Dr. Porporato's group at Duke University models these processes by means of minimalist models, which describe the main deterministic components of the system and surrogate the high-dimensional ones (i.e., hydroclimatic variability and rainfall in particular) with suitable stochastic terms. The solution of the stochastic soil water balance allows a probabilistic description of several ecohydrological processes, including ecosystem response and plant productivity, as well as soil organic matter and nutrient cycling dynamics. Dr. Porporato will also discuss how such an approach can be extended to include land-atmosphere feedbacks and their impact on convective precipitation. Dr. Porporato will conclude with a brief discussion of how these methods can be employed to address quantitatively the sustainable management of water and soil resources, including optimal irrigation and fertilization, phytoremediation, and soil salinization risk.
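A minimalist stochastic soil water balance of this flavor (Poisson rainfall arrivals, exponentially distributed depths, linear losses) can be simulated in a few lines; parameter names and values below are illustrative, not Dr. Porporato's:

```python
import numpy as np

def simulate_soil_moisture(lam=0.2, alpha=0.1, k=0.05, s1=1.0,
                           days=10000, seed=1):
    """Toy daily stochastic soil-water balance in relative units [0, s1].

    Rainfall arrives as a Poisson process (probability lam per day) with
    exponentially distributed depths (mean alpha); losses (ET, leakage)
    are linear in soil moisture with rate k; s is capped at saturation s1.
    """
    rng = np.random.default_rng(seed)
    s, trace = 0.5, []
    for _ in range(days):
        rain = rng.exponential(alpha) if rng.random() < lam else 0.0
        s = min(s + rain - k * s, s1)   # infiltration minus linear losses
        s = max(s, 0.0)                 # soil moisture cannot go negative
        trace.append(s)
    return np.array(trace)
```

The long-run histogram of such a trace approximates the stationary soil-moisture probability density from which the ecohydrological statistics above are derived; for these parameters the mean settles near the input/loss balance lam*alpha/k = 0.4.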