Events
Workshops and Conferences
Advances in Scientific Computing and Applied Mathematics (October 9–12, 2015)
Clayton Webster
The conference on "Advances in Scientific Computing and Applied Mathematics" will take place October 9–12 at the Stratosphere Hotel in Las Vegas, Nevada. The conference is co-sponsored by Oak Ridge National Laboratory, Sandia National Laboratories, and the Office of Advanced Scientific Computing Research (ASCR) in the Department of Energy's Office of Science.
We also invite participants to submit an original research paper to a special issue of the journal "Computers and Mathematics with Applications" honoring Prof. Max Gunzburger's 70th birthday. The issue will be edited by Drs. Pavel Bochev, Qiang Du, Steven L. Hou, and Clayton Webster. Submissions are due by March 31, 2015. We look forward to your contribution.
Visit site [here].
OpenSHMEM 2015: Second workshop on OpenSHMEM and Related Technologies (August 4–6, 2015)
Pavel Shamis
The OpenSHMEM workshop is an annual event dedicated to the promotion and advancement of the OpenSHMEM programming interface and to helping shape its future direction. It is the premier venue to discuss and present the latest developments, implementation technologies, tools, trends, recent research ideas, and results related to OpenSHMEM. This year's workshop will explore the ongoing evolution of OpenSHMEM as a next-generation PGAS programming model that addresses the needs of exascale applications. The focus will be on future extensions to improve OpenSHMEM on current and upcoming architectures. Although this is an OpenSHMEM-specific workshop, we welcome ideas used for other PGAS languages/APIs that may be applicable to OpenSHMEM.
Visit site [here].
Beyond Lithium Ion VIII (June 2–4, 2015)
Sreekanth Pannala and Jason Zhang
Significant advances in electrical energy storage could revolutionize the energy landscape. For example, widespread adoption of electric vehicles could greatly reduce dependence on finite petroleum resources, reduce carbon dioxide emissions and provide new scenarios for grid operation. Although electric vehicles with advanced lithium ion batteries have been introduced, further breakthroughs in scalable energy storage, beyond current state-of-the-art lithium ion batteries, are necessary before the full benefits of vehicle electrification can be realized.
Motivated by these societal needs and by the tremendous potential for materials science and engineering to provide necessary advances, a consortium comprising IBM Research and five U.S. Department of Energy national laboratories (National Renewable Energy, Argonne, Lawrence Berkeley, Pacific Northwest, and Oak Ridge) will host a symposium June 2–4, 2015, at Oak Ridge National Laboratory. This is the eighth in a series of conferences that began in 2009.
Visit site [here].
Numerical and Computational Developments to Advance Multiscale Earth System Models (MSESM) (June 1–3, 2015)
Kate Evans
Substantial recent development of Earth system models has enabled simulations that capture climate change and variability at ever finer spatial and temporal scales. In this workshop we seek to showcase recent progress on the computational development needed to address the new complexities of climate models and their multiscale behavior to maximize efficiency and accuracy. This workshop brings together computational and domain Earth scientists to focus on Earth system models at the largest scales for deployment on the largest computing and data centers.
Topics include, but are not limited to:
- multiscale time integration
- advection schemes
- regionally and dynamically refined meshes
- many-core acceleration techniques
- examination of multiscale atmospheric events
- coupled interactions between model components (such as atmosphere, ocean, and land), and techniques to make these couplings computationally tractable
Visit the site [here].
ASCR Workshop on Quantum Computing for Science (February 17–18, 2015)
Travis Humble
At the request of the Department of Energy's (DOE) Office of Advanced Scientific Computing Research (ASCR), this program committee has been tasked with organizing a workshop to assess the viability of quantum computing technologies to meet the computational requirements in support of DOE's science and energy mission and to identify the potential impact of these technologies. As part of the process, the program committee is soliciting community input in the form of position papers. The program committee will review these position papers, and selected contributors, based on the fit of their expertise and interests, will have the opportunity to participate in the workshop currently planned for February 17–18, 2015, in Bethesda, MD.
Visit the site [here].
OpenSHMEM User Group - OUG 2014 (October 7)
The OpenSHMEM User Group (OUG 2014) is a user meeting dedicated to the promotion and advancement of all aspects of the OpenSHMEM API and its tools ecosystem. The goal of the meeting is to discuss and present ongoing user experiences, research, implementations, and tools that use the OpenSHMEM API. Particular emphasis will be given to ongoing research that can enhance the OpenSHMEM specification to leverage emerging hardware while addressing application needs.
Visit the site [here].
DOE Vehicle Technologies Office Annual Merit Review and Peer Evaluation Meeting (June 16–20)
John Turner and Sreekanth Pannala
The DOE Vehicle Technologies Office Annual Merit Review and Peer Evaluation Meeting was held on June 16–20, 2014, in Washington, D.C. John Turner, Group Leader of Computational Engineering and Energy Sciences (CEES) and Sreekanth Pannala, Distinguished Staff Member in CEES, attended. Dr. Pannala presented a summary of the Open Architecture Software (OAS) developed for the Computer Aided Engineering for Batteries (CAEBAT) project and described related activities such as the development of a common input format based on XML for battery simulation tools and a common "battery state" file format to facilitate transfer of information between simulation tools.
ORNL Software Expo (May 7)
Jay Billings and Dasha Gorin
There are programmers, researchers, and engineers all over the ORNL campus developing software or modifying existing programs, whether to better meet their own goals or to help others reach theirs. Because we are so spread out, many scientists are unaware of others' projects, possibly missing out on important opportunities to collaborate or otherwise ease their burden. The Computer Science Research Group would like to provide a chance for everyone to come together and remedy this. We will be hosting a poster session in the JICS atrium on Wednesday, May 7th, from 9:00am to 12:00pm. Presenters will showcase posters and/or live demos of their projects. Anyone working at ORNL, from interns to senior staff members, may register to present a (non-classified) project at www.csm.ornl.gov/expo; the deadline to register is April 16th. Non-presenting attendees do not need to register. Please join us; this is not only a great networking opportunity, but also a celebration of ORNL's diverse programming community!
Visit the site [here].
Spring Durmstrang Review (March 25–26)
The spring review for the Durmstrang project, managed by the ESSC, was held on March 25–26 in Maryland. Durmstrang is a DoD/ORNL collaboration in extreme-scale high-performance computing. The long-term goal of the project is to support the achievement of sustained exascale processing on applications and architectures of interest to both partners. Steve Poole, Chief Scientist of CSMD, presented the overview and general status update at the spring review. The Benchmarks R&D discussion was facilitated by Josh Lothian, Matthew Baker, Jonathan Schrock, and Sarah Powers of ORNL; the Languages and Compilers R&D discussion by Matthew Baker, Oscar Hernandez, Pavel Shamis, and Manju Venkata of ORNL; the I/O and File Systems R&D discussion by Brad Settlemyer of ORNL; the Networking R&D discussion by Nagi Rao, Susan Hicks, and Paul Newman of ORNL; the Power Aware Computing R&D discussion by Chung-Hsing Hsu of ORNL; and the System Schedulers R&D discussion by Greg Koenig, Tiffany Mintz, and Sarah Powers of ORNL. A special panel on Networking R&D was also convened to discuss best practices and the path forward; panelists included both DoD and ORNL members. Topics of discussion during the executive session of the review included continued funding and growth of the program, task progression, and development of performance metrics for the project.
SOS 18 (March 17–20)
On March 17–20, 2014, Jeff Nichols, Al Geist, Buddy Bland, Barney Maccabe, Jack Wells, and John Turner attended the 18th Sandia-Oak Ridge-Switzerland workshop (SOS18) [1]. The SOS workshops are co-organized by James Ang at Sandia National Laboratories, John Turner at ORNL, and Thomas Schulthess at the Swiss National Computing Center. The theme this year was "Supercomputers as scientific instruments", and a number of presentations examined this analogy extensively. John Turner, Computational Engineering and Energy Sciences Group Leader, conveys the following impressions: (1) Python is ubiquitous; (2) Domain-Specific Languages (DSLs) are no longer considered exotic; (3) proxy apps/mini-apps continue to gain popularity as a mechanism for domain developers to interact with computer scientists; and (4) there is increased willingness on the part of code teams to consider a full rewrite of some codes, but funding for such activities remains unclear. Presentations can be obtained from the SOS18 web site [2].
[1] http://www.cscs.ch/sos18/index.html
[2] http://www.cscs.ch/sos18/agenda/index.html
2014 Gordon Research Conference on Batteries (March 9–14)
John Turner
On March 9–14, Computational Engineering and Energy Sciences Group Leader John Turner attended the 2014 Gordon Research Conference on Batteries [1] in Ventura, CA. Dr. Turner presented a poster titled "3D Predictive Simulation of Battery Systems" on behalf of the team working on battery simulation: Sreekanth Pannala, Srikanth Allu, Srdjan Simunovic, Sergiy Kalnaus, Wael Elwasif, and Jay Jay Billings. The work was funded through the Vehicle Technologies (VT) program office within the EERE [2] as part of the CAEBAT program [3]. This program, led by NREL and including industry and university partners, is developing computational tools for the design and analysis of batteries. CSMD staff are leading development of the shared computational infrastructure used across the program.
[1] http://www.grc.org/programs.aspx?year=2014&program=batteries
[2] DOE Office of Energy Efficiency and Renewable Energy (EERE)
[3] The Computer-Aided Engineering for Batteries (CAEBAT) program (http://www.nrel.gov/vehiclesandfuels/energystorage/caebat.html)
Advisory Council Review (March 6–7)
The third annual meeting of the Oak Ridge National Laboratory Computing and Computational Sciences Directorate (CCSD) advisory committee was convened March 6–7 to focus on two key areas of Directorate activities: CCSD's recent developments in computational and applied mathematics, and CCSD's geospatial data science program and its impact on problems of national and global significance.
In the course of the review, CSMD researchers presented five posters covering their work in computational and applied mathematics.
Developing U.S. Phenoregions from Remote Sensing
Jitendra Kumar, Forrest Hoffman, and William Hargrove
Variations in vegetation phenology can be a strong indicator of ecological change or disturbance. Phenology is also strongly influenced by seasonal, interannual, and long-term trends in climate, making identification of changes in forest ecosystems a challenge. Normalized difference vegetation index (NDVI), a remotely sensed measure of greenness, provides a proxy for phenology. NDVI for the conterminous United States (CONUS), derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) at 250 m resolution, was used in this study to develop phenological signatures of ecological regimes called phenoregions. By applying an unsupervised, quantitative data mining technique to NDVI measurements for every eight days over the entire MODIS record, annual maps of phenoregions were developed. This technique produces a prescribed number of prototypical phenological states to which every location belongs in any year. Since the data mining technique is unsupervised, individual phenoregions are not identified with an ecologically understandable label. Therefore, we applied the MAPCURVES method to associate individual phenoregions with maps of biomes, land cover, and expert-derived ecoregions. By applying spatial overlays with various maps, this "label-stealing" method exploits the knowledge contained in other maps to identify properties of our statistically derived phenoregions.
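The clustering step described above can be sketched in a few lines. This is an illustrative toy only, not the study's actual pipeline or data: it fabricates NDVI-like time series for two vegetation types and groups them into a prescribed number of prototypical "phenoregions" with a plain k-means clustering (farthest-point seeding). All names and values here are invented for the sketch.

```python
import numpy as np

def kmeans(X, k, iters=20):
    # farthest-point initialization: start from the first series, then
    # repeatedly add the series farthest from the chosen centers
    centers = [X[0]]
    for _ in range(k - 1):
        dists = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dists)])
    centers = np.array(centers)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assign every pixel's NDVI trajectory to its nearest prototype
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        # recompute each prototype as the mean of its members
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Toy data: 46 eight-day NDVI composites for 100 "pixels" drawn from two
# seasonal curves (a deciduous-like summer peak vs. evergreen-like flat greenness)
t = np.linspace(0.0, 1.0, 46)
deciduous = 0.2 + 0.6 * np.exp(-((t - 0.5) ** 2) / 0.02)
evergreen = np.full_like(t, 0.7)
rng = np.random.default_rng(1)
X = np.vstack([deciduous + 0.02 * rng.standard_normal((50, 46)),
               evergreen + 0.02 * rng.standard_normal((50, 46))])

labels, centers = kmeans(X, k=2)  # two toy "phenoregions"
```

Each label is a prototypical phenological state, not an ecological category, which is why the MAPCURVES overlay step is needed to attach meaningful names.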
3D Virtual Vehicle
Sreekanth Pannala and John Turner
Advances in transportation technologies, sensors, and onboard computers continue to push the efficiency of vehicles, resulting in an exponential increase in design parameter space. This expansion spans the entire vehicle and includes individual components such as combustion engines, electric motors, power electronics, energy storage, and waste heat recovery, as well as weight and aerodynamics. The parameter space has become so large that manufacturers do not have the computational resources or software tools to optimize vehicle design and calibration. This expanded flexibility in vehicle design and control, in addition to stringent CAFE standards, is driving a need for the development of new high-fidelity vehicle simulations, optimization methods, and self-learning control methods. The biggest opportunity for improvements in vehicle fuel economy is improved integration and optimization of vehicle subsystems. The current industry approach is to optimize individual subsystems using detailed computational tools and to optimize the vehicle system with a combination of low-order map-based simulations and physical prototype vehicles. Industry is very interested in reducing the dependence on prototype vehicles due to significant investment cost and time. With increasingly aggressive fuel economy standards, emissions regulations, and unprecedented growth in vehicle technologies, the current approach is simply not sufficient to meet these challenges. The increase in technologies has led to an exponential growth in parameter and calibration space. Advanced modeling and simulation through a virtual vehicle framework can facilitate accelerated development of vehicles through the rapid exploration and optimization of parameter space, while providing guidance to more focused experimental studies. Each of the component areas requires HPC resources, and an integrated system approach will likely approach the exascale.
Networking and Communications Research and Development
Pavel Shamis, Brad Settlemyer, Nagi Rao, Thomas Naughton, and Manju Gorentla
Universal Common Communication Substrate (UCCS) is a communication middleware that aims to provide a high-performing, low-level communication substrate for implementing parallel programming models. UCCS aims to deliver a broad range of communication semantics such as active messages, collective operations, puts, gets, and atomic operations. This enables implementation of one-sided and two-sided communication semantics to efficiently support both PGAS (OpenSHMEM, UPC, Co-Array Fortran, etc.) and MPI-style programming models. The interface is designed to minimize software overheads and provide direct access to network hardware capabilities without sacrificing productivity. This was accomplished by forming and adhering to the following goals:
- Provide a universal network abstraction with an API that addresses the needs of parallel programming languages and libraries.
- Provide a high-performance communication middleware by minimizing software overheads and taking full advantage of modern network technologies with communication-offloading capabilities.
- Enable network infrastructure for upcoming parallel programming models and network technologies.
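The one-sided semantics mentioned above (puts, gets, atomics against remote memory, without the target's participation) can be illustrated conceptually. This Python sketch is not the UCCS API, which is a native library; every class and method name here is invented purely to show what one-sided operations mean in a PGAS-style model.

```python
class SymmetricHeap:
    """Toy model of one-sided communication: each 'PE' (processing element)
    owns a region of a symmetric heap, and any PE may read or write another
    PE's region directly, without the target actively receiving."""

    def __init__(self, n_pes, size):
        self.mem = [[0] * size for _ in range(n_pes)]

    def put(self, target_pe, offset, values):
        # one-sided write into the target PE's memory
        self.mem[target_pe][offset:offset + len(values)] = values

    def get(self, target_pe, offset, count):
        # one-sided read from the target PE's memory
        return self.mem[target_pe][offset:offset + count]

    def atomic_add(self, target_pe, offset, value):
        # fetch-and-add on a remote location; returns the old value
        old = self.mem[target_pe][offset]
        self.mem[target_pe][offset] += value
        return old

heap = SymmetricHeap(n_pes=4, size=8)
heap.put(2, 0, [10, 20, 30])   # some PE writes into PE 2's heap
data = heap.get(2, 0, 3)       # another PE reads it back directly
old = heap.atomic_add(2, 0, 5) # remote fetch-and-add, returns 10
```

In a real substrate these operations map onto RDMA and network atomics, which is why minimizing software overhead between the API and the hardware matters.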
Compute and Data Environment for Science (CADES)
Galen Shipman
The Compute and Data Environment for Science (CADES) provides R&D with a flexible and elastic compute and data infrastructure. The initial deployment consists of over 5 petabytes of high-performance storage, nearly half a petabyte of scalable NFS storage, and over 1000 compute cores integrated into a high-performance Ethernet and InfiniBand network. This infrastructure, based on OpenStack, provides a customizable compute and data environment for a variety of use cases including large-scale omics databases, data integration and analysis tools, data portals, and modeling/simulation frameworks. These services can be composed to provide end-to-end solutions for specific science domains.
Co-designing Exascale
Scott Klasky and Jeffrey Vetter
Co-design refers to a computer system design process in which scientific problem requirements influence architecture design and technology, and constraints inform the formulation and design of algorithms and software. To ensure that future architectures are well-suited for DOE target applications and that major DOE scientific problems can take advantage of the emerging computer architectures, major ongoing research and development centers of computational science need to be formally engaged in the hardware, software, numerical methods, algorithms, and applications co-design process. The co-design methodology requires the combined expertise of vendors, hardware architects, system software developers, domain scientists, computer scientists, and applied mathematicians working together to make informed decisions about features and tradeoffs in the design of the hardware, software, and underlying algorithms. CSMD is a co-PI organization on all three ASCR Co-design Centers: the Exascale Co-Design Center for Materials in Extreme Environments (ExMatEx), the Center for Exascale Simulation of Advanced Reactors (CESAR), and the Center for Exascale Simulation of Combustion in Turbulence (ExaCT). Read more about the ASCR Co-design centers [here].
OpenSHMEM Workshop (March 4–6)
Oscar Hernandez, Pavel Shamis, and Jennifer Goodpasture
The OpenSHMEM workshop for the Extreme Scale Systems Center (ESSC) was held in Annapolis, Maryland on March 4–6. The OpenSHMEM workshop is an annual event dedicated to the promotion and advancement of parallel programming with the OpenSHMEM programming interface and to helping shape its future direction. It is the premier venue to discuss and present the latest developments, implementation technology, tools, trends, recent research ideas, and results related to OpenSHMEM and its use in applications. This year's workshop also emphasized the future direction of OpenSHMEM and related technologies, tools, and frameworks, with a focus on future extensions for OpenSHMEM and hybrid programming on platforms with accelerators. Topics of interest for the workshop included (but were not limited to):
- Experiences in OpenSHMEM applications in any domain
- Extensions to and shortcomings of the current OpenSHMEM specification
- Hybrid heterogeneous or many-core programming with OpenSHMEM and other languages or APIs (e.g., OpenCL, OpenACC, CUDA, OpenMP)
- Experiences in implementing OpenSHMEM on new architectures
- Low-level communication layers to support OpenSHMEM or other PGAS languages/APIs
- Performance evaluation of OpenSHMEM or OpenSHMEM-based applications
- Power/energy studies of OpenSHMEM
- Static analysis and verification tools for OpenSHMEM
- Modeling and performance analysis tools for OpenSHMEM and/or other PGAS languages/APIs
- Auto-tuning or optimization strategies for OpenSHMEM programs
- Runtime environments and schedulers for OpenSHMEM
- Benchmarks and validation suites for OpenSHMEM
The workshop had participation from the DoD, the DOE Office of Science, and other national labs such as Argonne National Lab and Sandia National Lab. Visit the site [here].
2014 Oil & Gas HighPerformance Computing Workshop (March 6)
David Bernholdt, Scott Klasky, David Pugmire, Suzy Tichenor
On 6 March, Rice University in Houston hosted the seventh annual meeting on the computing and information technology challenges and needs of the oil and gas industry. CSMD researchers figured prominently on the day's program, which included 31 presentations and an additional 31 research posters, and attracted over 500 registered participants. Computer Science Research Group Leader David Bernholdt gave a plenary talk titled "Some Assembly Required? Thoughts on Programming at Extreme Scale", which covered a broad range of issues in programming future systems, including programming models and languages, resilience, and the engineering of scientific software. Scientific Data Group Leader Scott Klasky gave a talk titled "Extreme Scale I/O using the Adaptable I/O System (ADIOS)" in the "Programming Models, Libraries, and Tools" track, which introduced ADIOS to the oil and gas industry audience by describing the problems of data-driven science and the approach taken by ADIOS to extreme-scale data processing. Finally, David Pugmire, a member of the Scientific Data Group, gave a talk titled "Visualization of Very Large Scientific Data" in the "Systems Infrastructure, Facilities, and Visualization" track, which addressed the challenges of current and future HPC systems for large-scale visualization and analysis. Suzy Tichenor, Director of Industrial Partnerships for the Computing and Computational Sciences Directorate, also participated in the workshop.
Prevent Three-Eyed Fish: Analyze Your Nuclear Reactor with Eclipse (March 19)
Jordan Deyton and Jay Jay Billings
Jordan Deyton co-presented with Jay Jay Billings at EclipseCon North America 2014 on Wednesday, March 19. The talk demonstrated the NEAMS Integrated Computational Environment (NiCE) and its post-simulation reactor analysis plugin, the Reactor Analyzer, which supports light water reactor and sodium-cooled fast reactor analysis.
Software Productivity for Extreme-Scale Science Workshop (January 13–14)
The ASCR Workshop on Software Productivity for Extreme-Scale Science (SWP4XS) was held 13–14 January 2014 in Rockville, MD. The meeting was organized by researchers from ANL, LBNL, LANL, LLNL, ORNL (David Bernholdt, CSR/CSMD), and the Universities of Alabama and Southern California at the behest of the US Department of Energy Office of Advanced Scientific Computing Research, to bring together computational scientists from academia, industry, and national laboratories to identify the major challenges of large-scale application software productivity on extreme-scale computing platforms.
The focus of the workshop was on assessing the needs of computational science software in the age of extreme-scale multicore and hybrid architectures, examining the scientific software lifecycle and infrastructure requirements for large-scale code development efforts, and exploring potential contributions and lessons learned that software engineering can bring to HPC software at scale. Participants were asked to identify short- and long-term challenges of scientific software that must be addressed in order to significantly improve the productivity of emerging HPC computing systems through effective scientific software development processes and methodologies.
The workshop included more than 70 participants, including ORNL researchers Ross Bartlett (CEES/CSMD), Al Geist (CTO/CSMD), Judy Hill (SciComp/NCCS), and Jeff Vetter (FT/CSMD), in addition to organizer Bernholdt. Participants contributed 35 position papers in advance of the workshop, and the workshop itself included 19 presentations, a panel discussion, and three sets of breakout sessions, most of which are archived on the workshop's web site (http://www.orau.gov/swproductivity2014/).
An outcome of the workshop will be a report that articulates and prioritizes productivity challenges and recommends both short- and long-term research directions for software productivity for extreme-scale science.
Society for Industrial and Applied Mathematics Annual Meeting
The SIAM Annual Meeting is the largest applied mathematics conference held each year. Guannan Zhang and Miroslav Stoyanov organized a minisymposium on "Recent Advances in Numerical Methods for Partial Differential Equations with Random Inputs", a rapidly growing field of great importance to science. There were 12 invited speakers from both national laboratories and academia, including ORNL, Argonne National Lab, Florida State University, the University of California, the University of Pittsburgh, Virginia Tech, the University of Minnesota, and Auburn University. The minisymposium was well attended by an even wider variety of researchers and gave participants an opportunity to discuss their current work as well as future development of the field.
Durmstrang-2 Review
The fall review for the Durmstrang-2 project was held on September 10–11 in Maryland. Durmstrang-2 is a DoD/ORNL collaboration in extreme-scale high-performance computing. The long-term goal of the project is to support the achievement of sustained exascale processing on applications and architectures of interest to both partners. The Durmstrang-2 project is managed from the Extreme Scale Systems Center (ESSC) of CCSD.
Steve Poole, Chief Scientist of CSMD, presented the overview and general status update at the fall review. The Benchmarks R&D discussion was facilitated by Josh Lothian, Matthew Baker, Jonathan Schrock, and Sarah Powers of ORNL; the Languages and Compilers R&D discussion by Matthew Baker, Oscar Hernandez, Pavel Shamis, and Manju Venkata of ORNL; the I/O and File Systems R&D discussion by Brad Settlemyer of ORNL; the Networking R&D discussion by Nagi Rao, Susan Hicks, Paul Newman, Neena Imam, and Yehuda Braiman of ORNL; the Power Aware Computing R&D discussion by Chung-Hsing Hsu of ORNL; and the System Schedulers R&D discussion by Greg Koenig and Sarah Powers of ORNL. A special panel on Lustre was also convened to discuss best practices and the path forward; panelists included both DoD and ORNL members. Topics of discussion during the executive session of the review included continued funding and growth of the program, task progression, and development of performance metrics for the project.
Upcoming events for the ESSC include an OpenSHMEM Birds-of-a-Feather session at Supercomputing 2013, an OpenSHMEM booth at Supercomputing 2013, and an OpenSHMEM Workshop (date to be announced).
Dr. Tiffany Mintz (Computer Science Research Group) is the ORNL R&D staff member to most recently join the ESSC team.
CAEBAT Annual Review
The Computer-Aided Engineering for Batteries (CAEBAT) program [1] is funded through the Vehicle Technologies (VT) program office within the DOE Office of Energy Efficiency and Renewable Energy (EERE). This program, led by NREL and including industry and university partners, is developing computational tools for the design and analysis of batteries with improved performance and lower cost. CSMD staff in the Computational Engineering and Energy Sciences (CEES) and Computer Science (CS) groups are leading development of the shared computational infrastructure used across the program, known as the Open Architecture Software (OAS), as well as defining standards for input and battery "state" representations [2].
On Aug. 27, 2013, the ORNL team (Sreekanth Pannala, Srdjan Simunovic, Wael Elwasif, Sergiy Kalnaus, Jay Jay Billings, Taylor Patterson, and CEES Group Leader John Turner) hosted the CAEBAT Program Manager, Brian Cunningham, at ORNL. This visit served as an annual review for the ORNL CAEBAT effort, and provided a venue for the team to demonstrate progress in simulation capabilities, including an initial demonstration of the use of the NEAMS Integrated Computational Environment (NiCE) with OAS [3].
[1] http://www.nrel.gov/vehiclesandfuels/energystorage/caebat.html
[2] http://energy.ornl.gov/CAEBAT/
[3] http://sourceforge.net/apps/mediawiki/niceproject/index.php?title=CAEBAT
First OpenSHMEM Workshop: Experiences, Implementations and Tools
October 23–25
The OpenSHMEM workshop is an annual event dedicated to the promotion and advancement of parallel programming with the OpenSHMEM programming interface and to helping shape its future direction. It is the premier venue to discuss and present the latest developments, implementation technology, tools, trends, recent research ideas, and results related to OpenSHMEM and its use in applications. This year's workshop will also emphasize the future direction of OpenSHMEM and related technologies, tools, and frameworks. We will also focus on future extensions for OpenSHMEM and hybrid programming on platforms with accelerators. Although this is an OpenSHMEM-specific workshop, we welcome ideas used for other PGAS languages/APIs that may be applicable to OpenSHMEM.
Topics of interest for the workshop include (but are not limited to):
- Experiences in OpenSHMEM applications in any domain
- Extensions to and shortcomings of the current OpenSHMEM specification
- Hybrid heterogeneous or many-core programming with OpenSHMEM and other languages or APIs (e.g., OpenCL, OpenACC, CUDA, OpenMP)
- Experiences in implementing OpenSHMEM on new architectures
- Low-level communication layers to support OpenSHMEM or other PGAS languages/APIs
- Performance evaluation of OpenSHMEM or OpenSHMEM-based applications
- Power/energy studies of OpenSHMEM
- Static analysis and verification tools for OpenSHMEM
- Modeling and performance analysis tools for OpenSHMEM and/or other PGAS languages/APIs
- Auto-tuning or optimization strategies for OpenSHMEM programs
- Runtime environments and schedulers for OpenSHMEM
- Benchmarks and validation suites for OpenSHMEM
http://www.csm.ornl.gov/workshops/openshmem2013/
Fourth Workshop on Data Mining in Earth System Science
June 5–7
CSMD researcher Forrest Hoffman organized the Fourth Workshop on Data Mining in Earth System Science (DMESS 2013; http://www.climatemodeling.org/workshops/dmess2013/) with co-conveners Jitendra Kumar (ORNL), J. Walter Larson (Australian National University, AUSTRALIA), and Miguel D. Mahecha (Max Planck Institute for Biogeochemistry, GERMANY). The workshop was held in conjunction with the 2013 International Conference on Computational Science (ICCS 2013; http://www.iccs-meeting.org/iccs2013/) in Barcelona, Spain, on June 5–7, 2013, and was chaired by J. Walter Larson. Richard T. Mills and Brian Smith of ORNL both presented papers in the DMESS 2013 session. These papers were published in volume 18 of Procedia Computer Science and are available at http://dx.doi.org/10.1016/j.procs.2013.05.411 and http://dx.doi.org/10.1016/j.procs.2013.05.408.
Special Symposium on Phenology
April 14–18
CSMD researcher Forrest Hoffman co-organized a Special Symposium on Phenology with Bill Hargrove and Steve Norman (USDA Forest Service) and Joe Spruce (NASA Stennis Space Center) at the 2013 U.S.-International Association for Landscape Ecology Annual Symposium (US-IALE 2013; http://www.usiale.org/austin2013/), held April 14–18, 2013, in Austin, Texas. Hoffman also gave an oral presentation in this symposium. Titled "Developing Phenoregion Maps Using Remotely Sensed Imagery", the presentation described the application of a data mining algorithm to the entire record of MODIS satellite NDVI for the conterminous U.S. at 250 m resolution to delineate annual maps of phenological regions. In addition, Hoffman was a co-author on four other oral presentations at the US-IALE Symposium, including one by Jitendra Kumar (ORNL) that described an imputation technique for estimating tree suitability from sparse measurements.
SOS 17 Conference
March 25–28
Successful workshop on Big Data and High Performance Computing hosted by ORNL in Jekyll Island, Georgia
SOS is an invitation-only 2.5-day meeting held each year by Sandia National Laboratories, Oak Ridge National Laboratory, and the Swiss National Supercomputing Centre. This year it was hosted by ORNL in Jekyll Island, Georgia on March 25–28, 2013.
The theme this year was "The intersection of High Performance Computing and Big Data." There were 40 speakers and panelists from around the world representing views from industry, academia, and national laboratories. The first day focused on the gaps between big computing and big data and the challenges of turning science data into knowledge. On the second day the talks and panels focused on where HPC and big data intersect and the state of big-data analysis software. The morning of the third day focused on the politics of big data, including the issues of data ownership.
Findings of the meeting include the fact that large experimental facilities, such as CERN's Large Hadron Collider and the new telescopes coming online, already generate prodigious amounts of scientific data. The volume and speed at which data are generated require that the data be analyzed on the fly and that only a tiny fraction be kept; even so, the amount kept still runs to many petabytes. The attendees stressed how important provenance is to the use of the archived data by other researchers around the world: the majority of today's scientific data is of value only to the original researcher, because the data lacks the metadata required for others to use it. The talks and panels clearly showed the intersection of high performance computing and big data. They also showed that the converse is not necessarily true, i.e., big data (as defined by Google and Amazon) does not require high performance computing; these vendors and their customers are able to get their work done on large, distributed networks of independent PCs. The meeting was filled with lively discussion and provocative questions.
For those wanting to know more, the agenda and talks are posted on the SOS17 website: http://www.csm.ornl.gov/workshops/SOS17/
SIAM SEAS 2013 Annual Meeting
March 22-24
On March 22-24, Oak Ridge National Laboratory and the University of Tennessee hosted the 37th annual meeting of the SIAM Southeastern Atlantic Section. The meeting included approximately 160 registered participants, of whom roughly 60 were students and 20 were from ORNL. There were four plenary talks, 24 minisymposium sessions, seven contributed sessions, and a poster session. Awards were given to students for Best Paper and Best Poster presentations. Attendees were also given guided tours of the Graphite Reactor, the Spallation Neutron Source, and the National Center for Computational Sciences. The meeting was organized by Chris Baker (ORNL), Cory Hauck (ORNL), Jillian Trask (UT), Lora Wolfe (ORNL), and Yulong Xing (ORNL/UT).
Durmstrang-2
March 18-19
The semiannual review for the Durmstrang-2 project was held on March 18-19 in Maryland. Durmstrang-2 is a DoD/ORNL collaboration in extreme-scale high performance computing. The long-term goal of the project is to support the achievement of sustained exascale processing on applications and architectures of interest to both partners. The Durmstrang-2 project is managed from the Extreme Scale Systems Center (ESSC) of CCSD.
Steve Poole, Chief Scientist of CSMD, presented the overview and general status update at the March review. The Benchmarks R&D discussion was facilitated by Josh Lothian, Matthew Baker, Jonathan Schrock, and Sarah Powers of ORNL; the Languages and Compilers discussion by Matthew Baker, Oscar Hernandez, Pavel Shamis, and Manju Venkata of ORNL; the I/O and File Systems discussion by Brad Settlemyer of ORNL; the Networking discussion by Nagi Rao, Susan Hicks, Paul Newman, and Steve Poole of ORNL; the Power Aware Computing discussion by Chung-Hsing Shu of ORNL; and the System Schedulers discussion by Greg Koenig and Sarah Powers of ORNL. The topics of discussion during the executive session of the review included continued funding and growth of the program and the development of performance metrics for the project.
APS 2013 March Meeting
March 18-22
The American Physical Society (APS) March Meeting is the largest physics meeting in the world, focusing on research from industry, universities, and major labs. Participation in this year's meeting, held in Baltimore, MD (March 18-22, 2013), by staff members of the Computational Chemical and Materials Sciences (CCMS) Group included 24 different talks (bold names are from CCMS).
Monojoy Goswami, Bobby G. Sumpter, "Morphology and Dynamics of Ion Containing Polymers using Coarse Grain Molecular Dynamics Simulation", Talk in Session T32: Charged and Ion Containing Polymers (March 21, 2013) APS National Meeting, Baltimore.
Debapriya Banerjee, Kenneth S. Schweizer, Bobby G. Sumpter, Mark D. Dadmun, "Dispersion of small nanoparticles in random copolymer melts", Talk in Session F32: Polymer Nanocomposites II (March 19, 2013) APS National Meeting, Baltimore.
Rajeev Kumar, Bobby G. Sumpter, S. Michael Kilbey II, "Charge regulation and local dielectric function in planar polyelectrolyte brushes", Talk in Session U32: Charged Polymers and Ionic Liquids (March 21, 2013) APS National Meeting, Baltimore.
Alamgir Karim, David Bucknall, Dharmaraj Raghavan, Bobby Sumpter, Scott Sides, "In-situ Neutron Scattering Determination of 3D Phase-Morphology Correlations in Fullerene Polymer Organic Photovoltaic Thin Films", Talk in Session Y33: Organic Electronics and Photonics - Morphology and Structure I (March 22, 2013) APS National Meeting, Baltimore.
Geoffrey Rojas, P. Ganesh, Simon Kelly, Bobby G. Sumpter, John Schlueter, Petro Maksymovych, "Molecule/Surface Interactions and the Control of Electronic Structure in Epitaxial Charge Transfer Salts", Talk in Session U35: Search for New Superconductors III (March 21, 2013) APS National Meeting, Baltimore.
Geoffrey A. Rojas, P. Ganesh, Simon Kelly, Bobby G. Sumpter, John A. Schlueter, Petro Maksymovych, "Density Functional Theory studies of Epitaxial Charge Transfer Salts", Talk in Session N35: Search for New Superconductors III (March 20, 2013) APS National Meeting, Baltimore.
Arthur P. Baddorf, Qing Li, Chengbo Han, J. Bernholc, Humberto Terrones, Bobby G. Sumpter, Miguel Fuentes-Cabrera, Jieyu Yi, Zheng Gai, Peter Maksymovych, Minghu Pan, "Electron Injection to Control Self-Assembly and Disassembly of Phenylacetylene on Gold", Talk in Session C33: Organic Electronics and Photonics - Interfaces and Contacts (March 18, 2013) APS National Meeting, Baltimore.
Mina Yoon, Kai Xiao, Kendal W. Clark, An-Ping Li, David Geohegan, Bobby G. Sumpter, Sean Smith, "Understanding the growth of nanoscale organic semiconductors: the role of substrates", Talk in Session Z33: Organic Electronics and Photonics - Morphology and Structure II (March 22, 2013) APS National Meeting, Baltimore.
Chengbo Han, Wenchang Lu, Jerry Bernholc, Miguel Fuentes-Cabrera, Humberto Terrones, Bobby G. Sumpter, Jieyu Yi, Zheng Gai, Arthur P. Baddorf, Qing Li, Peter Maksymovych, Minghu Pan, "Computational Study of Phenylacetylene Self-Assembly on Au(111) Surface", Talk in Session C33: Organic Electronics and Photonics - Interfaces and Contacts (March 18, 2013) APS National Meeting, Baltimore.
Jaron Krogel, Jeongnim Kim, David Ceperley, "Prospects for efficient QMC defect calculations: the energy density applied to Ge self-interstitials", Talk in Session J24: Quantum Many-Body Systems and Methods I (March 19, 2013) APS National Meeting, Baltimore.
Kendal Clark, Xiaoguang Zhang, Ivan Vlassiouk, Guowei He, Gong Gu, Randall Feenstra, An-Ping Li, "Mapping the Electron Transport of Graphene Boundaries Using Scanning Tunneling Potentiometry", Talk in Session G6: CVD Graphene - Doping and Defects (March 19, 2013) APS National Meeting, Baltimore.
Gregory Brown, Donald M. Nicholson, Markus Eisenbach, Kh. Odbadrakh, "Wang-Landau or Statistical Mechanics", Talk in Session G6: Equilibrium Statistical Mechanics, Followed by GSNP Student Speaker Award (March 18, 2013) APS National Meeting, Baltimore.
Don Nicholson, Kh. Odbadrakh, German Samolyuk, G. Malcolm Stocks, "Calculated magnetic structure of mobile defects in Fe", Session Y16: Magnetic Theory II (March 22, 2013) APS National Meeting, Baltimore.
Khorgolkhuu Odbadrakh, Don Nicholson, Aurelian Rusanu, German Samolyuk, Yang Wang, Roger Stoller, Xiaoguang Zhang, George Stocks, "Coarse graining approach to first principles modeling of structural materials", Session A43: Multiscale Modeling - Coarse-graining in Space and Time I (March 18, 2013) APS National Meeting, Baltimore.
M. G. Reuter & P. D. Williams, "The Information Content of Conductance Histogram Peaks: Transport Mechanisms, Level Alignments, and Coupling Strengths", Talk in Session R43: Electron Transfer, Charge Transfer and Transport Session (March 20, 2013) APS National Meeting, Baltimore.
Paul R. C. Kent, Panchapakesan Ganesh, Jeongnim Kim, Mina Yoon, Fernando Reboredo, "Binding and Diffusion of Li in Graphite: Quantum Monte Carlo Benchmarks and validation of Van der Waals DFT" Talk in Session A5: Van der Waals Bonding in Advanced Materials – Materials Behavior, (March 18, 2013) APS National Meeting, Baltimore.
Peter Staar, Thomas Maier, Thomas Schulthess, "DCA+: Incorporating selfconsistently a continuous momentum selfenergy in the Dynamical Cluster Approximation" Talk in Session N24, APS National Meeting, Baltimore.
Thomas Maier, Peter Hirschfeld, Douglas Scalapino, Yan Wang, Andreas Kreisel, "Pairing strength and gap functions in multiband superconductors: 3D effects", Talk in Session G37: Electronic Structure Methods II (March 20, 2013) APS National Meeting, Baltimore.
Thomas Maier, Yan Wang, Andreas Kreisel, Peter Hirschfeld, Douglas Scalapino, "Spin fluctuation theory of pairing in AFe2As2", Talk in Session G37: Electronic Structure Methods II (March 20, 2013), APS National Meeting, Baltimore.
Peter Hirschfeld, Andreas Kreisel, Yan Wang, Milan Tomic, Harald Jeschke, Anthony Jacko, Roser Valenti, Thomas Maier, Douglas Scalapino, "Pressure dependence of critical temperature of bulk FeSe from spin fluctuation theory" Talk in Session G37: Electronic Structure Methods II (March 20, 2013), APS National Meeting, Baltimore.
Markus Eisenbach, Junqi Yin, Don M. Nicholson, Ying Wai Li, "First principles calculation of finite temperature magnetism in Ni", Talk in Session C17: Magnetic Theory I (March 18, 2013), APS National Meeting, Baltimore.
Madhusudan Ojha, Don M. Nicholson, Takeshi Egami, "Ab initio atomic level stresses in Cu-Zr crystal, liquid and glass phases", Talk in Session G42: Focus Session: Physics of Glasses and Viscous Liquids I (March 19, 2013), APS National Meeting, Baltimore.
Junqi Yin, Markus Eisenbach, Don Nicholson, "Spin-lattice coupling in BCC iron", Talk in Session T39: Metals, Alloys and Metallic Structures (March 21, 2013), APS National Meeting, Baltimore.
German Samolyuk, Yuri Osetsky, Roger Stoller, Don Nicholson, George Malcolm Stocks, "The modification of core structure and Peierls barrier of 1/2<111> screw dislocation in bcc Fe in the presence of Cr solute atoms", Talk in Session T39: Metals, Alloys and Metallic Structures (March 21, 2013), APS National Meeting, Baltimore.
SIAM CSE13
February 25 - March 1
The CSMD had a strong showing at SIAM CSE13, with over 25 presentations from staff members of the division. This conference is a leading venue in computational science and engineering, drawing thousands of researchers from across the globe and supported jointly by NSF and DOE. Division scientists organized eight minisymposia with close to a hundred invited speakers in the areas of modern libraries (Christopher Baker), climate (Kate Evans), nuclear simulations (Bobby Philip), kinetic theory (Cory Hauck), hybrid-architecture linear algebra (Ed D'Azevedo), UQ and stochastic inverse problems (Clayton Webster), and structural graph theory, sparse linear algebra, and graphical models (Blair Sullivan).
Seminars
November 5, 2015 - Christoph Beckermann: Modeling of Microstructure Evolution in Solidification Processes
ABSTRACT: Solidification is fundamental to the manufacture of all metallic materials and components. At the same time, the microstructures that form during solidification represent an interesting example of the spontaneous formation of a complex pattern. Modeling of solidification is challenging because it is characterized by an intricate interplay of multiple phenomena at several length and time scales. This seminar will provide an overview of recent progress made in numerically simulating solidification microstructure evolution. Examples include dendritic growth, columnar-to-equiaxed grain structure transitions, and concurrent growth and coarsening of mushy zones. Future challenges, particularly with respect to high performance computing and advanced manufacturing processes, are summarized.
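One widely used class of microstructure models, phase-field methods, can be sketched in a few lines. The 1D Allen-Cahn toy below is an illustrative stand-in only (grid size, time step, and interface-width parameter are arbitrary assumptions), not the dendritic-growth models discussed in the seminar; it shows how an order parameter relaxes a sharp solid-liquid interface into a smooth diffuse one.

```python
import numpy as np

# phi = +1 marks solid, phi = -1 marks liquid.
n, dx, dt, eps = 200, 0.1, 0.001, 0.5
x = np.arange(n) * dx
phi = np.where(x < n * dx / 2, 1.0, -1.0)   # sharp initial interface

for step in range(2000):
    # explicit Euler step of d(phi)/dt = eps^2 * Laplacian(phi) + phi - phi^3
    lap = (np.roll(phi, 1) - 2 * phi + np.roll(phi, -1)) / dx**2
    phi = phi + dt * (eps**2 * lap + phi - phi**3)
```

After a few thousand steps the profile is the familiar tanh-shaped diffuse interface; real solidification codes couple such an equation to heat and solute transport in 2D/3D.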
November 3, 2015 - Ferrol Aderholdt: Virtual Machine Introspection-based Checkpoint/Restart for Survivable Clouds
ABSTRACT: Cloud computing is an extremely popular computing paradigm in academia and industry. This popularity stems from various properties of the cloud, including ease of use, elasticity, reduced maintenance and energy costs for the consumer, and a pay-as-you-go model. As enterprise computing migrates from on-site compute resources to cloud-based resources, adoption is likely to increase further. Adoption at this scale presents various challenges for cloud providers, including the ability to provide fault-free execution for users and to mitigate attacks by malicious parties. To handle these difficulties, survivability may be applied to the cloud architecture so that increased adoption of cloud computing results in increased profits for both the consumer and the provider. This talk discusses a virtual machine introspection-based checkpoint/restart mechanism for use in the cloud survivability framework (CSF), a user-level, component-based framework that applies the properties of survivability to current infrastructure-as-a-service (IaaS) cloud architectures.
October 29, 2015 - Mike Leuze: DNA2Face: Predicting Faces from a DNA Sample
ABSTRACT: The availability of large datasets linking human genomic sequences to observed traits provides the potential to correlate an individual's specific genomic code with susceptibility to disease, behavior, and physical appearance. The relationship between genome and physical appearance is complex, with single genes having an impact on multiple aspects of appearance and individual features being influenced by multiple genes. In this project, we quantify the connections between human genomics and facial appearance using statistical techniques to determine principal components of facial morphology and computational genomics to find associations between these principal components and mutational variation. The ultimate goal is to develop the ability to estimate 3D facial appearance from DNA, a capability of significant value to the law enforcement, national security, and intelligence communities.
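The statistical step, extracting principal components of facial morphology, can be sketched with synthetic data. Everything here is an illustrative assumption (the landmark count, the three hidden "shape factors"), not the project's actual pipeline; it only shows how SVD recovers a low-dimensional morphology space.

```python
import numpy as np

# Hypothetical stand-in: 200 "faces", each described by 30 landmark
# coordinates generated from 3 latent shape factors plus noise.
rng = np.random.default_rng(0)
latent = rng.standard_normal((200, 3))
basis = rng.standard_normal((3, 30))
faces = latent @ basis + 0.05 * rng.standard_normal((200, 30))

# Principal components of facial morphology via SVD of the centered data.
centered = faces - faces.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
scores = U * S                       # per-face coordinates along each PC

# Variance explained per component; 3 latent factors -> 3 dominant PCs.
explained = (S ** 2) / (S ** 2).sum()
```

The per-face `scores` along the leading components are the quantities one would then test for association with genomic variants.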
October 23, 2015 - Alvin R. Lebeck: Molecular-Scale Nanophotonics for Network-on-Chip and Probabilistic Computing Functional Units
ABSTRACT: This talk describes ongoing work exploring the use of emerging molecular-scale devices for communication and computation. The first part of the talk presents the Molecular-scale Network-on-Chip (mNoC). We leverage quantum dot LEDs, which provide electrical-to-optical signal modulation, and chromophores, which provide optical signal filtering for receivers. These devices replace the ring resonators and the external laser source used in contemporary nanophotonic NoCs, enabling crossbar scaling up to radix 256. We'll also present mNoC power topologies, enabled by unique capabilities of mNoC technology, to reduce overall interconnect power consumption. A power topology corresponds to the logical connectivity provided by a given power mode. Broadcast is one power mode, and it consumes the maximum power. Additional power modes consume less power but allow a source to communicate with only a statically defined (potentially physically non-contiguous) subset of nodes. Overall power is reduced if frequently communicating nodes use low power modes, while less frequently communicating nodes use higher power modes.
The second part of this talk describes our recent work on developing novel computational units to accelerate probabilistic algorithms. Recent advances in statistics and machine learning demonstrate the potential of probabilistic algorithms in achieving high quality solutions; however, there remains a mismatch between current deterministic hardware and these algorithms. To bridge this gap we are exploring devices that exploit Resonance Energy Transfer (RET) between chromophores to create efficient samplers for arbitrary probability distributions. We provide a brief overview of the device behavior, fabrication with DNA self-assembly, proposed functional units, and the status of a macro-scale prototype.
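In software terms, the job such a sampler performs can be sketched as an inverse-CDF lookup over an arbitrary discrete distribution; the RET devices aim to produce samples like these physically rather than with a CPU. The weights below are arbitrary illustrative values.

```python
import random
from bisect import bisect
from itertools import accumulate

def make_sampler(weights):
    """Inverse-CDF sampler for an arbitrary discrete distribution."""
    cdf = list(accumulate(weights))
    total = cdf[-1]
    # draw u uniform in [0, total) and find the first CDF bin exceeding it
    return lambda: bisect(cdf, random.random() * total)

random.seed(42)
sample = make_sampler([0.1, 0.6, 0.3])
draws = [sample() for _ in range(10000)]
freq = [draws.count(i) / len(draws) for i in range(3)]
```

Empirical frequencies converge to the requested weights; a hardware sampler replaces the random-number draw and table lookup with a single physical event.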
BIO:
Alvin R. Lebeck is a Professor of Computer Science and of Electrical and Computer Engineering at Duke University. Lebeck's research interests include architectures for emerging nanotechnologies, high performance microarchitectures, hardware and software techniques for improved memory hierarchy performance, multiprocessor systems, and energy efficient computing. In the field of emerging nanotechnologies he has done extensive work exploring the architectural implications of DNA selfassembly as a fabrication method for future systems. In the area of memory systems, Lebeck led efforts in improving cache hierarchy performance, tolerating memory latency, and improving main memory power management.
October 22, 2015 - Jian Huang: Interactive Selection of Multivariate Features in Large Spatiotemporal Data
ABSTRACT: Selecting meaningful features is central to the analysis of scientific data. Today's multivariate scientific datasets are often large and complex, making it difficult to define general features of interest significant to scientific applications. To address this problem, we propose three general spatiotemporal metrics to quantify the significant properties of data features: concentration, continuity, and co-occurrence, named collectively CO3. We implemented an interactive visualization system to investigate complex multivariate time-varying data from satellite remote sensing with high spatial resolution, as well as from real-time continental-scale power grid monitoring with high temporal resolution. The system integrates the CO3 metrics with an elegant multi-space user interaction tool to provide various forms of quantitative user feedback. Through these, the system supports an iterative user-driven analysis process. Our findings demonstrate that the CO3 metrics are useful for simplifying the problem space and revealing potentially unknown scientific discoveries by helping users effectively select significant features and groups of features for visualization and analysis. Users can then comprehend the problem better and design future studies using newly discovered scientific hypotheses.
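The flavor of such feature metrics can be sketched on a toy grid. The definitions below are simplified stand-ins invented for illustration (the actual CO3 definitions are more general and spatiotemporal), but they convey what "concentration" and "co-occurrence" measure.

```python
import numpy as np

def concentration(mask):
    """Toy spatial concentration: feature area relative to its bounding
    box (1.0 = perfectly compact; near 0 = scattered cells)."""
    ys, xs = np.nonzero(mask)
    box = (ys.max() - ys.min() + 1) * (xs.max() - xs.min() + 1)
    return mask.sum() / box

def cooccurrence(m1, m2):
    """Toy co-occurrence: fraction of active cells where both features
    are active together."""
    return (m1 & m2).sum() / max((m1 | m2).sum(), 1)

# Stand-in for one timestep of a gridded dataset: boolean masks marking
# where two hypothetical variables exceed their thresholds.
rng = np.random.default_rng(0)
A = np.zeros((50, 50), dtype=bool)
A[10:20, 10:20] = True                  # one compact feature
B = rng.random((50, 50)) < 0.04         # scattered exceedances
```

Ranking candidate features by metrics like these is what lets an interactive system surface compact, persistent, or jointly occurring structures for the user.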
Bio: Jian Huang is a professor in the Department of Electrical Engineering and Computer Science at the University of Tennessee, Knoxville. His research expertise includes large data visualization, multivariate data visualization, and time-varying data visualization, as well as systems-oriented areas of visualization such as parallel, distributed, remote, and collaborative visualization. His research has been funded by DOE, NSF, NASA, the Department of the Interior, Intel, and UT-Battelle.
October 21, 2015 - Tonglin Li: Distributed NoSQL Storage for Extreme-Scale System Services in Clouds and Supercomputers
ABSTRACT: As supercomputers gain more parallelism at exponential rates, storage infrastructure performance is increasing at a significantly lower rate due to relatively centralized system services and management. This implies that the data management and data flow between the storage and compute resources are becoming the new bottleneck for large-scale applications. Similarly, cloud-based distributed systems introduce other challenges stemming from the dynamic nature of cloud applications. This talk discusses several challenges for storage systems at extreme scales for supercomputers and clouds and addresses them by designing and implementing a zero-hop distributed NoSQL store (ZHT), which has been tuned for the requirements of high-end computing systems. ZHT aims to be a building block for scalable distributed system services. The goals of ZHT are delivering high availability, good fault tolerance, lightweight design, persistence, dynamic joins and leaves, high throughput, and low latencies at extreme scales (millions of nodes). We have evaluated ZHT's performance on a variety of systems, ranging from a 64-node Linux cluster, to an Amazon EC2 virtual cluster of up to 96 nodes, to an IBM Blue Gene/P supercomputer with 8K nodes. This work also presents several real systems that have adopted ZHT as well as other NoSQL systems, namely ZHT/Q, FusionFS, IStore, MATRIX, Slurm++, Fabriq, Graph/Z, FREIDA-State, and WaggleDB. All of these systems have been significantly simplified by NoSQL storage and have been shown to outperform other leading systems, by orders of magnitude in some cases. Through our work, we have shown how NoSQL storage systems can help with both performance and scalability at large scales in such a variety of environments.
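The "zero-hop" idea, in which every node holds the full membership table so key placement is a single local computation with no routing hops, can be sketched as follows. This is a simplified illustration, not ZHT's actual partitioning code.

```python
import hashlib

def owner(key, nodes):
    """Zero-hop lookup: any client computes a key's home node locally,
    without routing through intermediate nodes."""
    h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return nodes[h % len(nodes)]

nodes = [f"node{i:03d}" for i in range(64)]
home = owner("simulation/checkpoint-0042", nodes)  # same answer on every client
```

Because the hash is deterministic and the membership list is replicated everywhere, a get or put costs exactly one network message to `home`; production designs add consistent hashing or a dynamic membership protocol to handle joins and leaves gracefully.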
Bio: Tonglin Li is a 6th-year Ph.D. candidate in the Department of Computer Science at Illinois Institute of Technology, Chicago, and will receive his Ph.D. in December 2015. He is a member of the Data-Intensive Distributed Systems Laboratory (DataSys) at IIT, advised by Dr. Ioan Raicu. His research interests include distributed systems, storage systems, cloud computing, high performance computing, and big data. His publications include 3 journal papers, 8 conference papers, and 4 extended abstracts in leading venues such as IPDPS, TCC, CCPE, and BigData.
October 20, 2015 - Todd Gamblin: Build and Test Automation at Livermore Computing
ABSTRACT: Build and test servers like Jenkins CI and Atlassian Bamboo are commonplace in industry, but they are not widely available for users at large HPC centers. These tools integrate tightly with bug trackers and source control management (SCM) systems, and they allow facilities and code teams to automate development, testing, and deployment workflows.
The need for automated testing is particularly acute on unique, bleeding-edge systems like LLNL's Sequoia and ORNL's Titan. However, security issues make it difficult for centers to deploy these tools for all of their users.
In this talk, I will give an overview of build and test efforts underway in LLNL's Livermore Computing (LC) Division. LC has recently deployed Atlassian Bamboo for end-users on two of our networks. Bamboo allows teams to share a central dashboard on the LC website, and to set permissions on their build configurations through the UI. Our solution allows users to run build agents securely, under their own identity, on production HPC systems. To automate the build process of large HPC applications, LLNL has also developed Spack, a flexible package management tool that allows users to explore the combinatorial build space of HPC packages. LLNL is using Spack with Bamboo to test tools and application codes with the many different compilers, software versions, and configurations that our users demand.
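The combinatorial build space that Spack explores can be pictured as a cross product of build choices. The sketch below is illustrative only: the package, compiler, and MPI names are made up, and the strings merely mimic Spack's `pkg +variant %compiler ^dependency` spec syntax rather than being generated by Spack itself.

```python
from itertools import product

# Hypothetical build matrix for one package: every combination is a
# distinct concrete build a CI system might need to test.
compilers = ["gcc@4.9.2", "intel@15.0", "clang@3.6"]
mpis = ["openmpi@1.8", "mvapich2@2.1"]
variants = ["+debug", "~debug"]

specs = [f"mypkg {v} %{c} ^{m}"
         for c, m, v in product(compilers, mpis, variants)]
```

Even this tiny matrix yields 3 × 2 × 2 = 12 builds, which is why automating the enumeration (and the builds themselves) matters at HPC centers.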
October 15, 2015 - Ian Foster: Accelerating Discovery Via Science Services
ABSTRACT: We have made much progress over the past decade toward harnessing the collective power of IT resources distributed across the globe. In big-science projects in high-energy physics, astronomy, and climate, thousands work daily within virtual computing systems with global scope. But we now face a far greater challenge: exploding data volumes and powerful simulation tools mean that many more researchers, ultimately most, will soon require capabilities not so different from those used by such big-science teams. How are we to meet these needs? Must every lab be filled with computers and every researcher become an IT specialist? Perhaps the solution is rather to move research IT out of the lab entirely: to develop suites of science services to which researchers can dispatch mundane but time-consuming tasks, and thus to achieve economies of scale and reduce cognitive load. I explore the past, current, and potential future of large-scale outsourcing and automation for science, and suggest opportunities and challenges for today's researchers. I use examples from Globus, Swift, and other projects to demonstrate what can be achieved.
Ian Foster is Director of the Computation Institute, a joint institute of the University of Chicago and Argonne National Laboratory. He is also an Argonne Senior Scientist and Distinguished Fellow and the Arthur Holly Compton Distinguished Service Professor of Computer Science. Methods and software developed under his leadership underpin many large national and international cyberinfrastructures. Ian's research interests include distributed, parallel, and data-intensive computing technology, as well as innovative applications of computing technologies to scientific problems.
October 13, 2015 - Edmond Chow: Very Fine-Grained Parallelization of Sparse Linear Algebra Computations
ABSTRACT: Massive concurrency is required for scientific and engineering algorithms to run efficiently on future computer architectures. High-end compute nodes already have hundreds to thousands of accelerator cores, and core counts are anticipated to increase further. In this talk, we describe some new approaches to certain sparse linear algebra computations, particularly incomplete factorizations and sparse triangular preconditioner solves, that have much more concurrency than existing approaches. The main idea is to transform a problem into one that can be solved iteratively. By using asynchronous iterative methods, the coupling that must exist between processing units is still respected, but with much lower overhead than in the synchronous case.
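The core idea, replacing an inherently sequential triangular solve with fixed-point sweeps whose component updates are independent, can be sketched as follows. This is a synchronous Jacobi stand-in for the asynchronous scheme described in the talk, and the dense random matrix is an illustrative example rather than a real sparse ILU factor.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Unit lower-triangular system L x = b (stand-in for a triangular
# preconditioner solve).
L = np.eye(n) + np.tril(0.1 * rng.standard_normal((n, n)), -1)
b = rng.standard_normal(n)

# Forward substitution computes x[0], x[1], ... strictly in order.
# The Jacobi sweep  x <- D^{-1} (b - (L - D) x)  instead updates every
# component from the previous iterate, so all n updates can run in
# parallel; dropping the barrier between sweeps gives the async variant.
D = np.diag(L).copy()
x = np.zeros(n)
for sweep in range(30):
    x = (b - (L - np.diag(D)) @ x) / D

x_exact = np.linalg.solve(L, b)
```

Because the iteration matrix for a triangular system is strictly triangular (hence nilpotent), the sweeps converge to the exact solve; in practice a few sweeps suffice for preconditioning accuracy.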
October 9, 2015 - Vivek Seshadri: Can DRAM do more than just store data?
ABSTRACT: In today's systems, DRAM is used only as a storage device. Off-chip DRAM interfaces allow the memory controller to read and write data. As a result, any operation must first read the required data from DRAM and store the results back into DRAM. In this line of work, we observe that this model is very inefficient for certain key primitives in modern systems. And we ask the question, "Can DRAM do more than just store data?"
In response, we propose three techniques that exploit the DRAM architecture to significantly improve the efficiency of three important operations. First, we propose RowClone, a mechanism to perform bulk copy and initialization (specifically zeroing) operations completely within DRAM. RowClone improves the performance and energy efficiency of these operations by an order of magnitude. Second, we propose Gather-Scatter DRAM (GS-DRAM), a mechanism to improve the efficiency of non-unit strided access patterns. GS-DRAM achieves near-ideal memory bandwidth and cache utilization for power-of-2 strided access patterns. Finally, we propose a new substrate that exploits existing DRAM operation to perform bulk bitwise operations completely within DRAM. Our mechanism enables an order-of-magnitude improvement in the throughput of bitwise operations.
In this talk, I will provide a brief tutorial of DRAM operation. I will then describe these three mechanisms in detail.
BIOGRAPHY: Vivek Seshadri is a Ph.D. student at the Computer Science Department at Carnegie Mellon University. He is advised by Prof. Todd Mowry and Prof. Onur Mutlu. His research interests are primarily in the field of computer systems, with specific focus on designing efficient memory systems.
October 1, 2015 - Alexander M. Feldt: Thinking About Vulnerability: Climate Change, Human Rights, and Moral Thresholds
ABSTRACT: Within the scientific community, and occasionally within the media, a lot of attention is placed on specific data points about climate change: 450 ppm, 3°C of warming, 10 ft of sea-level rise. Moreover, these are typically presented as significant because they relate to harm we ought to avoid; essentially, they serve as signifiers of when something bad will happen. However, to identify these various data points or phenomena as markers of harm, one has already made some corresponding moral judgment about which thresholds are the ones we ought to care about. For example, if I don't think that people have any particular claim to anything beyond mere survival, and data show me that people will be able to survive 10 ft of sea-level rise, even if it results in many environmental refugees, then I won't and shouldn't care about 10 ft as an important data point. In this talk, I examine how engaging climate change from a human rights perspective can provide key resources for understanding what these data points mean as moral thresholds and for defending why certain thresholds matter. I will offer an account linking human rights and the environment that utilizes the Capabilities Approach, which is at the core of much of the human development literature, to highlight the broad array of moral harms that can be caused by climate change. This can then be coupled with climate vulnerability modeling in a way that clearly articulates why certain thresholds matter, by linking the impacts of certain climate change scenarios to human rights violations. By bringing a clear moral framework into climate modeling, we are better able to identify why we do and should care about the many important thresholds offered by the scientific community.
September 30, 2015 - Markus Eisenbach: LSMS & WL-LSMS: Codes for First Principles Calculation of the Ground State and Statistical Physics of Materials
ABSTRACT: The Locally Self-consistent Multiple Scattering (LSMS) code solves the first principles density functional theory Kohn-Sham equation for a wide range of materials, with a special focus on metals, alloys, and metallic nanostructures. It has traditionally exhibited near perfect scalability on massively parallel high performance computer architectures. We present our efforts to exploit GPUs to accelerate the LSMS code to enable first principles calculations of O(100,000) atoms and statistical physics sampling of finite temperature properties. Using the Cray XK7 system Titan at the Oak Ridge Leadership Computing Facility, we achieve a sustained performance of 14.5 PFlop/s and a speedup of 8.6 compared to the CPU-only code.
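The statistical-physics sampling in WL-LSMS is based on the Wang-Landau flat-histogram method, which can be sketched on a toy model whose density of states is known exactly. The spin model, step counts, and flatness criterion below are illustrative choices, not the WL-LSMS implementation (where each energy evaluation is a first-principles LSMS calculation).

```python
import math
import random

# Toy model: N independent spins; "energy" E = number of up spins, so
# the exact density of states is g(E) = C(N, E).
random.seed(0)
N = 10
spins = [0] * N
E = 0
lng = [0.0] * (N + 1)    # running estimate of ln g(E)
hist = [0] * (N + 1)     # visit histogram for the flatness check
f = 1.0                  # ln of the Wang-Landau modification factor

while f > 1e-3:
    for _ in range(10000):
        i = random.randrange(N)
        Enew = E + (1 - 2 * spins[i])          # effect of flipping spin i
        # accept with probability min(1, g(E)/g(Enew)): the walk is pushed
        # away from energies it has already visited often
        if random.random() < math.exp(min(0.0, lng[E] - lng[Enew])):
            spins[i] ^= 1
            E = Enew
        lng[E] += f
        hist[E] += 1
    if min(hist) > 0.8 * sum(hist) / len(hist):  # histogram flat enough
        hist = [0] * (N + 1)
        f /= 2                                   # refine and continue

ratio = lng[N // 2] - lng[0]    # estimates ln[g(5)/g(0)] = ln C(10,5)
```

The converged `lng` reproduces the binomial density of states up to an additive constant; from ln g(E), thermodynamic averages at any temperature follow by reweighting.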
September 17, 2015 - Jay Jay Billings: Integrated Modeling and Simulation with Eclipse ICE and its Applicability to Neutron Science
ABSTRACT: Simulating the physical world is difficult from any perspective, although computational scientists usually focus on raw compute performance. Many tools exist for doing many different types of simulations, and many more tools exist for generating input, post-processing results, or managing data. Users are challenged to figure out how to use their new favorite code and extract knowledge from its results, while developers are charged with making "One Simulator to Rule them All" that can be coupled to any and every other code and possibly extended in arbitrary ways. Both scenarios lead to significant challenges that stifle productivity and limit scientific innovation. Those challenges are not insurmountable and can be addressed by developing novel platforms that manage modeling and simulation just like real experiments.
This talk presents ORNL's modeling and simulation platform, the Eclipse Integrated Computational Environment (ICE), which was built to tackle these challenges. Eclipse ICE integrates a large collection of tools for users for input generation, job launch, visualization, and data management. It also provides tools for developers in C/C++, Fortran, Java, Python, and other languages to develop their software, as well as a rich API for extending the platform to provide graphical plugins for their users. This talk will also demonstrate Eclipse ICE's support for several projects related to neutron science, including Sassena for neutron scattering and a new simulator for neutron reflectometry. It will present ICE's visualization services that support 2D plotting, 3D geometry editing, and fully interactive visualization with VisIt and ParaView. It will show how ICE can be controlled via Python scripts and its integration with other Eclipse-based projects. Finally, thoughts on future directions for the platform and its continued support for neutron science will be presented.
USB sticks with binaries and sample data will be available to attendees who want to follow along in the demonstration.
September 14, 2015 - James Elliott: Soft Errors in Linear Solvers: Fighting an Invisible Foe
ABSTRACT: This work presents a novel approach to HPC resilience that couples numerical analysis and analytic modeling. We present models for soft errors in floating-point operations, and then extend these models to reveal the expected error should floating-point data experience a soft error. We then consider how to develop a resilient linear solver. We present a general approach for enforcing bounded error, and show experimentally that this technique can be very effective. Next, we consider a subset of soft errors that are undetectable given current detection approaches and cause high overhead, i.e., errors that will look correct with respect to a norm. We develop a numerical soft error injection technique that generates such errors, and then we evaluate algorithmic options for coping with such errors in the FT-GMRES (nested solvers) selective reliability framework. Our prototype is implemented using the Trilinos library, and all tests are evaluated in parallel with a state-of-the-art preconditioner that ensures that failure-free problem solves are very efficient. Our pessimistic error injection coupled with efficient solvers ensures that any overhead introduced by fault tolerance is noticeable. Using this approach, we then reason about algorithmic fault tolerance techniques inside iterative linear solvers using both analytic modeling and experimentation. We show our approach has a low "always-on" cost, while providing strong coverage for soft errors.
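The bounded-error idea in the abstract can be sketched in a few lines: an iterative solver checks its residual norm against an expected bound each sweep and recomputes the step whenever the bound is violated. The toy below is not the authors' Trilinos/FT-GMRES implementation; it is a plain Jacobi sweep with an assumed growth bound and a hypothetical fault-injection hook, included only to illustrate the pattern of selective reliability.

```python
import math

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def jacobi_with_check(A, b, iters=50, growth=2.0, inject=None):
    """Jacobi iteration with a bounded-error check on the residual norm.

    If the residual norm grows by more than `growth` between sweeps, we
    suspect a silent corruption and recompute the residual (a toy version
    of selective reliability).  `inject(k, r)` may corrupt the residual
    to simulate a soft error."""
    n = len(b)
    x = [0.0] * n
    diag = [A[i][i] for i in range(n)]
    r_prev = norm(b)
    detections = 0
    for k in range(iters):
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
        if inject is not None:
            inject(k, r)                 # simulated bit flip
        rn = norm(r)
        if rn > growth * r_prev:         # bound violated: distrust r
            r = [bi - yi for bi, yi in zip(b, matvec(A, x))]  # reliable redo
            rn = norm(r)
            detections += 1
        x = [xi + ri / d for xi, ri, d in zip(x, r, diag)]
        r_prev = rn
    return x, detections
```

For a diagonally dominant system the residual contracts every sweep, so an injected error that inflates one entry by orders of magnitude trips the bound, is recomputed, and convergence is unaffected.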
James Elliott is a candidate for a postdoctoral position in the Computer Science Research Group. He graduated from Louisiana Tech University with a B.S. in Computer Science and an M.S. in Mathematics and Statistics. James has been involved in the fields of HPC and resilience since 2005, when he studied how to virtualize a cluster using the Xen hypervisor. In 2007, he integrated various benchmarks into the OSCAR cluster management suite as part of a Google Summer of Code project. He worked directly with the Louisiana Optical Network Initiative as a graduate computational science fellow in 2009, and then moved on to pursue a Ph.D. in Computer Science at North Carolina State University. James has studied alternatives to checkpoint/restart and, most recently, soft error resilience of numerical methods. A strong component of Mr. Elliott's work is the use of analytic modeling, and one day he hopes to demystify the "monster in the closet" that is soft error resilience. Mr. Elliott has also taught at the middle and high school level as part of the NSF GK-12 Teaching Fellowship, and has worked at three national labs in various student-oriented programs.
September 9, 2015 - Dr. Mike Guidry: On the Design, Autotuning, and Optimization of GPU Kernels for Kinetic Network Simulations Using Fast Explicit Integration and GPU Batched Computation
ABSTRACT: This talk reports on an interdisciplinary effort between ORNL and the Innovative Computing Laboratory and the Departments of Physics and Astronomy at UT, to provide new, highly efficient solvers for realistic simulation of scientific problems. Various scientific applications require solvers that work on many small-size problems that are independent of each other. At the same time, high-end hardware is evolving rapidly and becoming even more throughput-oriented, so there is an increasing need for an energy-efficient, high-performance approach to these small problems, which we call batched computation. The many applications that need this functionality could especially benefit from the use of GPUs, which currently are four to five times more energy efficient than multicore CPUs on important scientific workloads. This talk describes the design, autotuning and optimization of batched GPU methods to accelerate large kinetic network simulations that use novel fast explicit integration algorithms.
Taking as a generic test case a Type Ia supernova explosion with an extremely stiff thermonuclear network having 150 isotopic species and 1604 reactions, assumed coupled to hydrodynamics using operator splitting, we demonstrate the capability to solve 250-500 realistic kinetic networks in parallel in the same time that the standard implicit methods used in calculations to date can solve a single such network on a CPU. This orders-of-magnitude decrease in compute time for solving systems of realistic kinetic networks implies that important, coupled, multiphysics problems in various scientific and technical fields that were previously intractable, or could be simulated only with highly schematic kinetic networks, are now computationally feasible.
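The batched pattern described above, many small independent problems advanced in lockstep by one uniform kernel, can be illustrated on a CPU in miniature. This sketch is a toy explicit Euler step for a batch of independent decay equations, not the actual stiff thermonuclear network solver; it only shows the data layout and loop structure such a batched kernel maps onto.

```python
def batched_explicit_euler(rates, y0, dt, steps):
    """Advance a batch of independent decay equations dy/dt = -k*y in
    lockstep: one uniform update applied across the whole batch per time
    step (the batched-computation pattern, here with a trivial RHS)."""
    ys = [float(y0)] * len(rates)
    for _ in range(steps):
        # one "kernel launch": the same tiny update for every problem
        ys = [y - dt * k * y for y, k in zip(ys, rates)]
    return ys
```

On a GPU each batch element would map to a thread or thread block; the point of batching is that launch and scheduling overhead is paid once for the whole batch rather than per problem.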
August 27, 2015 - Dr. Bruno Turcksin: Parallelization and Adaptive Mesh Refinement
ABSTRACT: Parallelization and adaptive mesh refinement (AMR) are two techniques that can be exploited to speed up computation and to solve problems that would otherwise be inaccessible due to large memory requirements. In the case of parallelization, the speedup is obtained by partitioning the work among more processors, while larger problems can be solved by having access to more memory. In the case of adaptive mesh refinement, the mesh, and occasionally the polynomial order of the finite elements, is adapted to the problem to reduce the number of unknowns needed to achieve a given accuracy. This results in a smaller system to solve and a reduction in the memory required to solve a given problem. Here, the complementarities and the difficulties of applying these two techniques simultaneously will be illustrated through examples from neutron transport using AMR with MPI and hp-FEM for the Stokes problem with multithreading.
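The AMR idea, spending unknowns only where an error indicator demands them, can be shown with a minimal 1-D sketch. The `refine` helper below is hypothetical, not deal.II's machinery; it performs a single mark-and-bisect sweep.

```python
def refine(cells, indicator, tol):
    """One adaptive refinement sweep on a 1-D mesh: bisect every cell
    whose error indicator exceeds tol, leave the rest untouched.
    `cells` is a list of (left, right) intervals."""
    out = []
    for a, b in cells:
        if indicator(a, b) > tol:
            m = 0.5 * (a + b)
            out.extend([(a, m), (m, b)])   # split the flagged cell
        else:
            out.append((a, b))
    return out
```

Repeating the sweep until no cell is flagged concentrates resolution where the indicator (typically an estimate of the local solution error) is large, which is exactly how the unknown count is reduced for a given accuracy.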
BIOGRAPHICAL INFORMATION: Dr. Bruno Turcksin earned a Ph.D. in Nuclear Engineering from Texas A&M University in 2012. He is now a visiting assistant professor in the Department of Mathematics at Texas A&M, working on the deal.II finite element library. His primary areas of expertise are numerical methods for neutron and electron transport, adaptive mesh refinement, and high performance computing.
August 12, 2015 - Alex McCaskey: Code Integration Between the BISON Fuel Performance and PROTEUS Neutronics Applications
ABSTRACT: This talk will present new code coupling strategies for the integration of components from Idaho National Laboratory's Multiphysics Object-Oriented Simulation Environment (MOOSE) and Argonne National Laboratory's SHARP Nuclear Reactor Framework. These frameworks take completely different approaches to the modeling and simulation of advanced nuclear reactor technologies, with MOOSE providing tools for the top-down development of coupled physics codes, and SHARP enabling the integration of existing legacy physics codes from the bottom up. These differing philosophies have so far prevented the efficient integration of existing pieces from the two frameworks. The work presented here will detail a new way to enable this integration by building upon the existing features of both frameworks, as well as the introduction of the extensible DataTransferKit for two-way solution transfer. This new methodology for code coupling efficiently enables the integration of codes with different languages, mesh representations, and solve types. This talk will demonstrate this integration avenue for the specific case of coupling between the BISON (MOOSE) fuel performance and PROTEUS (SHARP) neutronics applications, in an effort to improve solution accuracy for fuel performance calculations.
August 11, 2015 - Frank Mueller: On the Implications of Large-Scale Manycores and NoCs for Exascale
ABSTRACT: Future compute nodes in HPC will have hundreds if not thousands of cores. To accommodate the data demand of each core, network-on-chip (NoC) interconnect architectures are changing from rings to meshes. This work creates a novel communication abstraction for a mesh NoC and assesses the viability of MPI, OpenMP and hybrid execution models on a single die with 64 cores and a 2D mesh. Results indicate the importance of reduced flow control and the absence of contention on the NoC. They further illustrate how to better utilize memory parallelism in a transparent manner for HPC and beyond.
BIOGRAPHY: Frank Mueller is a Professor in Computer Science and a member of multiple research centers at North Carolina State University. Previously, he held positions at Lawrence Livermore National Laboratory and Humboldt University Berlin, Germany. He received his Ph.D. from Florida State University in 1994. He has published papers in the areas of parallel and distributed systems, embedded and realtime systems and compilers. He is a member of ACM SIGPLAN, ACM SIGBED and a senior member of the ACM and IEEE Computer Societies as well as an ACM Distinguished Scientist. He is a recipient of an NSF Career Award, an IBM Faculty Award, a Google Research Award and two Fellowships from the Humboldt Foundation.
July 20, 2015 - John Feo: Tables, Graphs, and Problems
ABSTRACT: Data collection and analysis are rapidly changing the way scientific, national security, and business communities operate. Data analytics has emerged as a fourth paradigm of science, with American economic competitiveness and national security depending increasingly on the insightful analysis of large data sets. While extreme-scale analytics shares many of the same computing issues as extreme-scale scientific simulations, the nature of the problems and data creates important differences. The volume, velocity, variety, and veracity of analytic data set it apart from scientific data. Moreover, the data does not partition neatly along physical boundaries, and algorithms do not map efficiently to bulk synchronous processes with nearest-neighbor communication. This is true both for traditional table-driven machine learning applications and for emerging graph methods. While natural partitions can be found, irregular inter-partition connections and extreme load imbalance limit scalability to a small number of nodes for runtime systems that assign groups of data to single locales. Without scaling to large numbers of nodes, in-memory solutions based on such runtime systems are no more attractive than file-based solutions.
While at PNNL, I architected GEMS, a multithreaded semantic graph engine. The framework had three components: 1) a SPARQL front end to transform SPARQL to data-parallel C code; 2) a semantic graph engine with scalable multithreaded algorithms for query processing; and 3) a custom multithreaded runtime layer for scalable performance on conventional cluster systems. Our objectives were twofold: 1) to scale system size as data sizes increase, and 2) to maintain query throughput as system size grows.
In this talk, I will summarize the data challenges facing scientists, intelligence analysts, and business leaders. I will discuss table and graph analytic methods and the problems introduced by the unbalanced distribution of real-world data. I will describe GEMS in detail, focusing on the graph engine and runtime layer, and present some performance results.
BIO: Dr. Feo received his Ph.D. in Computer Science from The University of Texas at Austin. He began his career at Lawrence Livermore National Laboratory, where he managed the Computer Science Group and was the principal investigator of the Sisal Language Project. Dr. Feo then joined Tera Computer Company (now Cray Inc.), where he was a principal engineer and product manager for the first two generations of Cray's multithreaded architecture. After a short two-year "sabbatical" at Microsoft, where he led a software group developing a next-generation virtual reality platform, he joined PNNL as the Director of the Center for Adaptive Supercomputer Software and Principal Investigator of a large DOD project in graph analytics. Most recently, Dr. Feo was VP of Engineering at Context Relevant.
Dr. Feo's research interests are parallel programming, graph algorithms, multithreaded architectures, functional languages, and performance studies. He has published extensively in these fields. He has held academic positions at UC Davis and is an adjunct faculty member at Washington State University.
July 20, 2015 - Fuli Yu: The excitement and challenge of genomic data analysis in the era of precision medicine
ABSTRACT: The emergence of multiple high-throughput, data-rich technologies capable of characterizing genotypes and phenotypes, ranging from the population to the cellular level, has produced a paradigm shift in biomedical research. The bottleneck in scientific productivity has shifted from data production to integrative data analysis and interpretation. Integrating large-scale, high-dimensional molecular, physiological, and phenotypical data sets (including transcriptome, epigenome, microbiome, metabolome, proteome, imaging data and medical records) that are collected in a longitudinal manner across multiple studies holds great promise for identifying causal pathways from health to disease. These studies can reveal fundamental mechanistic insights as well as provide personalized approaches for disease prevention and treatment. The overarching challenge now facing biomedical researchers is how to utilize computational approaches to integrate these large-scale, high-dimensional data sets and consequently build new knowledge and hypotheses.
We have developed an ensemble pipeline, goSNAP, that integrates multiple variant callers and heterogeneous computational infrastructures (cluster, cloud and supercomputer facilities) to optimize performance both computationally and scientifically. By exploiting a hybrid paradigm that combines heterogeneous computational infrastructures, we effectively balanced scalability and the cost model. A local cluster was used for routine background steps such as alignment and recalibration, with data being aggregated over a long period of time. We used both the cloud (DNAnexus) and a supercomputer (Oak Ridge National Laboratory) to substantially ease the CPU- and I/O-intensive steps when highly parallelized processing is needed to achieve a reasonable timeframe. Our deployment in CHARGE has shown that we can reduce the time from more than 6 months to just a few weeks.
July 1, 2015 - Greg Watson: Software Engineering for Science: Beyond the Eclipse Parallel Tools Platform
ABSTRACT: The Eclipse Parallel Tools Platform (PTP) project was started over 10 years ago with the goal of bringing best practices in software engineering to scientific computing. The results of the project have been mixed; we have seen adoption of Eclipse in many labs and academic institutions, and the PTP development environment has been downloaded over 1M times since records started being kept in 2012. However, we are still not seeing general use across the scientific computing community, and many negative perceptions of Eclipse still persist. Although a number of groups have their own Eclipse-based tools, we also haven't seen the high level of integration that was one of the original objectives of the project. And although software engineering practices have improved to some degree, there is still much room for improvement, particularly as the next generation of highly complex computing systems becomes available. This talk will discuss some key observations on the uptake of advanced development environments by the scientific computing community and consider the factors that have influenced the adoption of PTP in particular. The presentation will then examine some areas that we believe would be beneficial for improving software engineering practices, as well as some exciting possibilities for future research.
June 30, 2015 - Torsten Hoefler: How fast will your application run at <next>scale? Static and dynamic techniques for application performance modeling
ABSTRACT: Many parallel applications suffer from latent performance limitations that may prevent them from utilizing resources efficiently when scaling to larger parallelism. Often, such scalability bugs manifest themselves only when an attempt to scale the code is actually being made, a point where remediation can be difficult. However, creating analytical performance models that would allow such issues to be pinpointed earlier is so laborious that application developers attempt it at most for a few selected kernels, running the risk of missing harmful bottlenecks. We discuss dynamic techniques to generate performance models for program scalability that identify scaling bugs early and automatically. This automation enables a new set of parallel software development techniques. We demonstrate the practicality of this method with various real-world applications, but also point out limitations of the dynamic approach. We then discuss a static analysis that establishes close provable bounds on the number of loop iterations and the scalability of parallel programs. While this analysis captures more loops than existing techniques based on the polyhedral model, no analysis can count all loops statically. We conclude by briefly discussing how to combine these two approaches into an integrated framework for scalability and performance analysis.
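The dynamic technique described above boils down to fitting measured runtimes against a small set of candidate scaling terms and keeping the best fit. The sketch below uses an assumed candidate set and a plain two-parameter least-squares fit; it is an illustration of the idea, not the actual tool's model normal form or search.

```python
import math

# Hypothetical candidate scaling terms t(p) ~ c0 + c1 * f(p)
CANDIDATES = {
    "log p": lambda p: math.log2(p),
    "p": lambda p: float(p),
    "p log p": lambda p: p * math.log2(p),
    "p^2": lambda p: float(p * p),
}

def fit(ps, ts):
    """Fit runtimes ts measured at process counts ps against each candidate
    term by simple linear least squares; return (term, c0, c1) with the
    smallest squared error."""
    best = None
    for name, f in CANDIDATES.items():
        xs = [f(p) for p in ps]
        n = len(xs)
        mx, mt = sum(xs) / n, sum(ts) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxt = sum((x - mx) * (t - mt) for x, t in zip(xs, ts))
        c1 = sxt / sxx
        c0 = mt - c1 * mx
        sse = sum((c0 + c1 * x - t) ** 2 for x, t in zip(xs, ts))
        if best is None or sse < best[0]:
            best = (sse, name, c0, c1)
    return best[1:]
```

Given a handful of small-scale runs, the recovered term is an extrapolation hint: a fitted "p^2" communication cost, for example, flags a scaling bug long before a full-scale run exposes it.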
BIOGRAPHY: Torsten is an Assistant Professor of Computer Science at ETH Zürich, Switzerland. Before joining ETH, he led the performance modeling and simulation efforts of parallel petascale applications for the NSF-funded Blue Waters project at NCSA/UIUC. He is also a key member of the Message Passing Interface (MPI) Forum, where he chairs the "Collective Operations and Topologies" working group. Torsten won best paper awards at the ACM/IEEE Supercomputing Conference SC10, SC13, SC14, EuroMPI 2013, IPDPS 2015, and other conferences. He has published numerous peer-reviewed scientific conference and journal articles and authored chapters of the MPI-2.2 and MPI-3.0 standards. His research interests revolve around the central topic of "Performance-centric Software Development" and include scalable networks, parallel programming techniques, and performance modeling. Additional information about Torsten can be found on his homepage at htor.inf.ethz.ch.
June 26, 2015 - Edwin Garcia: Progress Towards a Microstructurally Resolved Porous Electrode Theory for Rechargeable Batteries
ABSTRACT: In high energy density, low porosity, lithium-ion battery electrodes, the underlying microstructural characteristics control the macroscopic charge capacity, average lithium-ion transport, and macroscopic resistivity of the cell, particularly at high electronic current densities and power densities. In this presentation, we report on progress towards the development of a combined numerical+analytical framework to describe the effect of particle morphologies and their processing-induced spatial distribution on the macroscopic and position-dependent performance. Here, by spatially resolving the electrochemical fields, the effect of particle size polydispersity on the galvanostatic behavior is analyzed. We detail such effects in structures of controlled electrode compaction and polydispersity on the macroscopic effective transport properties and discuss their impact on the macroscopic galvanostatic response for existing and emerging energy storage devices. The framework presented herein makes it possible to establish relations that combine the tortuosity and reactivity constitutive properties of the individual components. Macroscopic tortuosity-porosity relations for mixtures of porous particle systems of widely different length scales and well-known individual tortuosity constitutive equations are combined into self-consistent macroscopic expressions, in agreement with recently reported empirical measures.
June 22, 2015 - Mohamed Wahib: Scalable and Automated GPU Kernel Transformations in Production Stencil Applications
ABSTRACT: We present a scalable method for exposing and exploiting hidden localities in production GPU stencil applications. Exploiting inter-kernel localities essentially amounts to the following: find the best permutation of kernel fusions that minimizes redundant memory accesses. To achieve this, we first expose the hidden localities by analyzing inter-kernel data dependencies and order of execution. Next, we use a scalable search heuristic that relies on a lightweight performance model to identify the best candidate kernel fusions. Experiments with two real-world applications prove the effectiveness of manual kernel fusion. To make kernel fusion a practical choice, we further introduce an end-to-end method for automated transformation. A CUDA-to-CUDA transformation collectively replaces the user-written kernels with auto-generated kernels optimized for data reuse. Moreover, the automated method allows us to improve the search process by enabling kernel fission and thread block tuning. We demonstrate the practicality and effectiveness of the proposed end-to-end automated method. With minimal intervention from the user, we improved the performance of six applications with speedups ranging from 1.12x to 1.76x.
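The core search step, choosing kernel fusions that eliminate redundant memory accesses, can be caricatured in a few lines. The toy greedy pass below is hypothetical: it fuses adjacent kernels that share arrays and deliberately ignores the data-dependence legality analysis and performance model the real method uses; it only conveys the cost intuition that shared arrays loaded twice are loads saved by fusion.

```python
def fusion_benefit(a, b):
    """Arrays touched by both kernels: global loads saved if fused (toy model)."""
    return len(a & b)

def greedy_fuse(kernels):
    """kernels: list of sets of array names, in execution order.
    Repeatedly fuse the first adjacent pair with positive benefit."""
    groups = [set(k) for k in kernels]
    changed = True
    while changed:
        changed = False
        for i in range(len(groups) - 1):
            if fusion_benefit(groups[i], groups[i + 1]) > 0:
                groups[i] |= groups.pop(i + 1)   # fuse the pair in place
                changed = True
                break
    return groups
```

A real transformation must additionally verify that fusion preserves the order of dependent reads and writes, and weigh register and shared-memory pressure of the fused kernel, which is why a search heuristic with a performance model is needed rather than this unconditional greedy pass.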
BIOGRAPHY: Mohamed Wahib is currently a postdoctoral researcher in the "HPC Programming Framework Research Team" at RIKEN Advanced Institute for Computational Science (RIKEN AICS). He joined RIKEN AICS in 2012 after several years at Hokkaido University, Japan, where he received a Ph.D. in Computer Science that same year. Prior to his graduate studies, he worked as a researcher at Texas Instruments (TI) R&D for four years.
June 12, 2015 - Saurabh Hukerikar: Introspective Resilience for Exascale High Performance Computing Systems
ABSTRACT: Future exascale High Performance Computing (HPC) systems will be constructed from VLSI devices that will be less reliable than those used today, and faults will become the norm, not the exception. Furthermore, the Mean Time to Failure (MTTF) of a system scales inversely with the number of components, and therefore faults and the resulting system-level failures will increase as systems scale in terms of the number of processor cores and memory modules used. This will pose significant problems for system designers and programmers, who for half a century have enjoyed an execution model that assumed correct behavior by the underlying computing system. However, not every detected error needs to result in catastrophic failure. Many HPC applications are inherently fault resilient but lack convenient mechanisms to express their resilience features to execution environments that are designed to be fault oblivious.
Dr. Hukerikar will present research conducted as part of his PhD dissertation, which proposes an execution model based on the notion of introspection. A set of resilience-oriented language extensions was developed, which facilitate the incorporation of fault resilience as an intrinsic property of scientific application codes. These are supported by a compiler infrastructure and a runtime system that reasons about the context and significance of faults to the outcome of the application execution. The compiler infrastructure was extended to demonstrate an application-level methodology for fault detection and correction that is based on redundant multithreading (RMT). An introspective runtime framework was also developed that continuously observes and reflects upon platform-level fault indicators to assess the vulnerability of the system's resources. The introspective runtime system provides a unified execution environment that reasons about the implications of resource management actions for the resilience and performance of the application processes. Results, which cover several high performance computing applications and different fault types and distributions, demonstrate that a resilience-aware execution environment is important for solving the most demanding computational challenges on future extreme-scale HPC systems.
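The RMT mechanism mentioned above, executing work redundantly and comparing results to detect silent faults, reduces to a small pattern. This is a generic duplicated-execution checker, not Dr. Hukerikar's compiler-generated implementation, which inserts the redundancy and comparison automatically at compile time.

```python
import threading

def rmt_call(fn, *args):
    """Redundant multithreading in miniature: run fn twice on separate
    threads and compare the results; a mismatch signals a soft error."""
    results = [None, None]

    def worker(slot):
        results[slot] = fn(*args)

    threads = [threading.Thread(target=worker, args=(s,)) for s in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    if results[0] != results[1]:
        raise RuntimeError("soft error detected: redundant results disagree")
    return results[0]
```

The cost of this scheme is roughly doubled work, which is why a compiler-assisted approach applies it selectively, only to the computations whose corruption would matter to the application's outcome.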
Saurabh Hukerikar is a candidate for a postdoctoral position with the Computer Science Research Group. He recently completed his PhD in the Ming Hsieh Department of Electrical Engineering at the University of Southern California. He works with the Computational Systems and Technology Division at USC's Information Sciences Institute. His graduate work seeks to address the challenge of resilience for extreme-scale high-performance computing (HPC) systems. He received an MS in Electrical Engineering in 2010 and an MS in Computer Science (with emphasis on High Performance Computing and Simulations) in 2012, both from the University of Southern California.
June 5, 2015 - Vivek Sarkar: Runtime System Challenges for Extreme Scale Systems
ABSTRACT: It is widely recognized that radical changes are to be expected in future HPC systems to address the challenges of extreme-scale computing. Specifically, they will be built using homogeneous and heterogeneous manycore processors with hundreds to thousands of cores per chip, their performance will be driven by parallelism (billion-way parallelism for an exascale system) and constrained by energy and data movement. They will also be subject to frequent faults and failures. Unlike previous generations of hardware evolution, these Extreme Scale HPC systems will have a profound impact on future applications and their underlying software stack. The software challenges are further compounded by the addition of new application requirements that include, most notably, data-intensive computing and analytics.
The challenges across the entire software stack for Extreme Scale systems are driven by programmability, portability and performance requirements, and impose new demands on programming models, languages, compilers, runtime systems, and system software. This talk focuses on the critical role played by runtime systems in enabling programmability in the upper layers of the software stack that interface with the programmer, and in enabling performance in the lower levels of the software stack that interface with the operating system and hardware.
Examples of key runtime primitives being developed to address these challenges will be drawn from experiences in the Habanero Extreme Scale Software Research project, which targets a wide range of homogeneous and heterogeneous manycore processors, as well as from the Open Community Runtime (OCR) system being developed in the DOE X-Stack program. Background material for this talk will also be drawn from the DARPA Exascale Software Study report and from the DOE ASCAC study on Synergistic Challenges in Data-Intensive Science and Exascale Computing. We would like to acknowledge the contributions of all participants in the Habanero project, the OCR project, and the DARPA and DOE studies.
BIOGRAPHY: Vivek Sarkar is Professor and Chair of Computer Science at Rice University. He conducts research in multiple aspects of parallel software including programming languages, program analysis, compiler optimizations and runtimes for parallel and high performance computer systems. He currently leads the Habanero Extreme Scale Software Research Laboratory at Rice University, and serves as Associate Director of the NSF Expeditions Center for Domain-Specific Computing. Prior to joining Rice in July 2007, Vivek was Senior Manager of Programming Technologies at IBM Research. His responsibilities at IBM included leading IBM's research efforts in programming model, tools, and productivity in the PERCS project during 2002-2007 as part of the DARPA High Productivity Computing System program. His prior research projects include the X10 programming language, the Jikes Research Virtual Machine for the Java language, the ASTI optimizer used in IBM's XL Fortran product compilers, the PTRAN automatic parallelization system, and profile-directed partitioning and scheduling of Sisal programs. In 1997, he was on sabbatical as a visiting associate professor at MIT, where he was a founding member of the MIT Raw multicore project. Vivek became a member of the IBM Academy of Technology in 1995, the E.D. Butcher Chair in Engineering at Rice University in 2007, and was inducted as an ACM Fellow in 2008. He holds a B.Tech. degree from the Indian Institute of Technology, Kanpur, an M.S. degree from the University of Wisconsin-Madison, and a Ph.D. from Stanford University. Vivek has been serving as a member of the US Department of Energy's Advanced Scientific Computing Advisory Committee (ASCAC) since 2009.
May 27, 2015 - Jeffrey K. Hollingsworth: Active Harmony: Making Autotuning Easy
ABSTRACT: Active Harmony is an autotuning framework for parallel programs. In this talk, I will describe how the system makes it easy (sometimes even automatic) to create programs that can be autotuned. I will present examples from a few applications and programming languages. I will also discuss recent work we have been doing to provide support for autotuning programs with multiple (potentially conflicting) objectives such as performance and power.
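Autotuning in this style amounts to searching a parameter space with measured cost as the objective. The toy below is a greedy neighborhood search over one discrete parameter; Active Harmony itself uses far more capable search strategies and an online client/server protocol, so treat this only as the shape of the tuning loop.

```python
def autotune(measure, space):
    """Greedy neighborhood descent over a sorted discrete parameter space.
    `measure(value)` runs the program with that parameter and returns a
    cost (e.g. wall-clock time); lower is better."""
    i = 0
    cost = measure(space[i])
    while True:
        moved = False
        for j in (i - 1, i + 1):               # try both neighbors
            if 0 <= j < len(space):
                c = measure(space[j])
                if c < cost:                   # keep any improvement
                    i, cost, moved = j, c, True
                    break
        if not moved:                          # local minimum: stop
            return space[i], cost
```

A multi-objective tune, such as the performance-and-power work mentioned in the abstract, could plug a weighted combination of time and energy into `measure`; note that greedy descent can stall in a local minimum on non-unimodal cost surfaces, which is one reason production tuners use richer search algorithms.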
BIOGRAPHY: Jeffrey K. Hollingsworth is a Professor in the Computer Science Department at the University of Maryland, College Park. He also has appointments in the University of Maryland Institute for Advanced Computer Studies and the Electrical and Computer Engineering Department. He received his PhD and MS degrees in computer sciences from the University of Wisconsin. His research is in the areas of performance measurement, autotuning, and binary instrumentation. He is Editor-in-Chief of the journal Parallel Computing, was general chair of the SC12 conference, and is Vice Chair of ACM SIGHPC.
May 19, 2015 - Mikolai Fajer: Effects of the SH2/SH3 Regulatory Domains on the Activation Transition of c-Src Kinases
ABSTRACT: The c-Src kinase is an important component in cellular signalling, and its activity is closely regulated by the SH2/SH3 domains. Using the swarms-of-trajectories string method, the transition from inactive to active conformations of the kinase domain is studied in the presence of the SH2/SH3 domains. In the assembled, down-regulated SH2/SH3 conformation, the activation transition closely resembles that of the kinase-only domain. The reassembled, up-regulated SH2/SH3 conformation pre-orients several side chains for their active-state interactions, thus promoting the active state of the kinase.
BIOGRAPHY: Mikolai Fajer received his bachelor's degree in Physics and Chemistry from the University of Florida. He then earned his PhD under Andy McCammon at the University of California, San Diego, working on enhanced sampling methods. Most recently he has been working as a postdoc with Benoit Roux at the University of Chicago, studying conformational transitions in biomolecular systems.
May 14, 2015 - Brent Gorda: Lustre Keeping Pace with Compute and Intel's Continued Commitment
ABSTRACT: Brent will discuss the topic of "Lustre Keeping Pace with Compute and Intel's Continued Commitment." What is Intel's role in making sure data can safely move in and out of High Performance compute at extreme scale and at the speed of your network interface? Why do both Scientific Simulation environments and increasingly Big Data Applications need advanced parallel file systems such as Intel's hardened Lustre? How are partners now driving Lustre Innovation, alongside the Lustre Community? What improvements are coming in Lustre for Small File Performance, HSM, Fault Tolerance, Snapshot and Security? To get to Exascale Computing, what needs to change in I/O?
BIOGRAPHY: Brent Gorda is the General Manager of the High Performance Data Division at Intel. Brent co-founded and led Whamcloud, a startup focused on the Lustre technology that was subsequently acquired by Intel. A long-time member of the HPC community, Brent was at Lawrence Livermore National Laboratory and responsible for the BlueGene P/Q architectures as well as many of the large IB-based cluster architectures in use among the NNSA DOE laboratories. Brent is the founder of the Student Cluster Competition, a worldwide event that showcases the power of parallel/cluster computing in the hands of students.
April 30, 2015 - Dimitri Mavriplis: High Performance Computational Aerodynamics for Multidisciplinary Wind Energy and Aerospace Vehicle Analysis and Optimization
ABSTRACT: This talk will describe the development of a multi-solver, overlapping adaptive mesh CFD capability that scales well on current high-performance computing hardware, with applications in aerospace vehicle analysis and design and complete wind farm simulations. The multi-solver paradigm makes use of a near-body unstructured mesh solver coupled with an adaptive Cartesian higher-order accurate off-body solver implemented within the SAMRAI framework. An overview of the multi-solver software structure will be given, after which a description of the solution techniques used for the unstructured mesh multigrid solver component will be presented in more detail. Subsequently, the incorporation of a discrete adjoint capability will be described for multidisciplinary time-dependent aero-structural problems, and results demonstrating the optimization of time-dependent helicopter rotors will be shown. The talk will conclude with prospects for advanced discretizations and solvers as we move towards the exascale era.
BIOGRAPHICAL INFORMATION: Dimitri Mavriplis is currently the Max Castagne Professor in Mechanical Engineering at the University of Wyoming. He obtained his Bachelor's and Master's degrees in Mechanical Engineering from McGill University and his PhD in Mechanical and Aerospace Engineering from Princeton University. After graduation, he spent over 15 years at ICASE/NASA Langley, where he worked on the development of unstructured mesh discretizations and solvers. In 2003 he joined the University of Wyoming, where he leads a research group that focuses on HPC solver technology, adjoint methods for optimization and error control, and high-order discretizations with applications in multidisciplinary wind energy and aerospace vehicle analysis and design optimization.
April 16, 2015 - David Lecomber: Software Engineering for HPC - Experiences in Developing Software Tools for Rapidly Moving Targets
Code modernization is one of the hotter topics in HPC today - but modernization is about more than modern processors. I will consider how the modernization of software practices is making an impact in HPC - and some of the best practices we see out in the field amongst HPC developers. I will examine the challenges of software engineering to production and beyond from the perspective of engineering at Allinea: how we develop and test in a world of constant change, and the lessons learned along the way.
April 8, 2015 - Kirk W. Cameron: Why high-performance systems need a little bit of LUC
In 1936, Harvard University sociologist Robert K. Merton wrote a paper entitled "The Unanticipated Consequences of Purposive Social Action", in which he described how government policies often result in both positive and negative unintended consequences. The lesson from Merton's work was that unexpected consequences in complex social systems, at the time relegated to theology or chance, should be evaluated scientifically.
Independent groups typically design the components of HPC systems. Hard disks, processors, memories, and boards are eventually combined with BIOSs, file systems, operating systems, communication libraries, and applications. Today's components also adapt automatically to local conditions to improve efficiency. For example, processors and memories can vary their frequencies in response to demand. Disks can vary their rotation speeds. BIOSs and OSs can adapt their scheduling policies for different use cases.
Since the performance effects of local hardware and software management are largely unknown, these potentially valuable features are often disabled in high-performance environments. And unfortunately, while we assume that disabling these features will have positive consequences, Merton teaches us that relegating performance behavior to chance is just as likely to result in negative consequences. For example, there is mounting evidence that when processors are fixed at the highest frequency (i.e., disabling dynamic frequency scaling), performance can worsen.
In this presentation, I will revisit the conventional wisdom that "faster is always better" for processor speeds in high-performance environments. In essence, through exhaustive experimentation, we can demonstrate quantitatively that slowing down CPU frequency can speed up performance by as much as 50% for some I/O-intensive applications. For the first time, we have identified the root cause of slowdowns at higher frequencies. I will describe how the LUC runtime system Limits the Unintended Consequences of processor speed in high-performance I/O applications. Our work also motivates the need to reject chance as an explanation of performance and revisit first principles so we can design systems that truly offer the highest performance.
BIO:
Kirk W. Cameron is Professor and Associate Department Head of Computer Science in the College of Engineering at Virginia Tech. The central theme of his research is to improve power and performance efficiency in high performance computing (HPC) systems and applications. More than half a million people in more than 160 countries have used his power management software. In addition to his research, his NSF-funded, 256-node SeeMore kinetic sculpture of Raspberry Pis was featured at SIGGRAPH 2014 in Vancouver, B.C. and is scheduled for multiple exhibitions in Washington D.C. and New York in 2015.
March 31, 2015 - Keita Teranishi: Local Failure, Local Recovery for large-scale SPMD applications
As leadership-class computing systems increase in complexity and component feature sizes continue to decrease, the ability of an application code to treat the system as a reliable digital machine diminishes. In fact, there is a growing concern in the high performance computing community that applications will have to explicitly manage resilience issues beyond the current practice of checkpoint/restart (C/R). In particular, the current system reaction to the loss of a single MPI process is to terminate all remaining processes and restart the application from the most recent checkpoint. This is suboptimal at scale because the recovery cost is not proportional to the size of the failure. We address this scaling issue using an emerging resilient computing model called Local Failure, Local Recovery (LFLR) that attempts to provide application developers with the ability to recover locally and continue application execution when a process is lost. In this talk, I will present our two ongoing efforts to enable scalable online application recovery: general-purpose recovery heavily leveraging MPI-ULFM (a fault-tolerant MPI prototype), and recovery of stencil-based codes using Cray's uGNI.
BIOGRAPHICAL INFORMATION: Keita Teranishi is a principal staff member of Scalable Modeling and Analysis Systems at Sandia National Laboratories in California. Before joining Sandia, he was involved in several dense and sparse matrix library development projects at Cray Inc. His broad research interests in HPC include application resilience, programming models, automatic performance tuning, and numerical linear algebra. He holds an MS degree from the University of Tennessee, Knoxville, and a Ph.D. degree from Pennsylvania State University.
March 30, 2015 - Sarah Osborn: Solution Strategies for Stochastic Galerkin Discretizations of PDEs with Random Data
When using partial differential equations (PDEs) to model physical problems, the exact values of coefficients are often unknown. To obtain more realistic models, the coefficients are typically treated as random variables in an attempt to quantify uncertainty in the underlying problem. Stochastic Galerkin methods are used to obtain numerical solutions for these types of problems. These methods couple the stochastic and deterministic degrees of freedom and yield a large system of equations that must be solved. A challenge in this method is solving the large system accurately and efficiently. Typically the system is solved iteratively, and preconditioning strategies dictate the performance of the iterative method. The goal of this work is to improve solver efficiency by investigating preconditioning techniques and solver implementation details. The model problem considered is the diffusion problem with uncertainty in the diffusion coefficient. An algebraic multigrid preconditioner based on smoothed aggregation is presented, with emphasis on the formulation of the model problem where the uncertain component has a nonlinear structure. Special consideration is given to the solution and proposed preconditioning strategy for improving performance on emerging architectures. Numerical results will be presented that illustrate the performance of the proposed preconditioner and implementation changes.
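To make the setting concrete, the following is a minimal, illustrative sketch (not the speaker's code, and using a simple mean-based preconditioner rather than the AMG approach described above): a 1D diffusion problem whose coefficient depends linearly on one uniform random variable, discretized with a stochastic Galerkin expansion in normalized Legendre polynomials and solved by preconditioned CG. All parameter values are hypothetical.

```python
import numpy as np
from scipy.sparse import diags, identity, kron
from scipy.sparse.linalg import cg, splu, LinearOperator

# 1D model problem on (0,1): a(x, xi) = a0 + xi*a1(x), xi ~ U(-1, 1),
# expanded in normalized Legendre polynomials psi_0..psi_p.
n, p = 64, 4                       # spatial unknowns, stochastic degree
h = 1.0 / (n + 1)

def stiffness(a_mid):
    # 3-point finite-difference stencil with coefficient at cell midpoints
    main = (a_mid[:-1] + a_mid[1:]) / h**2
    off = -a_mid[1:-1] / h**2
    return diags([off, main, off], [-1, 0, 1]).tocsc()

xm = np.linspace(h / 2, 1 - h / 2, n + 1)      # cell midpoints
K0 = stiffness(np.ones(n + 1))                 # mean coefficient a0 = 1
K1 = stiffness(0.3 * np.sin(np.pi * xm))       # fluctuation a1, |a1| < a0

# <xi psi_i psi_j> is tridiagonal by the Legendre three-term recurrence
j = np.arange(p)
beta = (j + 1) / np.sqrt((2 * j + 1) * (2 * j + 3))
G0 = identity(p + 1, format="csc")
G1 = diags([beta, beta], [-1, 1]).tocsc()

# the coupled stochastic Galerkin system: stochastic x deterministic DOFs
A = (kron(G0, K0) + kron(G1, K1)).tocsr()
b = np.kron(np.eye(p + 1)[0], np.ones(n))      # deterministic load

# mean-based preconditioner: block-diagonal solves with the mean matrix K0
K0_lu = splu(K0)
def apply_M(v):
    return np.vstack([K0_lu.solve(r) for r in v.reshape(p + 1, n)]).ravel()

u, info = cg(A, b, M=LinearOperator(A.shape, matvec=apply_M))
```

The Kronecker structure `sum_k G_k (x) K_k` is what makes these systems large but exploitable: the preconditioner only ever factors a matrix of the deterministic problem's size.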
March 30, 2015 - Emil Alexov: Revealing the molecular mechanism of Snyder-Robinson Syndrome and rescuing it with small-molecule binding
The Snyder-Robinson Syndrome (SRS) (OMIM 300105) is a rare mental retardation disorder caused by missense mutations in the spermine synthase gene (SpmSyn). The SpmSyn encodes a protein, the spermine synthase (SMS) of 529 amino acids, which becomes dysfunctional in SRS patients due to specific missense mutations. Here we investigate, in silico and in vitro, the molecular effect of the amino acid substitutions causing SRS and demonstrate that the mutations almost never directly affect the functional properties of SMS, but rather indirectly alter its wild-type characteristics. A particular feature of SMS, which is shown to affect SMS functionality, is the formation of the SMS homodimer. If the homodimer does not form, the activity of SMS is practically abolished. In this regard, we identify several disease-causing mutations that affect homodimerization of SMS and carry out in silico screening to identify small molecules whose binding to the destabilized homodimer can restore wild-type homodimer affinity. The investigation resulted in an extensive list of plausible stabilizers, among which we selected and tested 51 compounds experimentally for their capability to increase SMS mutant enzymatic activity. In silico analysis of the experimentally identified stabilizers suggested five distinctive chemical scaffolds. The identified chemical scaffolds are drug-like and can serve as original starting points for the development of lead molecules to rescue the disease-causing effects of the Snyder-Robinson syndrome, for which no effective treatment currently exists. Lab page URL: http://compbio.clemson.edu/
BIOGRAPHICAL INFORMATION: Dr. Emil Alexov is a Professor in the Department of Physics and Astronomy at Clemson University. He received his Ph.D. in Radiophysics and Electronics and his M.S. in Plasma Physics from Sofia University. He is currently a member of the American Physical Society, the Biophysical Society, and the Protein Society. Dr. Alexov has been active in the National Institutes of Health and the National Science Foundation, among many other professional scientific activities.
March 9, 2015 - Mark Kim: GPU-enabled Particle Systems for Visualization
Particle systems have a rich history in scientific visualization because of their practicality and versatility. Although particles are a useful tool for visualization, one difficulty is particle advection on an arbitrary surface. One solution is to parameterize the surface, but a parameterization can be difficult to construct and utilize. Another method is to use a distance field and reproject particles onto the surface, which requires an iterative search. Unfortunately, this iterative search is not optimal on the GPU.
In this talk, I will discuss our research on particle advection on surfaces on the GPU. As GPUs have become more powerful and accessible for general purposes, new techniques are required to fully utilize that performance. I will begin my talk with a discussion of some of the problems with particle systems on the GPU. In particular, I will discuss issues adapting multi-material mesh extraction to the GPU. To address these issues, a new surface representation was chosen: the closest point embedding. The closest point embedding is a simple grid-based representation for arbitrary surfaces. To demonstrate the effectiveness of the closest point embedding, I will present two visualization techniques sped up on the GPU with the closest point embedding. First, the closest point embedding is used to speed up particle advection for multi-material mesh extraction on the GPU. Second, unsteady flow visualization on arbitrary surfaces is simplified and sped up with the closest point embedding.
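As a hedged illustration of the advect-then-reproject idea behind closest point methods (not the speaker's implementation, which runs on the GPU), the sketch below advects particles on the unit sphere, where the closest point function is simply normalization, so no iterative distance-field search is needed:

```python
import numpy as np

# Advect particles in the 3D embedding space, then snap them back to the
# surface with its closest point function; for the unit sphere this is
# simply cp(x) = x / |x|.
def cp(points):
    return points / np.linalg.norm(points, axis=1, keepdims=True)

def velocity(points):
    # a rotational field about the z-axis, tangent to the sphere
    return np.stack([-points[:, 1], points[:, 0],
                     np.zeros(len(points))], axis=1)

rng = np.random.default_rng(0)
pts = cp(rng.normal(size=(1000, 3)))    # seed particles on the sphere

dt = 0.01
for _ in range(200):
    pts = cp(pts + dt * velocity(pts))  # forward Euler step, then reproject
```

Because the reprojection is a direct evaluation on a grid-sampled closest point function rather than an iterative search, each particle's update is branch-free and independent, which is what makes the representation GPU-friendly.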
March 6, 2015 - Sungahn Ko: Aided decision-making through visual analytics systems for big data
As technologies have advanced, various types of data are produced in science and industry, and extracting actionable information for making effective decisions becomes increasingly difficult for analysts and decision makers. The main reasons for this difficulty are twofold: 1) the overwhelming amount of data prevents users from understanding the data during exploration, and 2) the complexity of the multiple data characteristics (multivariate, spatial, temporal, and/or networked) requires an integrated data presentation for finding any pattern, trend, or anomaly for decision-making. To overcome the analysts' information overload and enable effective visual presentation for efficient analysis and decision making, an interactive visual exploration and analysis environment is needed, since traditional machine learning and big data analytics alone are insufficient. In this talk, I present visual analytics approaches for solving the big data problem, with examples including spatiotemporal network data analysis, business intelligence, and steering of simulation pipelines.
February 3, 2015 - Sergiy Kalnaus: Predictive modeling for electrochemical energy storage
Electrochemical energy storage devices have gained popularity and market penetration as a means of providing energy and power for consumer electronics, hybrid and fully electric vehicles (EVs), and grid storage. Lithium-ion secondary batteries represent the most promising and commercially viable segment, although lithium-sulfur and lithium-air systems, as well as intercalation systems based on other metals (sodium, aluminum), are being studied. Despite being adopted in many electrified powertrains (BMW ActiveE, Nissan Leaf, Ford C-Max Energi, etc.), Li-ion batteries still suffer from high manufacturing cost, low cycle life, and safety issues. Modeling and simulation are powerful tools for quantifying responses that otherwise cannot be assessed experimentally and for designing strategies for better management of such systems. This talk will discuss the modeling approaches and results of computational studies of the performance and safety of Li-ion batteries. The newly released Virtual Integrated Battery Environment (VIBE) is an integral part of the Open Architecture Software Framework designed within the CAEBAT (Computer Aided Engineering for Batteries) project. Coupled simulations and physics models within VIBE will be discussed.
January 29, 2015 - Deepak Majeti: Portable Programming Models for Heterogeneous Platforms
ABSTRACT:
Heterogeneous architectures have become mainstream today and are found in a range of systems from mobile devices to supercomputers. However, these architectures with their diverse architectural features pose several programmability challenges, including handling data coherence, managing computation and data communication, and mapping tasks and data distributions. Consequently, application programmers have to deal with new low-level programming languages that involve non-trivial learning and training. In my talk, I will present two programming models that tackle some of the aforementioned challenges. The first is the "Concord" programming model, which provides a widely used Intel Threading Building Blocks-like interface and targets integrated CPU+GPU architectures with semi-coherent caches. This model also supports a wide set of C++ language features. The second is "Heterogeneous Habanero-C (H2C)", an implementation of the Habanero execution model for modern heterogeneous architectures. The novel features of H2C include high-level language constructs that support automatic data layout, task mapping, and data distributions. I will conclude the talk with performance evaluations of Concord and H2C, and propose future extensions to these models.
BIO:
Deepak is a 5th-year graduate student at Rice University working with Prof. Vivek Sarkar. As part of his ongoing doctoral thesis, he is developing Heterogeneous Habanero-C (H2C). Deepak's areas of interest include programming models and compiler and runtime support for modern heterogeneous architectures. He was a major contributor to the Concord project as an intern at the Intel Programming Systems Lab. He also worked on porting the Chapel programming language onto the HSA + XTQ architecture as an intern at AMD Research. Apart from research, Deepak loves to play sports, including soccer, badminton, squash and, of course, cricket.
January 6, 2015 - David M. Weiss: Industrial-Strength Software Measurement
Abstract:
In an industrial environment where software development is a necessary part of product development, measuring the state of software development and the attributes of the software becomes a crucial issue. For a company to survive and to make progress against its competition, it must have answers to questions such as "What is my customers' perception of the quality of the software in my products?", "How long will it take me to complete a new product or a new release of an existing one?", "What are the major bottlenecks in software production?", and "How effective is a new technique or tool when introduced into the software development process?" The fate of the company, and of individuals within the company, may depend on accurate answers to these questions, so one must not only know how to obtain and analyze data to answer them, but also estimate how good one's answers are. In a large-scale industrial software development environment, software measurement must be meaningful, automatable, non-intrusive, and feasible. Sources of data are diffuse, non-uniform, and non-standard. The data themselves are difficult to collect and interpret, and hard to compare across projects and organizations. Nonetheless, other industries perform such measurements as a matter of course, and software development organizations should as well. In this talk I will discuss the challenges of deciding what questions to ask, how to answer them, and what the impact of answering them is. I will illustrate with examples drawn from real projects, and from an existing and ongoing project that details the state of software production in a large company, focusing on change data and how to use it to answer some of the questions posed above.
December 19, 2014 - Soumi Manna: Evaluating the Performance of the Community Atmosphere Model at High Resolutions
Abstract:
The Community Atmosphere Model (CAM5) is one of the multiple component models in the Community Earth System Model (CESM). Recently, efforts have focused on increasing the resolution of CAM5 to produce more accurate predictions. Additionally, new developments have enabled the use of mesh refinement in CAM5 through the High-Order Method Modeling Environment (HOMME) dynamical core. These meshes allow for regions with extremely high resolution and pose a challenge to the current parallel domain decomposition algorithm.
In this project, we focused on analyzing the performance of HOMME at high and variable resolutions. We investigated the quality of domain decompositions produced by space-filling curve algorithms for refined and unrefined meshes. Additionally, we evaluated performance metrics of realistic simulations on these meshes using the automatic trace analysis tool Scalasca. By correlating performance bottlenecks with geometric mesh information, we identified suboptimal properties of the domain decompositions and worked to address this behavior. Improving the quality of these decompositions will increase the scalability of simulations at these resolutions, enhancing their scientific impact.
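As a minimal sketch of the space-filling curve idea mentioned above (illustrative only, not the HOMME implementation, which uses curves tailored to the cubed sphere): cells are ordered by their Morton (Z-order) key and the curve is cut into equal contiguous pieces, which tends to yield compact, load-balanced subdomains.

```python
# Z-order (Morton) space-filling curve decomposition of an nx-by-ny cell
# grid into nparts pieces (a hypothetical stand-alone example).
def morton(ix, iy, bits=16):
    # interleave the bits of (ix, iy) into a single position on the curve
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)
        key |= ((iy >> b) & 1) << (2 * b + 1)
    return key

def decompose(nx, ny, nparts):
    # sort cells along the curve, then cut the curve into equal pieces;
    # each contiguous piece becomes one rank's subdomain
    cells = sorted(((i, j) for i in range(nx) for j in range(ny)),
                   key=lambda c: morton(*c))
    size, rem = divmod(len(cells), nparts)
    parts, start = [], 0
    for r in range(nparts):
        end = start + size + (1 if r < rem else 0)
        parts.append(cells[start:end])
        start = end
    return parts

parts = decompose(16, 16, 7)   # 256 cells over 7 ranks: sizes 37 or 36
```

The quality concern raised in the abstract shows up exactly here: on refined meshes, equal cell counts along the curve no longer mean equal work or compact boundaries, so the cuts must be reweighted.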
December 12, 2014 - Jay Jay Billings: Eclipse ICE: ORNL's Modeling and Simulation User Environment
Abstract:
In the past several years ORNL modeling and simulation projects have experienced an increased need for interactive, graphical user tools. The projects in question span advanced materials, batteries, nuclear fuels and reactors, nuclear fusion, quantum computing, and many others. They all require four tasks that are fundamental to modeling and simulation: creating input files, launching and monitoring jobs locally and remotely, visualizing and analyzing results, and managing data. This talk will present the Eclipse Integrated Computational Environment (ICE), a general-purpose open source platform that provides integrated tools and utilities for creating rich user environments. It will cover both the robust new infrastructure developed for modeling and simulation projects, such as new mesh editors and visualization tools, and the plugins for codes that are already supported by the platform and taking advantage of these features. The design philosophy of the project will also be presented, as well as how the "n-body code problem" is solved by the platform. In addition to covering the services provided by the platform, this talk will also discuss ICE's place in the larger Eclipse ecosystem and how it became an Eclipse project. Finally, we will show how you can leverage it to accelerate your code deployment, use it to simplify your modeling and simulation project, or get involved in the development.
Bio: Jay Jay Billings is a member of the research staff in the Computer Science Research group and leader of the ICE team.
December 12, 2014 - Andrew Ross: Large-scale Foundation-Nurtured Collaboration
Abstract:
Software and data are crucial to almost all organizations. Open Source Software and Open Data are a vital part of this. This presentation provides a glimpse of why an open approach to software and data results in far more than just free software and data, as measured in terms of freedoms and acquisition price. Collaboration across groups within large organizations and between organizations is hard. The Eclipse Foundation is the NFL of open collaborations: it provides governance structure, technology infrastructure, and many services to facilitate collaboration. This presentation will briefly examine this and how working groups hosted by the Eclipse Foundation are enabling collaboration for domains such as scientific R&D, the Internet of Things (IoT), location-aware technologies, and more. The results are important, and include:
- communication protocols like Paho for messaging between IoT devices
- large-scale distributed computing platforms such as GeoMesa and GeoTrellis
- data analysis and visualization found in ICE and DAWNSci
- advanced workflow and version control of data with tools such as GeoGig
From this presentation, audience members will get a brief taste of some of the collaboration opportunities, how to learn more, and how to get involved.
Bio: Andrew Ross is Director of Ecosystem Development at the Eclipse Foundation, a vendor-neutral not-for-profit. He is responsible for Eclipse's collaborative working groups, including the LocationTech and Science groups, which collaboratively develop software for location-aware systems and scientific research respectively. Prior to the Eclipse Foundation, Andrew was Director of Engineering at Ingres, where his team developed advanced spatial support features for the relational database and many applications. Before Ingres, Andrew developed highly available telecom solutions based on open source technologies for Nortel.
December 10, 2014 - Beth Plale: The Research Data Alliance: Progress and Promise in Global Data Sharing
Abstract:
The Research Data Alliance is coming up on 1.5 years old along the road to realizing its vision of "researchers and innovators openly sharing data across technologies, disciplines, and countries to address the grand challenges of society." RDA has grown tremendously in the last 1.5 years, from a handful of committed individuals to an organization with 1600 members in 70 countries. As one who was part of the small group that got RDA off the ground and remains deeply engaged, I will introduce the Research Data Alliance, take stock of its impressive accomplishments to date, and highlight what I see as the opportunities it faces in realizing the grand goal RDA states so succinctly in its vision.
November 14, 2014 - Taisuke Boku: Tightly Coupled Accelerators: A very low latency communication system on GPU clusters and parallel programming
Accelerating devices such as GPUs, MICs, and FPGAs are among the most powerful computing resources, providing high performance/energy and performance/space ratios for a wide range of large-scale computational science. On the other hand, the complexity of programming that combines various frameworks such as CUDA, OpenCL, OpenACC, OpenMP, and MPI is growing, and seriously degrades programmability and productivity.
We have been developing the XcalableMP (XMP) parallel programming language for distributed-memory architectures ranging from PC clusters to MPPs, and are enhancing its capability to include accelerating devices for heterogeneous parallel processing systems. XMP is a sort of PGAS language, and XMP-dev and XMP-ACC are its extensions for accelerating devices. We are also developing a new technology for inter-node GPU direct communication named the TCA (Tightly Coupled Accelerators) architecture, spanning from special hardware to the applications covered by this concept.
In this talk, I will introduce our ongoing project which vertically integrates all these components toward the new generation of parallel accelerated computing.
BIO:
Prof. Taisuke Boku received his Master's and PhD degrees from the Department of Electrical Engineering at Keio University. After his career as an assistant professor in the Department of Physics at Keio University, he joined the Center for Computational Sciences (formerly the Center for Computational Physics) at the University of Tsukuba, where he is currently the deputy director, the HPC division leader, and the system manager of supercomputing resources. He has worked there for more than 20 years on HPC system architecture, system software, and performance evaluation of various scientific applications. Over the years, he has played a central role in the development of CP-PACS (ranked number one in the TOP500 in 1996), FIRST (a hybrid cluster with a gravity accelerator), PACS-CS (a bandwidth-aware cluster), and HA-PACS (a high-density GPU cluster) as representative supercomputers in Japan. He also contributed to the system design of the K Computer as a member of the architecture design working group in RIKEN, and is currently a member of the operations advisory board of AICS, RIKEN. He received the ACM Gordon Bell Prize in 2011. His recent research interests include accelerated HPC systems and direct communication hardware/software for accelerators in HPC systems based on FPGA technology.
November 13, 2014 - Eric Lingerfelt: Accelerating Scientific Discovery with the Bellerophon Software System
Abstract:
We present an overview of a software system, Bellerophon, built to support a production-level HPC application called CHIMERA, which simulates the temporal evolution of core-collapse supernovae. Developed over the last 5 years at ORNL, Bellerophon enables CHIMERA's geographically dispersed team of collaborators to perform job monitoring and real-time data analysis from multiple supercomputing resources, including platforms at OLCF, NERSC, and NICS. Its n-tier architecture provides an encapsulated, end-to-end software solution that enables the CHIMERA team to quickly and easily access highly customizable animated and static views of results from anywhere in the world via a web-deliverable, cross-platform desktop application. Bellerophon has quickly evolved into the CHIMERA team's de facto work environment for analysis, artifact management, regression testing, and other workflow tasks. We will also present plans to expand utilization and encourage adoption by generalizing the system for new HPC applications and domains.
Bio:
Eric Lingerfelt is a technical staff member and software engineer in the ORNL Computer Science and Mathematics Division's Computer Science Research Group. Mr. Lingerfelt specializes in developing n-tier software systems with web-deliverable, highly interactive client-side applications that allow users to generate, access, visualize, manipulate, and share complex sets of data from anywhere in the world. For over a decade, he has designed, developed, and successfully delivered multiple software systems to the US Department of Energy and other customers in the fields of nuclear astrophysics, Big Bang cosmology, core-collapse supernovae, isotope sales and distribution, environmental science, nuclear energy, theoretical nuclear science, and the oil and gas industry. He is a 2011 ORNL Computing and Computational Sciences Directorate Distinguished Contributor and the recipient of the 2013 CSMD Most Significant Technical Contribution Award. Mr. Lingerfelt received his B.S. in Mathematics and Physics from East Tennessee State University in 1998 and his M.S. in Physics from the University of Tennessee in 2002.
November 12, 2014 - John Springer: Discovery Advancements Through Data Analytics
Abstract:
The Purdue Discovery Advancements Through Analytics (D.A.T.A.) Laboratory seeks to address the computational challenges surrounding data analytics in the life, physical, and social sciences by focusing on the development and optimization of parallel codes that perform analytics. They complement these efforts by also examining the aspects of data analytics related to user adoption, as well as best practices pertaining to the management of associated metadata. In this seminar, the lead investigator in the D.A.T.A. Lab, Dr. John Springer, will discuss the lab's past and current efforts and introduce the lab's planned activities.
Bio:
John Springer is an Associate Professor in Computer and Information Technology at Purdue University and the Lead Scientist for High Performance Data Management Systems at the Bindley Bioscience Center at Discovery Park. Dr. Springer's discovery efforts focus on distributed and parallel computational approaches to data integration and analytics, and he serves as the leader of the Purdue Discovery Advancements Through Analytics (D.A.T.A.) Laboratory.
November 10, 2014 - Christopher Rodrigues: High-Level Accelerator-Style Programming of Clusters with Triolet
Container libraries are popular for parallel programming due to their simplicity. Programs invoke library operations on entire containers, relying on the library implementation to turn groups of operations into efficient parallel loops and communication. However, their suitability for parallel programming on clusters has been limited, due to a limited repertoire of parallel algorithm implementations under the hood.
In this talk, I will present Triolet, a high-level functional language for using a cluster as a computational accelerator. Triolet improves upon the generality of prior distributed container library interfaces by separating concerns of parallelism, loop nesting, and data partitioning. I will discuss how this separation is used to efficiently decompose and communicate multidimensional array blocks, as well as to generate irregular loop nests from computations with variable-size temporary data. These loop-building algorithms are implemented as library code. Triolet's compiler inlines and specializes library calls to produce efficient parallel loops. The resulting code often performs comparably to handwritten C.
For several compute-intensive loops running on a 128-core cluster (with 8 nodes and 16 cores per node), Triolet performs significantly faster than sequential C code, with performance ranging from slightly faster to 4.3× slower than manually parallelized C code. Thus, Triolet demonstrates that a library of container traversal functions can deliver cluster-parallel performance comparable to manually parallelized C code without requiring programmers to manage parallelism. Triolet carries lessons for the design of runtimes, compilers, and libraries for parallel programming using container APIs.
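The container-library style described above can be sketched in Python. This is a hypothetical illustration of the programming model only, not Triolet's actual interface: the program expresses work as whole-container operations, and the library decides how to hand the elements to a pool of workers.

```python
# Hypothetical sketch of the container-library programming style (not
# Triolet's actual API): the program invokes whole-container operations,
# and the library maps them onto a pool of workers.
from concurrent.futures import ThreadPoolExecutor

def distance_squared(point):
    x, y = point
    return x * x + y * y

def parallel_map_reduce(func, container, workers=4):
    """Apply func to every element, then reduce with sum.

    A real distributed library would partition the container across
    nodes and fuse the map and the reduction into one parallel loop.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(func, container))

points = [(i, i + 1) for i in range(100)]
total = parallel_map_reduce(distance_squared, points)
```

The point of the sketch is the division of labor: the caller names the operation on the whole container, and the partitioning and scheduling live entirely inside the library.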
BIO:
Christopher Rodrigues got his Ph.D. in Electrical Engineering at the University of Illinois. He is one of the developers of the Parboil GPU benchmark suite. A computer architect by training, he has chased parallelism up the software stack, having worked on alias and dependence analysis, parallel programming for GPUs, statically typed functional language compilation, and the design of parallel libraries. He is interested in reducing the pain of writing and maintaining highperformance parallel code.
November 3, 2014 - Benjamin Lee: Statistical Methods for Hardware-Software Co-Design
Abstract: To pursue energy efficiency, computer architects specialize and coordinate design across the hardware/software interface. However, coordination is expensive, with high non-recurring engineering costs that arise from an intractable number of degrees of freedom. I present the case for statistical methods to infer regression models, which provide tractability for complex design questions. These models estimate performance and power as a function of hardware parameters and software characteristics to permit coordinated design. For example, I show how to coordinate the tuning of sparse linear algebra with the design of the cache and memory hierarchy. Finally, I describe ongoing work in using logistic regression to understand the root causes of performance tails and outliers in warehouse-scale datacenters.
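The regression-modeling idea can be sketched minimally: fit a linear model of performance as a function of hardware parameters by ordinary least squares. The features and data below are synthetic illustrations, not from the talk; the actual models use far richer feature sets.

```python
# Minimal sketch of inferring a performance regression model from
# (hardware parameter, performance) samples via ordinary least squares.
# Features and data are synthetic illustrations, not from the talk.

def fit_least_squares(X, y):
    """Solve the normal equations (X^T X) beta = X^T y by Gaussian elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    for col in range(k):                      # forward elimination with pivoting
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, k):
            f = A[row][col] / A[col][col]
            for j in range(col, k):
                A[row][j] -= f * A[col][j]
            b[row] -= f * b[col]
    beta = [0.0] * k
    for row in reversed(range(k)):            # back substitution
        s = b[row] - sum(A[row][j] * beta[j] for j in range(row + 1, k))
        beta[row] = s / A[row][row]
    return beta

# Synthetic samples: perf = 2 + 3*cache_mb + 0.5*ghz (exact, for illustration)
samples = [(1.0, c, g) for c in (1.0, 2.0, 4.0) for g in (1.5, 2.0, 3.0)]
perf = [2.0 + 3.0 * c + 0.5 * g for _, c, g in samples]
beta = fit_least_squares(samples, perf)
```

Once fitted, such a model can be queried cheaply across the design space instead of simulating every configuration, which is what makes the co-design search tractable.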
BIO: Benjamin Lee is an assistant professor of Electrical and Computer Engineering at Duke University. His research focuses on scalable technologies, power-efficient architectures, and high-performance applications. He is also interested in the economics and public policy of computation. He has held visiting research positions at Microsoft Research, Intel Labs, and Lawrence Livermore National Lab. Dr. Lee received his B.S. in electrical engineering and computer science at the University of California, Berkeley and his Ph.D. in computer science at Harvard University. He did postdoctoral work in electrical engineering at Stanford University. He received an NSF Computing Innovation Fellowship and an NSF CAREER Award. His research has been honored as a Top Pick by IEEE Micro Magazine and has been honored twice as Research Highlights by Communications of the ACM.
October 28, 2014 - Robinson Pino: New Program Directions for Advanced Scientific Computing Research (ASCR)
October 24, 2014 - Qingang Xiong: Computational Fluid Dynamics Simulation of Biomass Fast Pyrolysis - From Particle Scale to Reactor Scale
Abstract: Fast pyrolysis, a prominent thermochemical conversion approach to produce bio-oil from biomass, has attracted increasing interest. However, the fundamental mechanisms of biomass fast pyrolysis are still poorly understood, and the design, operation, and optimization of pyrolyzers are far from satisfactory because complicated multiphase flows are coupled with complex devolatilization processes. Computational fluid dynamics (CFD) is a powerful tool to investigate the underlying mechanisms of biomass fast pyrolysis and to help optimize efficient pyrolyzers. In this presentation, I will describe my postdoctoral work on CFD of biomass fast pyrolysis at both the particle scale and the reactor scale. For the particle-scale CFD, the lattice Boltzmann method is used to describe the flow and heat transfer processes. The intra-particle gas flow is modeled by Darcy's law. A lumped multi-step reaction kinetics is employed to model the biomass decomposition. Through the particle-scale CFD, detailed information on the evolution of a biomass particle is obtained. The velocity, temperature, and species mass fractions inside and surrounding the particles are presented, and the evolution of particle shape and density is monitored. For the reactor-scale CFD, we use the so-called multi-fluid model to simulate the multiphase hydrodynamics, in which all phases are treated as interpenetrating continua. Volume-fraction-based mass, momentum, energy, and species conservation equations are employed to describe the density, velocity, temperature, and mass fraction fields, and various sub-models are used to close the conservation equations. Using this model, fluidized-bed and auger reactors are simulated, and parametric and sensitivity studies on the effects of operating conditions, devolatilization schemes, and sub-model selections are investigated.
It is expected that these multiscale CFD simulations will contribute significantly to improving the accuracy of industrial reactor modeling for biomass fast pyrolysis. Finally, I will discuss some of my ideas about future directions in the multiscale CFD simulation of biomass thermochemical conversion.
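The lumped multi-step kinetics mentioned in the abstract can be illustrated with a toy competing-reactions model integrated by forward Euler. The three rate constants below are made up for illustration; real devolatilization schemes use temperature-dependent Arrhenius rates and more species.

```python
# Toy lumped-kinetics sketch: biomass decomposes into gas, tar, and char
# through three competing first-order reactions,
#   dB/dt = -(k1 + k2 + k3)*B,  dG/dt = k1*B,  dT/dt = k2*B,  dC/dt = k3*B,
# integrated with forward Euler. Rate constants are illustrative only.

def pyrolysis_step(state, k1, k2, k3, dt):
    B, G, T, C = state
    dB = -(k1 + k2 + k3) * B
    return (B + dB * dt, G + k1 * B * dt, T + k2 * B * dt, C + k3 * B * dt)

def run(k1=0.8, k2=1.5, k3=0.3, dt=1e-3, steps=2000):
    state = (1.0, 0.0, 0.0, 0.0)   # start with unit mass of biomass
    for _ in range(steps):
        state = pyrolysis_step(state, k1, k2, k3, dt)
    return state

B, G, T, C = run()
```

Because everything lost by the biomass is gained by the products, total mass is conserved step by step, which is a useful sanity check on any such scheme; the product split is fixed by the ratios of the rate constants.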
Biography: Dr. Qingang Xiong is a postdoctoral research associate in the Department of Mechanical Engineering, Iowa State University. Dr. Xiong obtained his Ph.D. in Chemical Engineering from the Institute of Process Engineering, Chinese Academy of Sciences, in 2011. After his graduation, Dr. Xiong went to the University of Heidelberg, Germany, as a software engineer for half a year to conduct GPU-based high performance computing of astrophysics. Dr. Xiong's research areas are computational fluid dynamics, CPU- and GPU-based parallel computing, heat and mass transfer, and biomass thermochemical conversion. Dr. Xiong has published more than 20 scientific papers and given more than 15 conference presentations. He serves as an editorial board member for several journals and as a chair at international conferences.
October 23, 2014 - Liang Zhou: Multivariate Transfer Function Design
Visualization and exploration of volumetric datasets has been an active area of research for over two decades. During this period, the volumetric datasets used by domain users have evolved from univariate to multivariate. Volume datasets are typically explored and classified via transfer function design and visualized using direct volume rendering. Multivariate transfer functions have emerged to improve classification results and to enable the exploration of multivariate volume datasets. In this talk, we describe our research on multivariate transfer function design. To improve the classification of univariate volumes, various one-dimensional (1D) or two-dimensional (2D) transfer function spaces have been proposed; however, these methods work on only some datasets. We propose a novel transfer function method that provides better classifications by combining different transfer function spaces. Methods have also been proposed for exploring multivariate simulations; however, these approaches are not suitable for complex real-world datasets and may be unintuitive for domain users. To this end, we propose a method based on user-selected samples in the spatial domain to make complex multivariate volume data visualization more accessible for domain users. However, this method still requires users to fine-tune transfer functions in parameter-space transfer function widgets, which may not be familiar to them. We therefore propose GuideME, a novel slice-guided, semi-automatic multivariate volume exploration approach. GuideME provides the user an easy-to-use, slice-based interface that suggests feature boundaries and allows the user to select features via click and drag; an optimal transfer function is then generated automatically by optimizing a response function. Throughout the exploration process, the user does not need to interact with the parameter views at all.
Finally, real-world multivariate volume datasets are usually also of large size, often exceeding the GPU memory and even the main memory of standard workstations. We propose a ray-guided, out-of-core, interactive volume rendering and efficient query method to support large and complex multivariate volumes on standard workstations.
October 20, 2014 - John Schmisseur: New HORIzONS: A Vision for Future Aerospace Capabilities within the University of Tennessee
Recently, issues of national interest including the planned DoD Pivot to the Pacific and assured large payload access to space have renewed commitment to the development of high-speed aerospace systems. As a result, many agencies, including the Air Force, are exploring new technology systems to facilitate operation in the hypersonic flight regime. One facet of the Air Force strategy in this area has been a re-emphasis of hypersonic testing capabilities at the Arnold Engineering Development Complex (AEDC) and the establishment of an Air Force Research Laboratory scientific research group co-located at the complex. These recent events provide an opportunity for the University of Tennessee to support the Air Force and other agencies in the realization of planned high-speed capabilities while simultaneously establishing a precedent for the integration of contributions across the UT system.
The HORIzON center (High-speed Original Research & InnovatiON) at the University of Tennessee Space Institute (UTSI) has been established to address the current, intermediate, and strategic challenges faced by national agencies in the development of high-speed/hypersonic capabilities. Specifically, the center will foster the development of world-class basic research capabilities in the region surrounding AEDC, create a culture of discovery and innovation integrating elements from academia, government, and small business, and take the lead in the development of a rational methodology for the integration of large-scale empirical and numerical data sets within a digital environment.
Dr. Schmisseur's presentation will provide the background and motivation that has driven the establishment of the HORIzON center and highlight a few of the center's major research vectors. He will be visiting ORNL to explore how contributions from the DoE can be integrated within the HORIzON enterprise to support the achievement of our national goals in high-speed technology development.
October 14, 2014 - Krishna Chaitanya Gurijala: Shape-Based Analysis
Shape analysis plays a critical role in many fields, especially in medical analysis. There has been substantial research on shape analysis for manifolds. In contrast, shape-based analysis has not received much attention for volumetric data. It is not feasible to directly extend successful manifold shape analysis methods, such as heat diffusion, to volumes due to the huge computational cost. The work presented herein seeks to address this problem by presenting two approaches for shape analysis in volumes that not only capture the shape information efficiently but also reduce the computational time drastically.
The first approach is a cumulative approach called the Cumulative Heat Diffusion, where the heat diffusion is carried out by simultaneously considering all the voxels as sources. The cumulative heat diffusion is monitored by a novel operator called the Volume Gradient Operator, which is a combination of the well-known Laplace-Beltrami operator and a data-driven operator. The cumulative heat diffusion is computed by considering all the voxels and hence is inherently dependent on the resolution of the data. Therefore, we propose a second, stochastic approach for shape analysis. In this approach the diffusion process is carried out using tiny massless particles termed shapetons. The shapetons are diffused in a Monte Carlo fashion across the voxels for a predefined distance (which serves as a single time step) to obtain the shape information. The direction of propagation of the shapetons is monitored by the volume gradient operator. The shapeton diffusion is a novel diffusion approach and is independent of the resolution of the data. These approaches robustly extract features and objects based on shape.
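The shapeton idea can be caricatured with a plain Monte Carlo random walk on a voxel grid. The sketch below is a deliberately simplified illustration: it omits the volume gradient operator that actually steers shapetons in the work described, and simply clamps walkers to the grid.

```python
# Simplified sketch of Monte Carlo particle diffusion on a voxel grid.
# Unlike the shapeton method described above, this walk is unguided: the
# volume gradient operator that steers real shapetons is omitted.
import random

def diffuse(particles, grid_shape, steps, seed=42):
    """Random-walk each particle for a fixed number of voxel hops."""
    rng = random.Random(seed)
    nx, ny, nz = grid_shape
    moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    out = []
    for (x, y, z) in particles:
        for _ in range(steps):
            dx, dy, dz = rng.choice(moves)
            # clamp to the grid instead of leaving the volume
            x = min(max(x + dx, 0), nx - 1)
            y = min(max(y + dy, 0), ny - 1)
            z = min(max(z + dz, 0), nz - 1)
        out.append((x, y, z))
    return out

start = [(8, 8, 8)] * 100
end = diffuse(start, (16, 16, 16), steps=10)
```

The attraction of such particle schemes, as the abstract notes, is that the cost scales with the number of particles and steps rather than with the voxel resolution of the volume.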
Both shape analysis approaches are used in several medical applications such as segmentation, feature extraction, registration, transfer function design, and tumor detection. This work focuses mainly on the diagnosis of colon cancer. Colorectal cancer is the second leading cause of cancer-related mortality in the United States. Virtual colonoscopy is a viable non-invasive screening method, whereby a radiologist can explore a colon surface to locate and remove the precancerous polyps (protrusions/bumps on the colon wall). To facilitate an efficient colon exploration, a robust and shape-preserving colon flattening algorithm is presented using the heat diffusion metric, which is insensitive to topological noise. The flattened colon surface provides effective colon exploration, navigation, polyp visualization, detection, and verification. In addition, the flattened colon surface is used to consistently register the supine and prone colon surfaces. Anatomical landmarks such as the taeniae coli, flexures, and surface feature points are used in the colon registration pipeline, and this work presents techniques using heat diffusion to automatically identify them.
September 30, 2014 - Stanley Osher: What Sparsity and l1 Optimization Can Do For You
Sparsity and compressive sensing have had a tremendous impact in science, technology, medicine, imaging, machine learning and now in solving multiscale problems in applied partial differential equations, such as developing sparse bases for elliptic eigenspaces. l1 and related optimization solvers are a key tool in this area. The special nature of this functional allows for very fast solvers: l1 actually forgives and forgets errors in Bregman iterative methods. I will describe simple, fast algorithms and new applications ranging from sparse dynamics for PDE, new regularization paths for logistic regression and support vector machines, to optimal data collection and hyperspectral image processing. (Credits: Stanley Osher, jointly with many others.)
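A core ingredient of the fast l1 solvers alluded to above is that the shrinkage (soft-thresholding) step has a closed form. The sketch below applies it inside a basic ISTA iteration on a tiny problem; this illustrates the shrinkage mechanism, not the Bregman iterations of the talk.

```python
# Sketch of l1 shrinkage: the soft-thresholding operator solves
#   argmin_x  0.5*(x - v)^2 + t*|x|
# in closed form, and alternating gradient steps with shrinkage gives
# ISTA for  min_x 0.5*||A x - b||^2 + lam*||x||_1.
# A basic ISTA sketch, not the Bregman iterative methods of the talk.

def soft_threshold(v, t):
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def ista(A, b, lam, step, iters):
    n = len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(len(b))]
        grad = [sum(A[i][j] * r[i] for i in range(len(b))) for j in range(n)]
        x = [soft_threshold(x[j] - step * grad[j], step * lam) for j in range(n)]
    return x

# Orthogonal design: the minimizer is exactly soft_threshold(b_i, lam),
# so small entries of b are "forgotten" (set to zero) by the shrinkage.
A = [[1.0, 0.0], [0.0, 1.0]]
x = ista(A, b=[3.0, 0.5], lam=1.0, step=1.0, iters=10)
```

The shrinkage's habit of zeroing small coefficients is exactly what produces sparse solutions, and its cheapness is why l1 solvers built from it are so fast.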
MORE ABOUT THE SPEAKER
Dr. Osher's awards and accomplishments are voluminous and exceptionally remarkable, just a few highlights of which include:
 Recently awarded the prestigious Gauss Prize, the highest honor in applied mathematics from the International Congress of Mathematicians.
 Named among the top 1 percent of the most frequently cited scholars in both mathematics and computer science between 2002 and 2012.
 Elected in 2009 to the American Academy of Arts and Sciences.
 Honored with the 2007 United States Association for Computational Mechanics (USACM) Computational and Applied Sciences Award.
 Elected in 2005 to the National Academy of Sciences.
 Received the 2005 Society for Industrial and Applied Mathematics (SIAM) Kleinman Prize for "outstanding research or other contributions that bridge the gap between mathematics and applications".
 Awarded the 2003 International Council for Industrial and Applied Mathematics (ICIAM) Pioneer Prize "for pioneering work introducing applied mathematical methods and scientific computing techniques to an industrial problem area or a new scientific field of applications".
 Appointed as an Alfred P. Sloan Fellow and a Fulbright Fellow.
The Gauss prize citation summarized Dr. Osher's many achievements by stating that, "Stanley Osher has made influential contributions in a broad variety of fields in applied mathematics. These include high resolution shock capturing methods for hyperbolic equations, level set methods, PDE-based methods in computer vision and image processing, and optimization. His numerical analysis contributions, including the Engquist-Osher scheme, TVD schemes, entropy conditions, ENO and WENO schemes and numerical schemes for Hamilton-Jacobi type equations have revolutionized the field. His level set contributions include new level set calculus, novel numerical techniques, fluids and materials modeling, variational approaches, high codimension motion analysis, geometric optics, and the computation of discontinuous solutions to Hamilton-Jacobi equations; level set methods have been extremely influential in computer vision, image processing, and computer graphics. In addition, such new methods have motivated some of the most fundamental studies in the theory of PDEs in recent years, completing the picture of applied mathematics inspiring pure mathematics."
September 11, 2014 - Jeffrey Willert: Increased Efficiency and Functionality inside the Moment-Based Accelerated Thermal Radiation Transport Algorithm
Recent algorithm design efforts for thermal radiation transport (TRT) have included the application of "Moment-Based Acceleration" (MBA). These MBA algorithms achieve accurate solutions in a highly efficient manner by moving a large portion of the computational effort to a nonlinearly consistent low-order (reduced phase space) domain.
In this talk I will discuss recent improvements and advancements of the MBA-TRT algorithm. We explore the use of Anderson Acceleration to solve the nonlinear low-order system as a replacement for a more traditional Jacobian-Free Newton-Krylov solver. Additionally, the MBA-TRT algorithm has struggled when error from Monte Carlo calculations builds up over several time steps. This error often corrupts the low-order system and may prevent convergence of the nonlinear solver. We attempt to remedy this by implementing a "Residual" Monte Carlo algorithm in which the stochastic error is greatly reduced for the same or less computational cost. We conclude with a discussion of areas of future work.
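Anderson Acceleration itself is easy to demonstrate on a scalar fixed-point problem. The depth-one sketch below (a far cry from the full MBA-TRT solver, and purely illustrative) shows the mechanism: combine the two most recent iterates with weights chosen to cancel the residuals.

```python
# Depth-one Anderson Acceleration for a fixed point x = g(x).
# With residual f_k = g(x_k) - x_k, choose alpha to zero the linear
# combination of the last two residuals, then mix the g-values:
#   alpha = f_k / (f_k - f_{k-1})
#   x_{k+1} = (1 - alpha)*g(x_k) + alpha*g(x_{k-1})
# Illustrative only; production solvers keep a deeper history.
import math

def anderson_m1(g, x0, iters=20, tol=1e-12):
    x_prev, x = x0, g(x0)
    f_prev = g(x_prev) - x_prev
    for _ in range(iters):
        f = g(x) - x
        if abs(f) < tol:
            return x
        denom = f - f_prev
        if denom == 0.0:
            return x
        alpha = f / denom
        x_next = (1 - alpha) * g(x) + alpha * g(x_prev)
        x_prev, f_prev, x = x, f, x_next
    return x

# Classic test problem: the fixed point of cos(x), approx 0.7390851.
root = anderson_m1(math.cos, 1.0)
```

Plain fixed-point iteration of cos converges only linearly; the accelerated iteration reaches machine-level accuracy in a handful of steps, which is the attraction of Anderson mixing as a drop-in replacement for Newton-Krylov on well-behaved low-order systems.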
September 9, 2014 - Swen Boehm: STCI - A scalable approach for tools and runtimes
The ever-increasing complexity and scale of high-performance computing (HPC) systems and parallel scientific applications require the systems community to provide scalable and resilient communication substrates and runtime infrastructures. Two system research efforts will be presented, focusing on adaptation and customization of HPC runtimes as well as the usability of such systems. The Scalable runTime Component Infrastructure (STCI) will be introduced, a modular library that enables the implementation of new scalable and resilient HPC runtime systems. Its unique modular architecture eases adaptation to a particular HPC system. Additionally, STCI is based on the concept of "agents", which allows runtime services to be further customized; for instance, STCI's customizability was recently utilized to implement an MPMD-style execution model on top of STCI. Finally, "librte" will be presented: a unified runtime abstraction API that aims at improving the usability of HPC systems by providing an abstraction over various runtime systems such as Cray ALPS, PMI, ORTE, and STCI. "librte" is used by the Universal Common Communication Substrate (UCCS) and provides a simple and well-defined interface to tool developers.
September 9, 2014 - Ewa Deelman: Science Automation with the Pegasus Workflow Management System
Abstract sent on behalf of the speaker:
Scientific workflows allow scientists to declaratively describe potentially complex applications that are composed of individual computational components. Workflows also include a description of the data and control dependencies between the components. This talk will describe example workflows in various science domains including astronomy, bioinformatics, earthquake science, gravitational-wave physics, and others. It will examine the challenges faced by workflow management systems when executing workflows in distributed and high-performance computing environments. In particular, the talk will describe the Pegasus Workflow Management System developed at USC/ISI. Pegasus bridges the scientific domain and the execution environment by automatically mapping high-level workflow descriptions onto distributed resources. It locates the input data and computational resources necessary for workflow execution. It also restructures the workflow for performance and reliability reasons. Pegasus can execute workflows on a laptop, a campus cluster, grids, and clouds. It can handle workflows with a single task or millions of tasks and has been used to manage workflows accessing and generating terabytes of data. The talk will describe the capabilities of Pegasus and how it manages heterogeneous computing environments.
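The mapping machinery is Pegasus-specific, but the core abstraction, a DAG of tasks with dependencies executed in dependency order, can be sketched generically. The example below is an illustration of that abstraction, not Pegasus's API.

```python
# Generic sketch of workflow execution: tasks form a DAG and each task
# runs only after its dependencies, mirroring the declarative workflow
# descriptions discussed above. Illustrative only; not Pegasus's API.
from collections import deque

def run_workflow(tasks, deps):
    """tasks: name -> callable; deps: name -> list of prerequisite names."""
    indegree = {t: len(deps.get(t, [])) for t in tasks}
    children = {t: [] for t in tasks}
    for t, parents in deps.items():
        for p in parents:
            children[p].append(t)
    ready = deque(t for t, d in indegree.items() if d == 0)
    order = []
    while ready:
        t = ready.popleft()
        tasks[t]()          # execute the task
        order.append(t)
        for c in children[t]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    if len(order) != len(tasks):
        raise ValueError("cycle in workflow")
    return order

log = []
order = run_workflow(
    {"fetch": lambda: log.append("fetch"),
     "clean": lambda: log.append("clean"),
     "plot": lambda: log.append("plot")},
    {"clean": ["fetch"], "plot": ["clean"]},
)
```

A real workflow manager layers a great deal on top of this skeleton: data staging, resource selection, retries, and restructuring of the DAG itself for performance.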
BIO FROM SPEAKER:
Ewa Deelman is a Research Associate Professor at the USC Computer Science Department and the Assistant Director of Science Automation Technologies at the USC Information Sciences Institute. Dr. Deelman's research interests include the design and exploration of collaborative, distributed scientific environments, with particular emphasis on workflow management as well as the management of large amounts of data and metadata. In 2007, Dr. Deelman edited the book "Workflows for e-Science: Scientific Workflows for Grids", published by Springer. She is also the founder of the annual Workshop on Workflows in Support of Large-Scale Science, which is held in conjunction with the Supercomputing conference. In 1997, Dr. Deelman received her PhD in Computer Science from Rensselaer Polytechnic Institute.
August 29, 2014 - C. David Levermore: Coarsening of Particle Systems
ABSTRACT FROM SPEAKER:
Each particle in a simulation of a system of particles usually represents a huge number of real particles. We present a framework for constructing the dynamics of a so-called coarsened system of simulated particles. We build an approximate solution to the Liouville equation for the original system from the solution of an equation for the phase-space density of a smaller system. We do this with a Markov approximation within a Mori-Zwanzig formalism based upon a reference density. We then identify the evolution equation for the reduced phase-space density as the forward Kolmogorov equation of a Markov process. The original system governed by deterministic dynamics is then simulated with the coarsened system governed by this Markov process. Both Monte Carlo (MC) and molecular dynamics (MD) simulations can be viewed within this framework. More generally, the reduced dynamics can have elements of both MC and MD.
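In a discrete-state caricature, a forward Kolmogorov equation reduces to pushing a probability vector forward with a stochastic transition matrix. The sketch below is purely illustrative (a made-up three-state chain, unrelated to the specific coarsening in the talk) and shows the density evolving to a stationary distribution.

```python
# Discrete sketch of a forward Kolmogorov evolution: a probability
# vector p is pushed forward by a row-stochastic transition matrix P,
#   p_{t+1}[j] = sum_i p_t[i] * P[i][j].
# Purely illustrative; the talk concerns continuous phase-space densities.

def step(p, P):
    n = len(p)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]

P = [[0.9, 0.1, 0.0],    # made-up transition probabilities, 3-state chain
     [0.2, 0.7, 0.1],
     [0.0, 0.3, 0.7]]
p = [1.0, 0.0, 0.0]      # start with all mass in state 0
for _ in range(200):
    p = step(p, P)
```

Probability is conserved at every step, and repeated application drives the density to the chain's stationary distribution, the discrete analogue of the reduced density's long-time behavior.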
August 21, 2014 - Quan Long: Laplace method for optimal Bayesian experimental design with applications in impedance tomography and seismic source inversion
Abstract sent on behalf of the speaker:
The Laplace method is widely used in statistics to approximate integrals. We analyze this method in the context of optimal Bayesian experimental design and extend it from the classical scenario, where parameters can be completely determined by the experiment, to scenarios where an unidentifiable parametric manifold exists. We show that by carrying out this approximation the estimation of the expected Kullback-Leibler divergence can be significantly accelerated. The developed methodology has been applied to the optimal experimental design of impedance tomography and seismic source inversion.
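The Laplace method expands the log-integrand to second order around its interior minimizer. A one-dimensional sketch (with an illustrative integrand, not a posterior from the applications in the talk) makes the accuracy concrete:

```python
# One-dimensional sketch of the Laplace method:
#   int exp(-n f(t)) dt  ~  exp(-n f(t*)) * sqrt(2*pi / (n * f''(t*)))
# where t* minimizes f. The integrand is illustrative only, not a
# posterior from the impedance tomography or seismic applications.
import math

def laplace_approx(f, f2, t_star, n):
    return math.exp(-n * f(t_star)) * math.sqrt(2 * math.pi / (n * f2(t_star)))

def trapezoid(f, n, a=-5.0, b=5.0, m=20001):
    """Reference value by brute-force quadrature of exp(-n f(t))."""
    h = (b - a) / (m - 1)
    s = 0.0
    for i in range(m):
        t = a + i * h
        w = 0.5 if i in (0, m - 1) else 1.0
        s += w * math.exp(-n * f(t))
    return s * h

f = lambda t: math.cosh(t) - 1.0       # minimized at t* = 0, f''(0) = 1
approx = laplace_approx(f, lambda t: math.cosh(t), 0.0, n=50)
exact = trapezoid(f, n=50)
```

For this integrand the relative error behaves like O(1/n), so at n = 50 the closed-form approximation is already within a fraction of a percent of the quadrature value, at essentially zero cost; that cost gap is what accelerates the expected-information computations in the talk.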
August 18, 2014 - Pierre Gremaud: Impedance boundary conditions for flows on networks
Abstract sent on behalf of the speaker:
From hemodynamics to engineering applications, many flow problems are solved on networks. For feasibility reasons, computational domains are often truncated and outflow conditions have to be prescribed at the end of the domain under consideration.
We will show how to efficiently compute the impedance of specific networks and how to use this information as outflow boundary condition. The method is based on linearization arguments and Laplace transforms. The talk will focus on hemodynamics applications but we will indicate how to generalize the approach.
July 23, 2014 - Frédérique Laurent-Negre: High-order moment methods for the description of sprays: mathematical modeling and adapted numerical methods
Abstract sent on behalf of the speaker:
We consider a two-phase flow constituted of a dispersed phase of liquid droplets (a spray) in a gas flow. This type of flow occurs in many applications, such as two-phase combustion or solid propulsion. The spray is then characterized by its distribution in size and velocity, which satisfies a Boltzmann-type equation. As an alternative to the Lagrangian methods that are commonly used for numerical simulations, we have developed Eulerian models that can account for the polydisperse character of sprays. They use moments in size and velocity of the distribution on fixed intervals of droplet size. These moments represent the number, the mass or the amount of surface area, the momentum, etc., of all droplets of a given size range. However, the space in which the moment vectors live becomes complex when high-order moments are considered. A key point of numerical methods is then to ensure that the moment vector stays in this space. We study here some mathematical models derived from the kinetic model as well as high-order numerical methods specifically developed to preserve the moment space.
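The "moment space" constraint can be illustrated in one dimension: a vector (m0, m1, m2, m3) is realizable as the size moments of some nonnegative droplet distribution on [0, inf) only if its Hankel determinants are nonnegative (the Stieltjes conditions). A small sketch, as an illustration of the constraint rather than of the talk's numerical schemes:

```python
# Sketch of a moment-space (realizability) check for size moments
# (m0, m1, m2, m3) of a nonnegative distribution on [0, inf): the
# Stieltjes conditions require both Hankel determinants to be
# nonnegative. Illustrative of the constraint only.

def det2(a, b, c, d):
    return a * d - b * c

def is_realizable(m0, m1, m2, m3):
    return m0 > 0 and det2(m0, m1, m1, m2) >= 0 and det2(m1, m2, m2, m3) >= 0

# Moments of a spray with droplets of sizes 1 and 2, weight 0.5 each:
m = [0.5 * 1**k + 0.5 * 2**k for k in range(4)]
```

A numerical scheme that transports moments independently can drift outside this set, after which no droplet distribution matches the moments; the moment-preserving methods of the talk are designed to prevent exactly that.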
July 22, 2014 - Christos Kavouklis: Numerical Solution of the 3D Poisson Equation with the Method of Local Corrections
Abstract sent on behalf of the speaker:
We present a new version of the Method of Local Corrections, a low-communication algorithm for the numerical solution of the free-space Poisson equation on 3D structured grids. We assume a decomposition of the fine computational domain (which contains the global right-hand side, the charge) into a set of small disjoint cubic patches (e.g. of size 33^3). The Method of Local Corrections comprises three steps in which Mehrstellen discretizations of the Laplace operator are employed: (i) a loop over the fine disjoint patches and the computation of local potentials on sufficiently large extensions of those patches (downward pass); (ii) an inexpensive global Poisson solve on the associated coarse domain, with right-hand side computed by applying the coarse-mesh Laplacian to the local potentials of step (i); and (iii) a correction of the local solutions computed in step (i) on the boundaries of the fine disjoint patches, based on interpolating the global coarse solution, and a propagation of the corrections into the patch interiors via local Dirichlet solves (upward pass). Local solves in the downward pass and the global coarse solve are performed using the domain-doubling algorithm of Hockney. For the local solves in the upward pass we employ a standard DFT Dirichlet Poisson solver. In this new version of the Method of Local Corrections we take into consideration the local potentials induced by truncated Legendre expansions of degree P of the local charges (the original version corresponded to P=0). The result is an hp scheme that is (P+1)-order accurate and involves only local communication. Specifically, we only have to compute and communicate the coefficients of local Legendre expansions (that is, for instance, 20 scalars per patch for expansions of degree P=3). Several numerical simulations are presented to illustrate the new method and demonstrate its convergence properties.
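The data exchanged per patch in this scheme is just a handful of Legendre coefficients. A one-dimensional sketch of computing and evaluating such a truncated expansion (degree P = 3, simple trapezoid quadrature; illustrative only, since the method itself expands 3D charges on patches):

```python
# 1-D sketch of a truncated Legendre expansion of a local charge:
# compute c_n = (2n+1)/2 * int_{-1}^{1} f(x) P_n(x) dx up to degree P
# and evaluate the truncated series. Illustrative only; the method in
# the talk uses 3-D expansions on cubic patches.

def legendre(n, x):
    """Evaluate P_n(x) via the Bonnet recurrence."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def legendre_coeffs(f, P, m=4001):
    """Coefficients by trapezoid quadrature on [-1, 1]."""
    h = 2.0 / (m - 1)
    coeffs = []
    for n in range(P + 1):
        s = 0.0
        for i in range(m):
            x = -1.0 + i * h
            w = 0.5 if i in (0, m - 1) else 1.0
            s += w * f(x) * legendre(n, x)
        coeffs.append((2 * n + 1) / 2.0 * s * h)
    return coeffs

def evaluate(coeffs, x):
    return sum(c * legendre(n, x) for n, c in enumerate(coeffs))

c = legendre_coeffs(lambda x: x**3, P=3)   # x^3 = (3/5)*P_1 + (2/5)*P_3
```

The compactness is the point: a degree-3 expansion of a patch charge is a short coefficient vector, which is why the method's communication volume stays small even as accuracy rises with P.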
July 17, 2014 - Kody John Hoffman Law: Dimension-independent, likelihood-informed (DILI) MCMC (Markov chain Monte Carlo) sampling algorithms for Bayesian inverse problems
July 1, 2014 - Xubin (Ben) He: High Performance and Reliable Storage Support for Big Data
Abstract sent on behalf of the speaker:
Big data applications have imposed unprecedented challenges in data analysis, storage, organization and understanding due to their heterogeneity, volume, complexity, and high velocity. These challenges are for both computer systems researchers who investigate new storage and computational solutions to support fast and reliable access to large datasets and application scientists in various disciplines who exploit these datasets of vital scientific interest for knowledge discovery. In this talk, I will talk about my research in data storage and I/O systems, particularly in solid-state devices (SSDs) and erasure codes to provide cost-effective solutions for big data management for high performance and reliability.
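The erasure-coding idea in the abstract can be demonstrated with the simplest possible code: a single XOR parity block, RAID-5 style. This toy sketch tolerates the loss of any one block; production systems use Reed-Solomon codes to survive multiple failures.

```python
# Minimal erasure-coding sketch: one XOR parity block protects k data
# blocks against the loss of any single block. Production systems use
# Reed-Solomon codes to tolerate multiple simultaneous failures.

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(blocks):
    parity = bytes(len(blocks[0]))
    for blk in blocks:
        parity = xor_bytes(parity, blk)
    return parity

def recover(blocks, lost_index, parity):
    """Rebuild the lost block by XOR-ing the parity with the survivors."""
    rebuilt = parity
    for i, blk in enumerate(blocks):
        if i != lost_index:
            rebuilt = xor_bytes(rebuilt, blk)
    return rebuilt

data = [b"abcd", b"wxyz", b"1234"]
parity = make_parity(data)
```

The storage overhead is one parity block per stripe, far cheaper than full replication, which is the cost-effectiveness argument erasure codes bring to big-data storage.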
Bio:
Dr. Xubin He is a Professor and the Graduate Program Director of Electrical and Computer Engineering at Virginia Commonwealth University. He is also the Director of the Storage Technology and Architecture Research (STAR) lab. Dr. He received his PhD in Electrical and Computer Engineering from the University of Rhode Island, USA, in 2002 and both his MS and BS degrees in Computer Science from Huazhong University of Science and Technology, China, in 1997 and 1995, respectively. His research interests include computer architecture, reliable and high-availability storage systems, and distributed computing. He has published more than 80 refereed articles in prestigious journals such as IEEE Transactions on Parallel and Distributed Systems (TPDS), Journal of Parallel and Distributed Computing (JPDC), ACM Transactions on Storage, and IEEE Transactions on Dependable and Secure Computing (TDSC), and at various international conferences, including USENIX FAST, USENIX ATC, Eurosys, IEEE/IFIP DSN, IEEE IPDPS, MSST, ICPP, MASCOTS, LCN, etc. He was the general co-chair for IEEE NAS'2009 and program co-chair for MSST'2010, IEEE NAS'2008, and SNAPI'2007. Dr. He has served as a proposal review panelist for NSF and in various chair and committee roles for many professional conferences in the field. Dr. He was a recipient of the ORAU Ralph E. Powe Junior Faculty Enhancement Award in 2004, the TTU Chapter Sigma Xi Research Award in 2010 and 2005, and the TTU ECE Most Outstanding Teaching Faculty Award in 2010. He holds one U.S. patent. He is a senior member of the IEEE, a member of the IEEE Computer Society, and a member of USENIX.
June 18, 2014 - Hari Krishnan: Enabling Collaborative Domain-Centric Visualization and Analysis in High Performance Computing Environments
Abstract and Bio sent on behalf of the speaker:
Multi-institutional interdisciplinary domain science teams are increasingly commonplace in modern high performance computing (HPC) environments. Visualization tools, such as VisIt and ParaView, have traditionally focused more on improving the scalability, performance, and efficiency of algorithms than on enabling ease of use and collaborative functionality that complements the power of the HPC resources. In addition, visualization tools provide an algorithm-based infrastructure focusing on a diverse set of readers, plots, and operations rather than a higher-level, domain-specific set of capabilities when providing solutions to the scientific community. This strategy yields a higher return on investment, but increases complexity for the user community.
As larger, more diverse teams of scientists become more commonplace, they require applications tuned to extract the most use from a heavily utilized and resource-constrained distributed HPC environment. Standard methods of visualization and data sharing pose significant challenges, detracting from users' focus on scientific inquiry.
In this presentation I will highlight three new capabilities under development within VisIt that address these needs and enable domain scientists to refocus their efforts on more productive endeavors. These features include tailored visualization using a new PySide/PyQt infrastructure, a new parallel analysis framework supporting Python & R scripting, and a collaboration suite that allows sharing and communication among a variety of display media, from mobile devices to visualization clusters. The goal is to enhance the experience of domain scientists by streamlining their work environment, providing easy access to a complex set of resources, and enabling collaboration, sharing, and communication among a diverse team.
Bio:
Hari Krishnan received his Ph.D. in computer science and works in the visualization and graphics group as a computer systems engineer at Lawrence Berkeley National Laboratory. His research focuses on scientific visualization on HPC platforms and many-core architectures. He leads the development effort on several HPC-related projects, which include research on new visualization methods, optimizing scaling and performance on Cray machines, working on data-model-optimized I/O libraries, and enabling remote workflow services. He is also an active developer on several major open source projects including VisIt, NiCE, and H5hut, and has developed plugins for Fiji/ImageJ.
May 20, 2014 – Weiran Sun: A Spectral Method for Linear Half-Space Kinetic Equations
Abstract sent on behalf of the speaker:
Half-space equations naturally arise in the boundary layer analysis of kinetic equations. In this talk we will present a unified proof for the well-posedness of a class of linear half-space equations with general incoming data. We will also show a spectral method to numerically resolve these types of equations in a systematic way. Our main strategy, in both analysis and numerics, consists of three steps: adding damping terms to the original half-space equation, using an inf-sup argument and even-odd decomposition to establish the well-posedness of the damped equation, and then recovering solutions to the original half-space equation. The accuracy of the damped equation is shown to be quasi-optimal, and the numerical error of approximations to the original equation is controlled by that of the damped equation. Numerical examples are shown for the isotropic neutron transport equation and the linearized BGK equation. This is joint work with Qin Li and Jianfeng Lu.
May 14, 2014 – Michael Bauer: Programming Distributed Heterogeneous Architectures with Logical Regions
Abstract and Bio sent on behalf of the speaker:
Modern supercomputers now encompass both heterogeneous processors and deep, complex memory hierarchies. Programming these machines currently requires expertise in an eclectic collection of tools (MPI, OpenMP, CUDA, etc.) that primarily focus on describing parallelism while placing the burden of data movement on the programmer. Legion is an alternative approach that provides extensive support for describing the structure of program data through logical regions. Logical regions can be dynamically partitioned into subregions giving applications an explicit mechanism for directly conveying information about locality and independence to the Legion runtime. Using this information, Legion automatically extracts task parallelism and orchestrates data movement through the memory hierarchy. Time permitting, we will discuss results from several applications including a port of S3D, a production combustion simulation running on Titan, the Department of Energy's current flagship supercomputer.
Bio:
Michael Bauer is a sixth year PhD student in computer science at Stanford University. His interests include the design and implementation of programming systems for supercomputers and distributed systems.
May 8, 2014 – Jerry McMahan: Bayesian Inverse Problems for Uncertainty Quantification: Prediction with Model Discrepancy and a Verification Framework
Abstract sent on behalf of the speaker:
Recent work in uncertainty quantification (UQ) has made it feasible to compute the statistical uncertainties for mathematical models in physics, biology, and engineering applications, offering added insight into how the model relates to the measurement data it represents. This talk focuses on two issues related to the reliability of UQ methods for model calibration in practice. The first issue concerns calibration of models having discrepancies with respect to the phenomena they model when these discrepancies violate commonly employed statistical assumptions used for simplifying computation. Using data from a vibrating beam as a case study, I will illustrate how these discrepancies can limit the accuracy of predictive simulation and discuss some approaches for reducing the impact of these limitations. The second issue concerns verifying the accurate implementation of computational algorithms for solving inverse problems in UQ. In this context, verification is particularly important as the nature of the computational results makes detection of subtle implementation errors unlikely. I will present a collaboratively developed computational framework for verification of statistical inverse problem solvers and present examples of its use to verify the Markov Chain Monte Carlo (MCMC) based routines in the QUESO C++ library.
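The verification idea can be illustrated on a toy scale. The sketch below (a schematic analogue with illustrative names, not the QUESO framework or the speaker's code) runs a random-walk Metropolis sampler on a target with a known closed-form answer and checks the computed moments against it; a subtle implementation error would show up as a systematic mismatch.

```python
import numpy as np

rng = np.random.default_rng(42)

def metropolis(logpost, x0, n_steps, step=1.0):
    """Random-walk Metropolis sampler for a 1-D target log-density."""
    samples = np.empty(n_steps)
    x, lp = x0, logpost(x0)
    for i in range(n_steps):
        prop = x + step * rng.standard_normal()   # symmetric proposal
        lp_prop = logpost(prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # accept/reject
            x, lp = prop, lp_prop
        samples[i] = x
    return samples

# Verification target with a closed-form answer: N(mu, sigma^2).
mu, sigma = 2.0, 0.5
logpost = lambda x: -0.5 * ((x - mu) / sigma) ** 2

chain = metropolis(logpost, x0=0.0, n_steps=50_000)
burned = chain[5_000:]  # discard burn-in before checking moments
```

In a real verification framework the same comparison is made statistically rigorous (e.g., via repeated independent chains and hypothesis tests) rather than with a loose tolerance as here.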
May 5, 2014 – Abhishek Kumar: Multiscale modeling of polycrystalline material for optimized property
May 1, 2014 – Eric Chung: Staggered Discontinuous Galerkin Methods
ABSTRACT FROM SPEAKER: In this talk, we will present staggered discontinuous Galerkin methods. These methods are based on piecewise polynomial approximation on staggered grids. The basis functions have to be carefully designed so that certain compatibility conditions are satisfied. Moreover, the use of staggered grids brings some advantages, such as optimal convergence and conservation. We will discuss the basic methodologies and applications to wave propagation and fluid flows.
April 7, 2014 – Tom Scogland: Runtime Adaptation for Autonomic Heterogeneous Computing
Heterogeneity is increasing at all levels of computing, most visibly with the rise of general-purpose computing on GPUs in everything from phones to supercomputers. More quietly, it is increasing with the rise of NUMA systems, hierarchical caching, OS noise, and a myriad of other factors. As heterogeneity becomes a fact of life at every level of computing, efficiently managing heterogeneous compute resources is becoming a critical task. To make the problem tractable, we must develop methods and systems that allow software to adapt at runtime to the hardware it finds within a given node. The goal is to make the complex functions of heterogeneous computing autonomic, handling load balancing, memory coherence, and other performance-critical factors in the runtime. This talk will discuss my research in this area, including the design of a work-sharing construct for CPU and GPU resources in OpenMP and automated memory reshaping/remapping for locality.
Dr. Scogland is a candidate for a postdoctoral position with the Computer Science Research Group
April 4, 2014 – Alex McCaskey: Effects of Electron-Phonon Coupling in Single-Molecule Magnet Transport Junctions Using a Hybrid Density Functional Theory and Model Hamiltonian Approach
Recent experiments have shown that junctions consisting of individual single-molecule magnets (SMMs) bridged between two electrodes can be fabricated in three-terminal devices, and that the characteristic magnetic anisotropy of the SMMs can be affected by electrons tunneling through the molecule. Vibrational modes of the SMM can couple to electronic charge and spin degrees of freedom, and this coupling also influences the magnetic and transport properties of the SMM. The effect of electron-phonon coupling on transport has been extensively studied in small molecules, but not yet for junctions of SMMs. The goals of this talk will be twofold: to present a novel approach for studying the effects of this electron-phonon coupling in transport through SMMs that utilizes both density functional theory calculations and model Hamiltonian construction and analysis, and to present a software framework based on this hybrid approach for the simulation of transport across user-defined SMMs. The results of these simulations will indicate a characteristic suppression of the current at low energies that is strongly dependent on the overall electron-phonon coupling strength and the number of molecular vibrational modes considered.
Mr. McCaskey is a candidate for a graduate position in the Computer Science Research Group
March 26, 2014 – Steven Wise: Convergence of a Mixed FEM for a Cahn-Hilliard-Stokes System
Abstract and Bio sent on behalf of the speaker:
Co-authors: Amanda Diegel and Xiaobing Feng
Abstract: In this talk I will describe a mixed finite element method for a modified Cahn-Hilliard equation coupled with a non-steady Darcy-Stokes flow that models phase separation and coupled fluid flow in immiscible binary fluids and diblock copolymer melts. I will focus on numerical implementation issues for the scheme as well as the convergence analysis. The time discretization is based on a convex splitting of the energy of the equation. I will show that our scheme is unconditionally energy stable with respect to a spatially discrete analogue of the continuous free energy of the system and unconditionally uniquely solvable. We can show, in addition, that the phase variable is bounded in L^\infty(0,T;L^\infty) and the chemical potential is bounded in L^\infty(0,T;L^2), unconditionally in both two and three dimensions, for any finite final time T. In fact, the bounds in such estimates grow at most linearly in T. I will prove that these variables converge with optimal rates in the appropriate energy norms in both two and three dimensions. Finally, I will discuss some extensions of the scheme to approximate solutions of diffuse-interface flow models with large differences in density.
Bio:
Steven Wise is an associate professor of mathematics at the University of Tennessee. He specializes in fast adaptive nonlinear algebraic solvers for numerical PDE, numerical analysis, and scientific computing more broadly. Before coming to the University of Tennessee, he was a postdoc and visiting assistant professor of mathematics and biomedical engineering at the University of California, Irvine. He earned a PhD in engineering physics from the University of Virginia in 2003.
March 18, 2014 – Zhiwen Zhang: A Dynamically Bi-Orthogonal Method for Time-Dependent Stochastic Partial Differential Equations
We propose a dynamically bi-orthogonal method (DyBO) to study time-dependent stochastic partial differential equations (SPDEs). The objective of our method is to exploit the intrinsic sparse structure in the stochastic solution by constructing its sparsest representation via a bi-orthogonal basis. It is well known that the Karhunen-Loève (KL) expansion minimizes the total mean squared error and gives the sparsest representation of stochastic solutions. However, computing the KL expansion can be quite expensive, since we need to form a covariance matrix and solve a large-scale eigenvalue problem. In this talk, we derive an equivalent system that governs the evolution of the spatial and stochastic bases in the KL expansion. Unlike other reduced-model methods, our method constructs the reduced basis on the fly, without the need to form the covariance matrix or compute its eigendecomposition. We further present an adaptive strategy to dynamically remove or add modes, perform a detailed complexity analysis, and discuss various generalizations of this approach. Several numerical experiments will be provided to demonstrate the effectiveness of the DyBO method.
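To make the contrast concrete, here is a minimal sketch (illustrative names and a synthetic rank-3 random field, not the DyBO algorithm) of the standard sample-covariance KL construction that DyBO is designed to avoid: form the covariance matrix from Monte Carlo samples, solve its eigenvalue problem, and truncate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo samples of a 1-D random field u(x, omega) on a grid,
# built with a known low-rank structure in three random modes.
n_x, n_samples = 64, 2000
x = np.linspace(0, 1, n_x)
xi = rng.standard_normal((n_samples, 3))
u = (xi[:, [0]] * np.sin(np.pi * x)
     + 0.5 * xi[:, [1]] * np.sin(2 * np.pi * x)
     + 0.1 * xi[:, [2]] * np.sin(3 * np.pi * x))

u_mean = u.mean(axis=0)
fluct = u - u_mean

# Covariance matrix and its eigendecomposition: the O(n_x^2) storage
# and large eigenvalue problem are the costs DyBO sidesteps.
C = fluct.T @ fluct / (n_samples - 1)
eigvals, eigvecs = np.linalg.eigh(C)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # descending order

# Truncated KL expansion with m modes.
m = 3
phi = eigvecs[:, :m]          # spatial KL modes
Y = fluct @ phi               # stochastic coefficients per sample
u_kl = u_mean + Y @ phi.T     # rank-m KL reconstruction
energy = eigvals[:m].sum() / eigvals.sum()  # captured variance fraction
```

Because the synthetic field has exactly three modes, the rank-3 truncation captures essentially all of the variance; for a genuine SPDE solution the basis must also be evolved in time, which is where the DyBO formulation comes in.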
Bio:
Zhiwen Zhang is a postdoctoral scholar in the Department of Computing and Mathematical Sciences at the California Institute of Technology. He graduated from the Department of Mathematical Sciences, Tsinghua University, in 2011, where he was awarded a Ph.D. in applied mathematics. From 2008 to 2009, he studied at the University of Wisconsin-Madison as a visiting student. His research interests lie in the applied analysis and numerical computation of problems arising from quantum chemistry, wave propagation, porous media, cell evolution, Bayesian updating, stochastic fluid dynamics, and random heterogeneous media.
March 4, 2014 – David Seal: Beyond the Method of Lines Formulation: Building Spatial Derivatives into the Temporal Integrator
Abstract: High-order solvers for hyperbolic conservation laws often fall into two disparate categories. On one hand, the method of lines formulation starts by discretizing the spatial variables, and then a system of ODEs is solved using an appropriate time integrator. On the other hand, Lax-Wendroff discretizations immediately convert Taylor series in time to discrete spatial derivatives. In this talk, we present generalizations of these methods, including high-order discontinuous Galerkin (DG) methods based on multiderivative time integrators, as well as high-order finite difference weighted essentially non-oscillatory (WENO) methods based on the Picard Integral Formulation (PIF) of the conservation law. Multiderivative time integrators are extensions of Runge-Kutta and Taylor methods. They reduce the overall storage required for a Runge-Kutta method, and they add flexibility to Taylor series in time methods by allowing new coefficients to be used at various stages. In the multiderivative DG method, "modified fluxes" are used to define high-order Riemann problems, similar to those defined in the generalized Riemann problem solvers incorporated in the Arbitrary DERivative (ADER) methods. The finite difference WENO method is based on a Picard Integral Formulation of the PDE, where we first integrate in time and then discretize the temporal integral. The present formulation is automatically mass conservative, and it therefore opens the possibility of modifying finite difference fluxes for purposes such as positivity preservation or reducing the number of expensive nonlinear WENO reconstructions. For now, we present results for a single-step version of the PIF-WENO method, which lends itself to incorporating adaptive mesh refinement technology. Results for one- and two-dimensional conservation laws are presented, and they indicate that the new methods compete well with the current state of the art.
February 21, 2014 – Zhou Li: Harnessing high-resolution mass spectrometry and high-performance supercomputing for quantitative characterization of a broad range of protein post-translational modifications in a natural microbial community
Microbial communities populate and shape diverse ecological niches within natural environments. The physiology of organisms in natural consortia has been studied with community proteomics. However, little is known about how free-living microorganisms regulate protein activities through post-translational modifications (PTMs). Here, we harnessed high-performance mass spectrometry and supercomputing for identification and quantification of a broad range of PTMs (including hydroxylation, methylation, citrullination, acetylation, phosphorylation, methylthiolation, S-nitrosylation, and nitration) in microorganisms. Using an E. coli proteome as a benchmark, we identified more than 5,000 PTM events of diverse types and a large number of modified proteins that carried multiple types of PTMs. We applied this demonstrated approach to profiling PTMs in two growth stages of a natural microbial community growing in the acid mine drainage environment. We found that the multi-type, multi-site protein modifications are highly prevalent in free-living microorganisms. A large number of proteins involved in various biological processes were dynamically modified during the community succession, indicating that dynamic protein modification might play an important role in organismal response to changing environmental conditions. Furthermore, we found closely related, but ecologically differentiated bacteria harbored remarkably divergent PTM patterns between their orthologous proteins, implying that PTM divergence could be a molecular mechanism underlying their phenotypic diversities. We also quantified fractional occupancy for thousands of PTM events. The findings of this study should help unravel the role of PTMs in microbial adaptation, evolution and ecology.
February 14, 2014 – Celia E. Shiau: Probing the fish-microbe interface for environmental assessment of clean energy
To preserve wildlife and natural resources for future generations, we face the grand challenge of effectively assessing and predicting the impact of current and future energy use. My overall goal is to probe the microbiome and host-microbe interface of fish populations, in order to evaluate environmental stress on aquatic life and resources. Current understanding of aquatic microbes in fresh and salt water is centered on free-living bacteria (independent of a host). I will discuss my work on the experimentally tractable fish model (Danio rerio) that can be applied to investigate the interaction between microbiota, host health, and environmental toxicants (such as mercury and other metalloids), and the aims of my Liane Russell fellowship research program. The findings will provide a framework for studies of other fish species, leveraging advanced imaging, metagenomics, bioinformatics, and neutron scattering. The proposed study promises to inform the potential use of fish microbes to solve energy and environmental challenges, thereby providing means for critical assessment of global energy impact.
February 6, 2014 – Susan Janiszewski: 3-connected, claw-free, generalized net-free graphs are Hamiltonian
Given a family $\mathcal{F} = \{H_1, H_2, \dots, H_k\}$ of graphs, we say that a graph $G$ is $\mathcal{F}$-free if $G$ contains no subgraph isomorphic to any $H_i$, $i = 1,2,\dots, k$. The graphs in the set $\mathcal{F}$ are known as {\it forbidden subgraphs}. The main goal of this dissertation is to further classify pairs of forbidden subgraphs that imply a 3-connected graph is Hamiltonian. First, the number of possible forbidden pairs is reduced by presenting families of graphs that are 3-connected and not Hamiltonian. Of particular interest is the graph $K_{1,3}$, also known as the {\it claw}, as we show that it must be included in any forbidden pair. Secondly, we show that 3-connected, $\{K_{1,3}, N_{i,j,0}\}$-free graphs are Hamiltonian for $i,j \ne 0$, $i+j \le 9$, and 3-connected, $\{K_{1,3}, N_{3,3,3}\}$-free graphs are Hamiltonian, where $N_{i,j,k}$, known as the {\it generalized net}, is the graph obtained by rooting vertex-disjoint paths of length $i$, $j$, and $k$ at the vertices of a triangle. These results, combined with previously known results, give a complete classification of generalized nets such that claw-free, net-free implies a 3-connected graph is Hamiltonian.
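For small graphs the claw condition is easy to test directly: a graph contains an induced $K_{1,3}$ exactly when some vertex has three pairwise non-adjacent neighbors. A brute-force sketch (the helper name and the adjacency dict-of-sets representation are my own, for illustration only):

```python
from itertools import combinations

def has_induced_claw(adj):
    """Return True if the graph contains an induced K_{1,3} (claw):
    a vertex with three pairwise non-adjacent neighbors.
    `adj` maps each vertex to the set of its neighbors."""
    for v, nbrs in adj.items():
        for a, b, c in combinations(nbrs, 3):
            # All three pairs among {a, b, c} must be non-edges.
            if b not in adj[a] and c not in adj[a] and c not in adj[b]:
                return True
    return False

# K_{1,3} itself: center 0 with leaves 1, 2, 3.
claw = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}

# The complete graph K_4 is claw-free: every pair of neighbors is adjacent.
k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
```

Checking the other forbidden subgraphs $N_{i,j,k}$ requires general induced-subgraph isomorphism testing, which is substantially more expensive; this sketch only covers the claw.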
January 30, 2014 – Wei Guo: High order Semi-Lagrangian Methods for Transport Problems with Applications to Vlasov Simulations and Global Transport
Abstract and Bio sent on behalf of the speaker:
The semi-Lagrangian (SL) scheme for transport problems has gained increasing popularity in the computational science community due to its attractive properties. For example, compared with the Eulerian approach, the SL scheme allows extra-large time steps by incorporating a characteristic-tracing mechanism, hence achieving great computational efficiency. In this talk, we will introduce a family of dimensional-splitting high-order SL methods coupled with high-order finite difference weighted essentially non-oscillatory (WENO) procedures and finite element discontinuous Galerkin (DG) methods. By performing dimensional splitting, the multidimensional problem is decoupled into a sequence of 1D problems, which are much easier to solve numerically in the SL setting. The proposed SL schemes are applied to the Vlasov model arising in plasma physics and to global transport problems on the cubed-sphere geometry from an operational climate model. We further introduce the integral deferred correction (IDC) framework to reduce the dimensional-splitting errors. The proposed algorithms have been extensively tested and benchmarked with classical problems in plasma physics, such as Landau damping, the two-stream instability, and the Kelvin-Helmholtz instability, as well as global transport problems on the cubed sphere. This is joint work with Andrew Christlieb, Maureen Morton, Ram Nair, and Jing-Mei Qiu.
January 28, 2014 – Jeff Haack: Applications of computational kinetic theory
Abstract and Bio sent on behalf of the speaker:
Kinetic theory describes the evolution of a complex system of a large number of interacting particles. These models are used to describe systems in which the characteristic scales for particle interactions and the characteristic length scales are similar. In this talk, I will discuss numerical computation for several applications of kinetic theory, including rarefied gas dynamics with applications to re-entry, kinetic models for plasmas, and a biological model for swarm behavior. Because kinetic models often involve a high-dimensional phase space as well as an integral operator modeling particle interactions, simulations have been impractical in many settings. However, recent advances in massively parallel computing are very well suited to solving kinetic models, and I will discuss how these resources are used in computing kinetic models and the new difficulties that arise when computing on these architectures.
January 24, 2014 – Roman Lysecky: Data-driven Design Methods and Optimization for Adaptable High-Performance Systems
Abstract and Bio sent on behalf of the speaker:
Research has demonstrated that runtime optimization and adaptation methods can achieve performance improvements over implementations optimized at design time. Furthermore, modern computing applications require a large degree of configurability and adaptability to operate on a variety of data inputs whose characteristics may change over time. In this talk, we highlight two runtime optimization methods for adaptable computing systems. We first highlight the use of runtime profiling and system-level performance and power estimation methods for estimating the speedup and power consumption of dynamically reconfigurable systems. We evaluate the accuracy and fidelity of the online estimation framework for dynamic configuration of computational kernels, with the goals of both maximizing performance and minimizing system power consumption. We then present an overview of the design framework and runtime reconfiguration methods supporting data-adaptable reconfigurable systems. Data-adaptable reconfigurable systems enable a flexible runtime implementation in which a system can transition the execution of tasks between different execution modalities, e.g., hardware and software implementations, while continuing to process data during the transition.
Bio:
Roman Lysecky is an Associate Professor of Electrical and Computer Engineering at the University of Arizona. He received his B.S., M.S., and Ph.D. in Computer Science from the University of California, Riverside in 1999, 2000, and 2005, respectively. His research interests focus on embedded systems, with emphasis on embedded system security, non-intrusive system observation methods for in situ analysis of complex hardware and software behavior, runtime optimization methods, and design methods for precisely timed systems with applications in safety-critical and mobile health systems. He was awarded the Outstanding Ph.D. Dissertation Award from the European Design and Automation Association (EDAA) in 2006 for New Directions in Embedded Systems. He received a CAREER award from the National Science Foundation in 2009 and four Best Paper Awards from the ACM/IEEE International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), the ACM/IEEE Design, Automation and Test in Europe Conference (DATE), the IEEE International Conference on Engineering of Computer-Based Systems (ECBS), and the International Conference on Mobile Ubiquitous Computing, Systems, Services (UBICOMM). He has coauthored five textbooks on VHDL, Verilog, C, C++, and Java programming. He is an inventor on one US patent. In 2008 and 2013, he received an award for Excellence at the Student Interface from the College of Engineering at the University of Arizona.
January 21, 2014 – Tuoc Van Phan: Some Aspects in Nonlinear Partial Differential Equations and Nonlinear Dynamics
This talk contains two parts:
Part I: We discuss the Shigesada-Kawasaki-Teramoto system of cross-diffusion equations for two competing species in population dynamics. We show that if there is self-diffusion in one species and no cross-diffusion in the other, then the system has a unique smooth solution for all time in bounded domains of any dimension. We obtain this result by deriving global W^{1,p}-estimates of Calderón-Zygmund type for a class of nonlinear reaction-diffusion equations with self-diffusion. These estimates are achieved by employing the Caffarelli-Peral perturbation technique together with a new two-parameter scaling argument.
Part II: We study a class of nonlinear Schrödinger equations in one-dimensional spatial space with a double-well symmetric potential. We derive and justify a normal form reduction of the nonlinear Schrödinger equation for a general pitchfork bifurcation of the symmetric bound state. We prove persistence of normal form dynamics for both supercritical and subcritical pitchfork bifurcations in the time-dependent solutions of the nonlinear Schrödinger equation over long but finite time intervals.
The talk is based on my joint work with Luan Hoang (Texas Tech University), Truyen Nguyen (University of Akron), and Dmitry Pelinovsky (McMaster University).
January 17, 2014 – John Dolbow: Recent advances in embedded finite element methods
This seminar will present recent advances in an emerging class of embedded finite element methods for evolving interface problems in mechanics. By embedded, we refer to methods that allow the interface geometry to be arbitrarily located with respect to the finite element mesh. This relaxation between mesh and geometry obviates the need for remeshing strategies in many cases and greatly facilitates adaptivity in others. The approach shares features with finite-difference methods for embedded boundaries, but within a variational setting that facilitates error and stability analysis.
We focus attention on a weighted form of Nitsche's method that allows interfacial conditions to be robustly enforced. Classically, Nitsche's method provides a means to weakly impose boundary conditions in Galerkin-based formulations. With regard to embedded interface problems, some care is needed to ensure that the method remains well behaved in varied settings, ranging from interfacial configurations resulting in arbitrarily small elements to problems exhibiting large contrast. We illustrate how the weighting of the interfacial terms can be selected both to guarantee stability and to guard against ill-conditioning. Various benchmark problems for the method are then presented.
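As background, the classical symmetric Nitsche formulation for the Poisson model problem (a textbook sketch, not the weighted embedded-interface variant discussed in the talk) seeks $u_h \in V_h$ with $a_h(u_h, v_h) = \ell_h(v_h)$ for all $v_h \in V_h$, where $h$ is the mesh size and $\gamma > 0$ a stabilization parameter that must dominate a constant from an inverse inequality:

```latex
% -\Delta u = f in \Omega, with Dirichlet data u = g on \Gamma
% imposed weakly; \partial_n denotes the outward normal derivative.
\begin{aligned}
a_h(u,v) &= \int_\Omega \nabla u \cdot \nabla v \, dx
  - \int_\Gamma (\partial_n u)\, v \, ds
  - \int_\Gamma (\partial_n v)\, u \, ds
  + \frac{\gamma}{h} \int_\Gamma u\, v \, ds, \\
\ell_h(v) &= \int_\Omega f\, v \, dx
  - \int_\Gamma (\partial_n v)\, g \, ds
  + \frac{\gamma}{h} \int_\Gamma g\, v \, ds.
\end{aligned}
```

The boundary terms in $a_h$ symmetrize and penalize the weakly imposed condition; taking $\gamma$ too small destroys coercivity, which is precisely the stability and conditioning trade-off that weighting the interfacial terms addresses in the embedded setting.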
January 16, 2014 – Aziz Takhirov: Numerical analysis of the flows in Pebble Bed Geometries
Flows in complex geometries intermediate between free flows and porous media flows occur in pebble bed reactors and other industrial processes. Brinkman models have consistently shown that, even in simplified settings, accurate prediction of essential flow features depends on the impossible problem of meshing the pores. We discuss a new model for understanding the flow and its properties in these geometries.
January 13, 2014 – Pablo Seleson: Bridging Scales in Materials with Mesoscopic Models
Complex systems are often characterized by processes occurring at different spatial and temporal scales. Accurate predictions of quantities of interest in such systems are often only feasible through multiscale modeling. In this talk, I will discuss the use of mesoscopic models as a means to bridge disparate scales in materials. Examples of mesoscopic models include nonlocal continuum models, based on integro-differential equations, that generalize classical continuum models based on partial differential equations. Nonlocal models possess length scales, which can be controlled for multiscale modeling. I will present two nonlocal models, peridynamics and nonlocal diffusion, and demonstrate how the inherent length scales in these models make it possible to bridge scales in materials.
January 9, 2014 – Gung-Min Gie: Motion of fluids in the presence of a boundary
In most practical applications of fluid mechanics, it is the interaction of the fluid with the boundary that is most critical to understanding the behavior of the fluid. Physically important parameters, such as the lift and drag of a wing, are determined by the sharp transition the air makes from being at rest on the wing to flowing freely around the airplane near the wing. Mathematically, the behavior of such flows at small viscosity is modeled by the Navier-Stokes equations. In this talk, we discuss some recent results on the boundary layers of the Navier-Stokes equations under various boundary conditions.
January 6, 2014 – Christine Klymko: Centrality and Communicability Measures in Complex Networks: Analysis and Algorithms
Complex systems are ubiquitous throughout the world, both in nature and within man-made structures. Over the past decade, large amounts of network data have become available and, correspondingly, the analysis of complex networks has become increasingly important. One of the fundamental questions in this analysis is to determine the most important elements in a given network. Measures of node importance are usually referred to as node centrality, and measures of how well two nodes are able to communicate with each other are referred to as the communicability between pairs of nodes. Many measures of node centrality and communicability have been proposed over the years. Here, we focus on the analysis and computation of centrality and communicability measures based on matrix functions. First, we examine a node centrality measure based on the notion of total communicability, defined in terms of the row sums of the exponential of the adjacency matrix of the network. We argue that this is a natural metric for ranking nodes in a network, and we point out that it can be computed very rapidly even in the case of large networks. Furthermore, we propose a measure of the total network communicability, based on the total sum of node communicabilities, as a useful measure of the connectivity of the network as a whole. Next, we compare various parameterized centrality rankings based on the matrix exponential and matrix resolvent with degree and eigenvector centrality. The centrality measures we consider are exponential and resolvent subgraph centrality (defined in terms of the diagonal entries of the matrix exponential and matrix resolvent, respectively), total communicability, and Katz centrality (defined in terms of the row sums of the matrix resolvent).
We demonstrate an analytical relationship between these rankings and the degree and subgraph centrality rankings, which helps to explain the observed robustness of these rankings on many real-world networks, even though the scores produced by the centrality measures are not stable.
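The matrix-function measures described above are straightforward to prototype. The sketch below (illustrative variable names and a toy graph of my own choosing) computes total communicability, total network communicability, and exponential subgraph centrality with SciPy:

```python
import numpy as np
from scipy.linalg import expm

# Small undirected graph with edges (0,1), (1,2), (1,3), (2,3);
# node 1 has the highest degree and should rank first.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

E = expm(A)                       # E[i, j] = communicability between i and j
total_comm = E.sum(axis=1)        # row sums: total communicability centrality
network_comm = E.sum()            # total network communicability
subgraph_centrality = np.diag(E)  # exponential subgraph centrality

ranking = np.argsort(-total_comm)  # most central node first
```

For large networks one would not form $e^A$ densely; `scipy.sparse.linalg.expm_multiply(A, np.ones(n))` applies the exponential to the all-ones vector directly, which is one way the row sums can be computed rapidly at scale.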
December 19, 2013 – Adam Larios: New Techniques for Large-Scale Parallel Turbulence Simulations at High Reynolds Numbers
Abstract sent on behalf of the speaker:
Two techniques have recently been developed to handle large-scale simulations of turbulent flows. The first is a nonlinear, LES-type viscosity, which is based on the numerical violation of the local energy balance of the Navier-Stokes equations. This technique enjoys a numerical dissipation which remains vanishingly small in regions where the solution is smooth, only damping the flow in regions of numerical shock, allowing for increased accuracy at reduced computational cost. The second is a direction-splitting technique for projection methods, which unlocks new parallelism previously unexploited in fluid flows, and enables very fast, large-scale turbulence simulations.
December 16, 2013 – Tuoc Van Phan: Some Aspects in Nonlinear Partial Differential Equations and Nonlinear Dynamics
Abstract sent on behalf of the speaker:
This talk contains two parts:
Part I: We discuss the Shigesada-Kawasaki-Teramoto system of cross-diffusion equations for two competing species in population dynamics. We show that if there is self-diffusion in one species and no cross-diffusion in the other, then the system has a unique smooth solution for all time in bounded domains of any dimension. We obtain this result by deriving global W^{1,p}-estimates of Calderón-Zygmund type for a class of nonlinear reaction-diffusion equations with self-diffusion. These estimates are achieved by employing the Caffarelli-Peral perturbation technique together with a new two-parameter scaling argument.
Part II: We study a class of nonlinear Schrödinger equations in one-dimensional spatial space with a double-well symmetric potential. We derive and justify a normal form reduction of the nonlinear Schrödinger equation for a general pitchfork bifurcation of the symmetric bound state. We prove persistence of normal form dynamics for both supercritical and subcritical pitchfork bifurcations in the time-dependent solutions of the nonlinear Schrödinger equation over long but finite time intervals.
The talk is based on my joint work with Luan Hoang (Texas Tech University), Truyen Nguyen (University of Akron), and Dmitry Pelinovsky (McMaster University).
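For readers unfamiliar with the system in Part I, the Shigesada-Kawasaki-Teramoto cross-diffusion model is commonly written as follows (a sketch with generic coefficient names; the talk's precise assumptions may differ, e.g. Part I takes the cross-diffusion coefficient of one species to be zero):

```latex
\begin{aligned}
\partial_t u &= \Delta\big[(d_1 + a_{11}u + a_{12}v)\,u\big] + u\,(b_1 - c_{11}u - c_{12}v),\\
\partial_t v &= \Delta\big[(d_2 + a_{21}u + a_{22}v)\,v\big] + v\,(b_2 - c_{21}u - c_{22}v),
\end{aligned}
```

where a_11, a_22 are the self-diffusion coefficients and a_12, a_21 the cross-diffusion coefficients.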
December 13, 2013 - Rich Lehoucq: A Computational Spectral Graph Theory Tutorial
My presentation considers the research question of whether existing algorithms and software for the large-scale sparse eigenvalue problem can be applied to problems in spectral graph theory. I first provide an introduction to several problems involving spectral graph theory. I then review several different algorithms for the large-scale eigenvalue problem and briefly introduce the Anasazi package of eigensolvers.
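As a minimal illustration of the kind of computation involved, the sketch below estimates the largest eigenvalue of a graph Laplacian with plain power iteration. This is a toy in pure Python, not the Anasazi package; the graph and iteration count are invented:

```python
# Toy sketch (not Anasazi): estimate the largest eigenvalue of a graph
# Laplacian L = D - A with power iteration. Graph below is a 3-node path.

def laplacian(adj):
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]

def power_iteration(M, iters=200):
    n = len(M)
    # Start away from the constant vector, which is the Laplacian's
    # null eigenvector and would stall the iteration.
    v = [1.0] + [0.0] * (n - 1)
    lam = 0.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(abs(x) for x in w)          # infinity-norm estimate
        v = [x / lam for x in w]
    return lam

adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]                             # path graph P3
L = laplacian(adj)
print(power_iteration(L))                     # ≈ 3.0, P3's largest Laplacian eigenvalue
```

Real spectral graph theory problems usually need the *smallest* nonzero eigenvalues, which is where shift-invert strategies and packages such as Anasazi come in.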
December 10, 2013 - Jingwei Hu: Fast algorithms for quantum Boltzmann collision operators
The quantum Boltzmann equation describes the nonequilibrium dynamics of a quantum system consisting of bosons or fermions. The most prominent feature of the equation is a high-dimensional integral operator modeling particle collisions, whose nonlinear and nonlocal structure poses a great challenge for numerical simulation. I will introduce two fast algorithms for the quantum Boltzmann collision operator. The first is a quadrature-based solver specifically designed for the collision operator in reduced energy space. Compared to the cubic complexity of direct evaluation, our algorithm runs in only linear complexity (optimal up to a logarithmic factor). The second accelerates the computation of the full phase-space collision operator. It is a spectral algorithm based on a special low-rank decomposition of the collision kernel. Numerical examples, including an application to semiconductor device modeling, are presented to illustrate the efficiency and accuracy of the proposed algorithms.
December 6, 2013 - Jeongnim Kim: Analysis of QMC Applications on Petascale Computers
Continuum quantum Monte Carlo (QMC) has proved to be an invaluable tool for predicting the properties of matter from fundamental principles. The multiple forms of parallelism afforded by QMC algorithms and their high compute-to-communication ratio make them ideal candidates for acceleration in the multi-/many-core paradigm, as demonstrated by the performance of QMCPACK on various high-performance computing (HPC) platforms including Titan (Cray XK7) and Mira (IBM Blue Gene/Q).
The changes expected on future architectures - orders of magnitude higher parallelism, hierarchical memory and communication, and heterogeneous nodes - pose great challenges to application developers, but also present opportunities to transform applications to tackle new classes of problems. This talk presents core QMC algorithms and their implementations in QMCPACK on the HPC systems of today. The speaker will discuss the performance of typical QMC workloads to elucidate the critical issues to be resolved for QMC to fully exploit the increasing computing power of forthcoming HPC systems.
December 3, 2013 - Terry Haut: Advances on an asymptotic parallel-in-time method for highly oscillatory PDEs
In this talk, I will first review a recent time-stepping algorithm for nonlinear PDEs that exhibit fast (highly oscillatory) time scales. PDEs of this form arise in many applications of interest, and in particular describe the dynamics of the ocean and atmosphere. The scheme combines asymptotic techniques (which are inexpensive but can have insufficient accuracy) with parallel-in-time methods (which, alone, can yield minimal speedup for equations that exhibit rapid temporal oscillations). Examples are presented for the (1D) rotating shallow water equations in a periodic domain, which demonstrate that significant parallel speedup is achievable.
In order to implement this time-stepping method for general spatial domains (in 2D and 3D), a key component involves applying the exponential of skew-Hermitian operators. To this end, I will next present a new algorithm for doing so. This method can also be used for solving wave propagation problems, which is of independent interest. The scheme has several advantages over standard methods, including the absence of any stability constraints with respect to the spatial discretization, and the ability to parallelize the computation in the time variable over as many characteristic wavelengths as resources permit (in addition to any spatial parallelization). I will also present examples on the linear 2D shallow water equations, as well as the 2D (variable-coefficient) wave equation. In these examples, this method (in serial) is 1-2 orders of magnitude faster than both RK4 and the use of Chebyshev polynomials.
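The appeal of exponentials of skew-Hermitian operators, and the absence of a stability constraint, can be seen in a tiny example: exp(tA) with A skew-symmetric is orthogonal/unitary, so it preserves the solution norm exactly for any step size. The 2x2 case below (a toy, not the speaker's algorithm) uses the closed-form rotation:

```python
import math

# For the 2x2 real skew-symmetric A = [[0, -w], [w, 0]], exp(tA) is the
# rotation by angle w*t. Applying it never amplifies the solution,
# regardless of how large the "time step" t is.

def exp_skew_2x2(w, t):
    c, s = math.cos(w * t), math.sin(w * t)
    return [[c, -s], [s, c]]

def apply(M, v):
    return [M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1]]

v = [3.0, 4.0]                        # |v| = 5
for t in (0.1, 1.0, 100.0):           # even a huge step is stable
    u = apply(exp_skew_2x2(2.0, t), v)
    print(t, math.hypot(u[0], u[1]))  # norm stays 5 (up to rounding)
```

For genuine operators the challenge, of course, is computing the exponential's action efficiently, which is what the talk's algorithm addresses.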
December 3, 2013 - Galen Shipman: The Compute and Data Environment for Science (CADES)
In this talk I will discuss ORNL's Compute and Data Environment for Science (CADES), which provides R&D with a flexible and elastic compute and data infrastructure. The initial deployment consists of over 5 petabytes of high-performance storage, nearly half a petabyte of scalable NFS storage, and over 1,000 compute cores integrated into a high-performance Ethernet and InfiniBand network. This infrastructure, based on OpenStack, provides a customizable compute and data environment for a variety of use cases including large-scale omics databases, data integration and analysis tools, data portals, and modeling/simulation frameworks. These services can be composed to provide end-to-end solutions for specific science domains.
Galen Shipman is the Data Systems Architect for the Computing and Computational Sciences Directorate and Director of the Compute and Data Environment for Science at Oak Ridge National Laboratory (ORNL). He is responsible for defining and maintaining an overarching strategy and infrastructure for data storage, data management, and data analysis, spanning research and development through integration, deployment, and operations for high-performance and data-intensive computing initiatives at ORNL. His current work includes addressing many of the data challenges of major facilities such as the Spallation Neutron Source (Basic Energy Sciences) and major data centers focusing on climate science (Biological and Environmental Research).
December 2, 2013 - Wei Ding: Klonos: A Similarity-Analysis-Based Tool for Software Porting in High-Performance Computing
Porting applications to a new system is a nontrivial job in the HPC field. It is a very time-consuming, labor-intensive process, and the quality of the results depends critically on the experience of the experts involved. To ease the porting process, a methodology is proposed to address an important aspect of software porting that receives little attention, namely, planning support. When a scientific application consisting of many subroutines is to be ported, the selection of key subroutines greatly impacts the productivity and overall porting strategy, because these subroutines may represent a significant feature of the code in terms of functionality, code structure, or performance. They may also serve as indicators of the difficulty and amount of effort involved in porting a code to a new platform. The proposed methodology is based on the idea that a set of similar subroutines can be ported with similar strategies, resulting in ports of similar quality. By viewing subroutines as data and operator sequences, analogous to DNA sequences, various bioinformatics techniques may be used to conduct the similarity analysis of subroutines while avoiding the NP-complete complexities of other approaches. Other code metrics and cost-model metrics have been adapted for similarity analysis to capture internal code characteristics. Based on these similarity analyses, "Klonos," a tool for software porting, has been created. Experiments show that Klonos is very effective at providing a systematic porting plan that guides users in reusing similar porting strategies for similar code regions.
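The core idea of scoring subroutines as sequences can be sketched with Python's stdlib sequence matcher standing in for the bioinformatics-style alignment. This is illustrative only, not the actual Klonos implementation, and the operator tokens below are invented:

```python
from difflib import SequenceMatcher

# Sketch: encode each subroutine as a sequence of operator/statement
# tokens, then score pairwise similarity, much like aligning DNA
# sequences. Similar subroutines are candidates for a shared porting
# strategy. Token sequences here are hypothetical.

sub_a = ["load", "mul", "add", "store", "loop", "load", "mul", "add"]
sub_b = ["load", "mul", "add", "store", "loop", "load", "sub", "add"]
sub_c = ["call", "branch", "cmp", "ret"]

def similarity(x, y):
    # ratio() = 2*M / (len(x) + len(y)), where M counts matched tokens
    return SequenceMatcher(None, x, y).ratio()

print(similarity(sub_a, sub_b))   # high: port with one shared strategy
print(similarity(sub_a, sub_c))   # low: likely needs its own strategy
```

Klonos itself additionally folds in code metrics and cost-model metrics; a plain edit-distance-style score like this is only the starting point.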
November 20, 2013 - Chao Yang: Numerical Algorithms for Solving Nonlinear Eigenvalue Problems in Electronic Structure Calculation
Kohn-Sham density functional theory (KS-DFT) is the most widely used theory for studying the electronic properties of molecules and solids. The main computational problem in KS-DFT is a nonlinear eigenvalue problem in which the matrix Hamiltonian is a function of a number of eigenvectors associated with the smallest eigenvalues. The problem can also be formulated as a constrained energy minimization problem or as a nonlinear equation in which the unknown ground-state electron density satisfies a fixed-point map. Significant progress has been made in the last few years on understanding the mathematical properties of this class of problems. Efficient and reliable numerical algorithms have been developed to accelerate the convergence of nonlinear solvers. New methods have also been developed to reduce the computational cost of each step of the iterative solver. We will review some of these developments and discuss additional challenges in large-scale electronic structure calculations.
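The fixed-point formulation mentioned above is typically attacked by self-consistent iteration with damping ("mixing"). The following scalar caricature, not a real DFT code, shows the shape of such a solver; the map F and all parameters are invented for illustration:

```python
import math

# Scalar caricature of the self-consistent field (SCF) idea: the unknown
# density satisfies rho = F(rho), solved by damped fixed-point iteration
#   rho <- (1 - alpha) * rho + alpha * F(rho).
# F = cos is an arbitrary contraction chosen purely for illustration.

def scf(F, rho0, alpha=0.5, tol=1e-12, max_iter=1000):
    rho = rho0
    for k in range(max_iter):
        new = (1 - alpha) * rho + alpha * F(rho)
        if abs(new - rho) < tol:
            return new, k          # converged density and iteration count
        rho = new
    raise RuntimeError("SCF did not converge")

rho, iters = scf(math.cos, rho0=1.0)
print(rho, iters)                  # rho satisfies rho = cos(rho)
```

In real KS-DFT codes the map F involves solving a linear eigenvalue problem for the current Hamiltonian, and far more sophisticated mixing schemes (e.g. Anderson/Pulay-type acceleration) replace the simple damping used here.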
November 15, 2013 - Christian Straube: Simulation of HPDC Infrastructure Attributes
High Performance Distributed Computing (HPDC) infrastructures use several data centers, High Performance Computing (HPC) systems, and distributed systems, each built from manifold (often heterogeneous) compute, storage, interconnect, and other specialized subcomponents, to provide their capabilities, i.e., well-defined functionality that is exposed to a user or application. The quality of these capabilities can be described by attributes, e.g., performance, energy efficiency, or reliability. Hardware-related modifications, such as clock rate adaptation or interconnect throughput improvement, often induce two groups of effects on these attributes: the (by definition) positive intended effects and the mostly negative but unavoidable side effects. For instance, increasing a typical HPDC infrastructure's redundancy to address short-time breakdowns and improve reliability (positive intended effect) simultaneously increases energy consumption and degrades performance due to redundancy overhead (negative side effects).
In this talk, I present Predictive Modification Effect Analysis (PMEA), which aims at avoiding harmful execution and costly modification exploration by investigating in advance whether the (negative) side effects on attributes will outweigh the (positive) intended effects. The talk covers the fundamental concepts and basic ideas of PMEA and presents its underlying model. The model is straightforward and fosters fast development, even for complex HPDC infrastructures; it handles individual and open sets of attributes and their calculations, and it addresses effect cascading through the entire HPC infrastructure. Additionally, I will present a prototype of a simulation tool and describe selected features in detail.
Bio:
Christian Straube has been a computer science Ph.D. student at the Ludwig-Maximilians-University (LMU) in Munich, Germany, since January 2012. His research interests include HPDC infrastructure and data center analysis, in particular planning, modification justification, and effect outweighing and cascading. During his time as a Ph.D. student, he worked several months at the Leibniz Supercomputing Centre, which operates SuperMUC, a three-petaflop/s system that uses warm-water cooling. Prior to joining LMU as a Ph.D. student, Christian worked for several years in industry and academia as a software engineer and project manager. He ran his own software engineering company for 10 years and was (co-)founder of several IT-related startups. He received a best paper award for a conference contribution to INFOCOMP 2012 and was subsequently invited to be a technical program member of INFOCOMP 2013. Christian holds a Diploma with Distinction in Computer Science from Ludwig-Maximilians-University in Munich, with a minor in Medicine.
November 12, 2013 - Surya R. Kalidindi: Data Science and Cyberinfrastructure Enabled Development of Advanced Materials
Materials with enhanced performance characteristics have served as critical enablers for the successful development of advanced technologies throughout human history, and have contributed immensely to the prosperity and well-being of nations. Although the core connections between a material's internal structure (i.e., microstructure), its evolution through various manufacturing processes, and its macroscale properties (or performance characteristics) in service are widely acknowledged to exist, establishing this fundamental knowledge base has proven effort-intensive, slow, and very expensive for a number of candidate material systems being explored for advanced technology applications. It is anticipated that the multifunctional performance characteristics of a material are likely to be controlled by a relatively small number of salient features in its microstructure. However, cost-effective validated protocols do not yet exist for fast identification of these salient features and establishment of the desired core knowledge needed for the accelerated design, manufacture, and deployment of new materials in advanced technologies. The main impediment arises from the lack of a broadly accepted framework for rigorous quantification of the material's microstructure, and for objective (automated) identification of the salient features in the microstructure that control the properties of interest.
Microstructure informatics focuses on the development of data science algorithms and computationally efficient protocols capable of mining the essential linkages in large microstructure datasets (both experimental and modeled), and building robust knowledge systems that can be readily accessed, searched, and shared by the broader community. Given the nature of the challenges faced in the design and manufacture of new advanced materials, this emerging interdisciplinary field is ideally positioned to produce a major transformation in the current practices used by materials scientists and engineers. The novel data science tools produced by this emerging field promise to significantly accelerate the design and development of new advanced materials through their increased efficacy in gleaning and blending the disparate knowledge and insights hidden in "big data" gathered from multiple sources (including both experiments and simulations). This presentation outlines specific strategies for data-science-enabled development of advanced materials, and illustrates key components of the proposed overall strategy with examples.
November 11, 2013 - Hermann Härtig: A fast and fault-tolerant microkernel-based system for exascale computing (FFMK)
FFMK is a recently started project funded by DFG's exascale software program. It addresses three key scalability obstacles expected in future exascale systems: the vulnerability to system failures due to transient or permanent faults, the performance losses due to imbalances, and the noise due to unpredictable interactions between HPC applications and the operating system. To this end, we adapt and integrate well-proven technologies including:
- Microkernel-based operating systems (L4), to eliminate the operating-system noise of feature-heavy all-in-one operating systems and to make kernel influences more deterministic and predictable,
- Erasure-code-protected on-node checkpointing, to provide a fast checkpoint and restart mechanism capable of keeping up with a worsening mean time between failures (MTBF), and
- Mathematically sound management and load-balancing algorithms (MosiX), to adjust the system to the highly dynamic and widely varying requirements of today's and future HPC applications.
FFMK will combine Linux running in a lightweight virtual machine with a special-purpose component for MPI, both running side by side on L4. The objective is to build a fluid, self-organizing platform for applications that require scaling up to exascale performance. The talk will explain the assumptions and overall architecture of FFMK and continue by presenting a number of design decisions the team is currently facing. FFMK is a cooperation between Hebrew University's MosiX team, the HPC centers of Berlin and Dresden (ZIB, ZIH), and TU Dresden's operating systems group.
Bio:
After receiving his PhD from Karlsruhe University on an SMP-related topic, Hermann Härtig led a team at the German National Research Center (GMD) to build BirliX, a Unix look-alike designed to address high security requirements. He then moved to TU Dresden to lead the operating systems chair. His team was among the pioneers in building microkernels of the L4 family (Fiasco, Nova) and systems based on L4 (L4Re, DROPS, NIZZA). L4Re and Fiasco form the OS basis of the SiMKo 3 smartphone. Hermann Härtig is now PI for FFMK.
October 17, 2013 - Marta D'Elia: Fractional differential operators on bounded domains as special cases of nonlocal diffusion operators
We analyze a nonlocal diffusion operator having as special cases the fractional Laplacian and fractional differential operators that arise in several applications, e.g., jump processes. In our analysis, a nonlocal vector calculus is exploited to define a weak formulation of the nonlocal problem. We demonstrate that the solution of the nonlocal equation converges to the solution of the fractional Laplacian equation on bounded domains as the nonlocal interactions become infinite. We also introduce Galerkin finite element discretizations of the nonlocal weak formulation and derive a priori error estimates. Through several numerical examples we illustrate the theoretical results and show that by solving the nonlocal problem it is possible to obtain accurate approximations of the solutions of fractional differential equations, circumventing the problem of treating infinite-volume constraints.
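In the nonlocal vector calculus literature, the diffusion operator in question is commonly written as follows (a sketch; the talk's precise notation may differ):

```latex
\mathcal{L}_\delta u(\mathbf{x}) \;=\; 2 \int_{B_\delta(\mathbf{x})} \big(u(\mathbf{y}) - u(\mathbf{x})\big)\, \gamma(\mathbf{x},\mathbf{y}) \, d\mathbf{y},
```

where gamma is a nonnegative symmetric kernel and delta the interaction radius; taking a kernel proportional to |y - x|^{-(d+2s)} and letting the interactions become infinite (delta to infinity) formally recovers the fractional Laplacian (-Delta)^s.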
October 15, 2013 - Tommy Janjusic: Framework for Evaluating Dynamic Memory Allocators, Including a New Equivalence-Class-Based Cache-Conscious Dynamic Memory Allocator
Software application performance is hindered by a variety of factors, most notably the well-known CPU-memory speed gap (often called the memory wall): the CPU sits idle waiting for data to be brought from memory into the processor caches. The addressing used by caches causes non-uniform accesses to the various cache sets. This non-uniformity has several causes, including how different objects are accessed by the code and how the data objects are located in memory. Memory allocators determine where dynamically created objects are placed, thus defining addresses and their mapping to cache locations. It is therefore important to evaluate how different allocators behave with respect to the localities of the created objects. Most allocators use a single attribute of an object, its size, in making allocation decisions. Additional attributes, such as placement with respect to other objects or to a specific cache area, may lead to better use of cache memories. This talk discusses a framework that allows for the development and evaluation of new memory allocation techniques. At the root of the framework is a memory tracing tool called Gleipnir, which provides very detailed information about every memory access and relates it back to source-level objects. Using the traces from Gleipnir, we extended a commonly used cache simulator to generate detailed cache statistics: per function, per data object, and per cache line, and to identify specific data objects that conflict with each other. The utility of the framework is demonstrated with a new memory allocator known as an equivalence-class allocator. The new allocator allows users to specify the cache sets, in addition to the object size, where objects should be placed. We compare this new allocator with two well-known allocators, viz., the Doug Lea and Pool allocators.
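The placement decisions described above hinge on how an address maps to a cache set. A minimal sketch of the standard mapping, with a hypothetical cache geometry (64-byte lines, 64 sets; not taken from the talk):

```python
# Standard set-associative cache mapping: drop the line-offset bits,
# then the low bits of the line number select the set. An allocator
# that controls object addresses therefore controls set placement.
# Geometry below is hypothetical.

LINE_BITS = 6          # 64-byte cache lines
NUM_SETS  = 64         # must be a power of two for this mask trick

def cache_set(addr):
    return (addr >> LINE_BITS) & (NUM_SETS - 1)

# Two objects exactly NUM_SETS * line_size = 4096 bytes apart map to the
# same set and can evict each other; an equivalence-class allocator
# could deliberately separate (or co-locate) them.
a = 0x12345
b = a + NUM_SETS * (1 << LINE_BITS)
print(cache_set(a), cache_set(b))   # same set index for both
```

This is exactly the kind of conflict a trace from Gleipnir can expose at the level of individual source objects.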
October 8, 2013 - Sophie Blondel: NAT++: An analysis software for the NEMO experiment
The NEMO 3 detector aims to prove that the neutrino is a Majorana particle (i.e., identical to the antineutrino). It is mainly composed of a calorimeter and a wire chamber, the former measuring the time and energy of a particle and the latter reconstructing its track. NEMO 3 took data for 5 effective years with an event trigger rate of ~5 Hz, resulting in a total of 10^8 events to analyze. A C++-based software package, called NAT++, was created to calibrate and analyze these events. The analysis is mainly based on a time-of-flight calculation, which will be the focus of this presentation. Supplementing this classic analysis, a new tool named gamma tracking has been developed to improve the reconstruction of gamma energy deposits in the detector. The addition of this tool to the analysis pipeline leads to a 30% increase in statistics in certain desired channels.
September 30, 2013 - Eric Barton: Fast Forward Storage and Input/Output (I/O)
Conflicting pressures drive the requirements for I/O and storage at exascale. On the one hand, an explosion is anticipated not only in the size of scientific data models but also in their complexity and in the volume of their attendant metadata. These models require workflows that integrate analysis and visualization, and new object-oriented I/O Application Programming Interfaces (APIs) to make application development tractable and allow compute to be moved to the data, or data to the compute, as appropriate. On the other hand, the economic realities driving the architecture and reliability of the underlying hardware will push the limits on horizontal scale, introduce unavoidable jitter, and make failure the norm. The I/O system will have to handle these as transparently as possible while providing efficient, sustained, and predictable performance. This talk will describe the research underway in the Department of Energy (DOE) Fast Forward project to prototype a complete exascale I/O stack, including, at the top level, an object-oriented I/O API based on HDF5; in the middle, a burst buffer and data layout optimizer based on PLFS (a checkpoint filesystem for parallel applications); and at the bottom, DAOS (Distributed Asynchronous Object Storage), transactional object storage based on Lustre.
September 25, 2013 - James Beyer: OpenMP vs. OpenACC
A brief introduction to two accelerator programming directive sets with a common heritage: OpenACC 2.0 and OpenMP 4.0. After introducing the two directive sets, a side-by-side comparison of available features, along with code examples, will be presented to help developers understand their options as they begin programming for both NVIDIA and Intel accelerated machines.
September 25, 2013 - Michael Wolfe: OpenACC 2.x and Beyond
The OpenACC API is designed to support high-level, performance-portable programming across a range of host+accelerator target systems. This presentation will start with a short discussion of that range, which provides a context for the features and limitations of the specification. Some important additions included in OpenACC 2.0 will be highlighted, and new features currently under discussion for future versions of the OpenACC API, along with a summary of the expected timeline, will be presented.
September 23, 2013 - Jun Jia: Accelerating time integration using spectral deferred correction
In this talk, we illustrate how to use spectral deferred correction (SDC) to improve time integration for scientific simulations. The SDC method combines a Picard integral formulation of the error equation, spectral integration, and a user-chosen low-order time-marching method to form stable methods with arbitrarily high formal order of accuracy in time. The method can be either explicit or implicit, and it also provides the ability to adopt operator splitting while maintaining high formal order. At the end of the talk, we will show some applications of this technique.
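The SDC construction can be sketched in a few lines: build a provisional solution with a low-order method, then sweep corrections driven by the integral of an interpolant of f. The toy below uses uniform substeps and explicit Euler rather than the spectral (Gauss-type) nodes of production SDC, and all step counts are invented; it is illustrative only, not the speaker's implementation:

```python
import math

# Minimal explicit SDC sketch for y' = f(t, y) on one step [t, t+H]
# with 3 uniform nodes (2 substeps of size h = H/2).

def quad_ints(h, f0, f1, f2):
    # Exact integrals of the quadratic interpolant of (f0, f1, f2)
    # over the two substeps [t0, t1] and [t1, t2].
    s0 = h / 12.0 * (5*f0 + 8*f1 - f2)
    s1 = h / 12.0 * (-f0 + 8*f1 + 5*f2)
    return s0, s1

def sdc_step(f, t, y0, H, sweeps=2):
    h = H / 2.0
    ts = [t, t + h, t + H]
    y = [y0, 0.0, 0.0]
    for m in range(2):                    # provisional forward Euler
        y[m+1] = y[m] + h * f(ts[m], y[m])
    for _ in range(sweeps):               # correction sweeps
        fo = [f(ts[m], y[m]) for m in range(3)]
        S = quad_ints(h, *fo)
        yn = [y0, 0.0, 0.0]
        for m in range(2):
            yn[m+1] = yn[m] + h * (f(ts[m], yn[m]) - fo[m]) + S[m]
        y = yn
    return y[2]

# Compare against plain forward Euler on y' = -y, y(0) = 1, over [0, 1],
# using the same number of substeps for both.
f = lambda t, y: -y
n, exact = 10, math.exp(-1.0)
y_sdc, y_eul = 1.0, 1.0
for i in range(n):
    y_sdc = sdc_step(f, i / n, y_sdc, 1.0 / n)
for i in range(2 * n):
    y_eul += (1.0 / (2 * n)) * (-y_eul)
print(abs(y_sdc - exact), abs(y_eul - exact))   # SDC error is far smaller
```

Each sweep raises the formal order by one (up to the order of the quadrature), which is the mechanism behind SDC's "arbitrarily high formal order" claim.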
September 19, 2013 - Kenny Gross: Energy Aware Data Center (EADC) Innovations: Save Energy, Boost Performance
The global electricity consumption of enterprise and high-performance computing data centers continues to grow much faster than Moore's Law as data centers push into emerging markets, and as developed countries see explosive growth in computing demand as well as supra-exponential growth in demand for exabyte (and now zettabyte) storage systems. The US DOE reported that data centers now consume 38 gigawatts of electricity worldwide, a number that is growing exponentially even during times of global economic slowdown. Oracle has developed a suite of novel algorithmic innovations that can be applied non-intrusively to any IT servers and that substantially reduce the energy usage and thermal dissipation of the IT assets (saving additional energy for the data center HVAC systems), while significantly boosting performance (and hence return on assets) for the IT assets, thereby avoiding additional server purchases (that would consume more energy). The key enabler for this suite of algorithmic innovations is Oracle's Intelligent Power Monitoring (IPM) telemetry harness (implemented in software: no hardware modifications anywhere in the data center). IPM, when coupled with advanced pattern recognition, identifies and quantifies three significant nonlinear (heretofore 'invisible') energy-wastage mechanisms that are present in all enterprise and HPC computing assets today, including in low-PUE, high-efficiency data centers: 1) leakage power in the CPUs (grows exponentially with CPU temperature), 2) aggregate fan-motor power inside the servers (grows with the cube of fan RPM), and 3) substantial degradation of server energy efficiency by low-level ambient vibrations in the data center racks.
This presentation shows how continuous system-internal telemetry, coupled with advanced pattern recognition technology that was developed for nuclear reactor applications by the presenter and his team at Argonne National Lab in the 1990s, is significantly cutting energy utilization while boosting performance for enterprise and HPC computing assets.
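The cubic fan law quoted above has a striking consequence that a quick worked example makes concrete (the speed reduction chosen here is illustrative, not a figure from the talk):

```python
# Fan affinity law: fan-motor power scales with the cube of fan speed.
# So a modest 20% reduction in fan RPM cuts fan power by roughly half.

def fan_power_ratio(rpm_new, rpm_old):
    return (rpm_new / rpm_old) ** 3

r = fan_power_ratio(0.8, 1.0)
print(r)   # 0.512, i.e. about a 49% fan-power reduction
```

This is why telemetry-driven control of fan speeds, rather than running fans at worst-case RPM, is such a large lever on server energy use.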
Speaker Bio:

Kenny Gross is a Distinguished Engineer at Oracle and team leader for the System Dynamics Characterization and Control team in Oracle's Physical Sciences Research Center in San Diego. Kenny specializes in advanced pattern recognition, continuous system telemetry, and dynamic system characterization for improving the reliability, availability, and energy efficiency of enterprise computing systems and of the data centers in which the systems are deployed. Kenny has 220 US patents issued and others pending, 180 scientific publications, and was awarded a 1998 R&D 100 Award, for one of the top 100 technological innovations of that year, for an advanced statistical pattern recognition technique that was originally developed for nuclear plant applications and is now being used in a variety of applications to improve the quality of service, availability, and energy efficiency of enterprise and HPC computer servers. Kenny earned his Ph.D. in nuclear engineering from the University of Cincinnati in 1977.
September 17, 2013 - Damien Lebrun-Grandié: Simulation of thermomechanical contact between fuel pellets and cladding in UO2 nuclear fuel rods
As the fission process heats up the fuel rods, the UO2 pellets stacked on top of each other swell both radially and axially, while the surrounding Zircaloy cladding creeps down, so that cladding and pellet eventually come into contact. This exacerbates chemical degradation of the protective cladding, and stresses may enable rapid propagation of cracks and thus threaten the integrity of the clad. Along these lines, pellet-cladding interaction establishes itself as a major concern in fuel rod design and reactor core operation in light water reactors. Accurately modeling fuel behavior is challenging because the mechanical contact problem strongly depends on the temperature distribution, and the coupled pellet-cladding heat transfer problem, in turn, is affected by changes in geometry induced by the bodies' deformations and the stresses generated at the contact interface.
Our work focuses on active set strategies to determine the actual contact area in high-fidelity coupled-physics fuel performance codes. The approach consists of two steps. In the first, we determine the boundary region on conventional finite element meshes where the contact conditions shall be enforced to prevent objects from occupying the same space. For this purpose, we developed and implemented an efficient parallel search algorithm for detecting mesh interpenetration and vertex/mesh overlap. The second step deals with solving the mechanical equilibrium, factoring in the contact conditions computed in the first step. To do so, we developed a modified version of the multipoint constraint (MPC) strategy. While the original algorithm was restricted to the Jacobi-preconditioned conjugate gradient method, our MPC algorithm works with any Krylov solver (and thus liberates us from symmetry requirements). Furthermore, it does not place any restriction on the preconditioner used.
The multibody thermomechanical contact problem is tackled using modern numerics, with higher-order finite elements and a Newton-based monolithic strategy to handle both the nonlinearities (arising from the contact condition as well as, for instance, from the temperature dependence of the fuel thermal conductivity) and the coupling between the various physics components (gap conductance sensitive to the clad-pellet distance, thermal expansion coefficient or Young's modulus affected by temperature changes, etc.).
We will provide several numerical examples of single- and multi-body contact problems to demonstrate how the method performs.
September 5, 2013 - Jared Saia: How to Build a Reliable System Out of Unreliable Components
The first part of this talk will survey several decades of work on designing distributed algorithms that boost reliability, in the sense that they enable the creation of a reliable system from unreliable components. We will discuss practical successes of these algorithms, along with drawbacks. A key drawback is scalability: significant redundancy of resources is required in order to tolerate even one node fault. The second part of the talk will introduce a new class of distributed algorithms for boosting reliability. These algorithms are self-healing in the sense that they dynamically adapt to failures, requiring additional resources only when faults occur.
We will discuss two such self-healing algorithms. The first enables self-healing in an overlay network, even when an omniscient adversary repeatedly removes carefully chosen nodes. Specifically, the algorithm ensures that the shortest path between any pair of nodes never increases by more than a logarithmic factor, and that the degree of any node never increases by more than a factor of 3. The second algorithm enables self-healing with Byzantine faults, where an adversary can control t < n/8 of the n total nodes in the network. This algorithm enables point-to-point communication with an expected number of message corruptions that is O(t (log* n)^2). Empirical results show that this algorithm reduces bandwidth and computation costs by up to a factor of 70 when compared to previous work.
August 21, 2013 - Hank Childs: Hybrid Parallelism for Visualization and Analysis
Many of today's parallel visualization and analysis programs are designed for distributed-memory parallelism, but not for the shared-memory parallelism available on GPUs or multi-core CPUs. However, architectural trends on supercomputers increasingly favor more and more cores per node, whether through the presence of GPUs or through more cores per CPU. To make the best use of such hardware, we must evaluate the benefits of hybrid parallelism, which blends distributed- and shared-memory approaches, for the data-intensive workloads of visualization and analysis. In this talk, Hank explores the fundamental challenges and opportunities for hybrid parallelism in visualization and analysis, and discusses recent results that measure its benefit.
Speaker Bio:
Hank Childs is an assistant professor at the University of Oregon and a computer systems engineer at Lawrence Berkeley National Laboratory. His research focuses on scientific visualization, high-performance computing, and the intersection of the two. He received the Department of Energy Career award in 2012 to research explorative visualization use cases on exascale machines. Additionally, Hank is one of the founding members of the team that developed the VisIt visualization and analysis software. He received his Ph.D. from UC Davis in 2006.
August 13, 2013  Rodney O. Fox: QuadratureBased Moment Methods for KineticsBased Flow Models
Kinetic theory is a useful theoretical framework for developing multiphase flow models that account for complex physics (e.g., particle trajectory crossings, particle size distributions, etc.) [1]. For most applications, direct solution of the kinetic equation is intractable due to the high dimensionality of the phase space. Thus a key challenge is to reduce the dimensionality of the problem without losing the underlying physics. At the same time, the reduced description must be numerically tractable and possess the favorable attributes of the original kinetic equation (e.g., hyperbolicity, conservation of mass/momentum, etc.).
Starting from the seminal work of McGraw [2] on the quadrature method of moments (QMOM), we have developed a general closure approximation referred to as quadrature-based moment methods [3, 4, 5]. The basic idea behind these methods is to use the local (in space and time) values of the moments to reconstruct a well-defined local distribution function (i.e., non-negative, compact support, etc.). The reconstructed distribution function is then used to close the moment transport equations (e.g., spatial fluxes, nonlinear source terms, etc.).
In this seminar, I will present the underlying theoretical and numerical issues associated with quadrature-based reconstructions. The transport of moments in real space, and its numerical representation in terms of fluxes, plays a critical role in determining whether a moment set is realizable. Using selected examples, I will introduce recent work on realizable high-order flux reconstructions developed specifically for finite-volume schemes [6].
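The moment-inversion step at the heart of QMOM, recovering quadrature nodes and weights from a finite moment set, can be sketched with the Wheeler algorithm. The function below is an illustrative implementation under stated assumptions, not the authors' code:

```python
import numpy as np

def wheeler(moments):
    """Wheeler algorithm: recover an n-node quadrature (nodes x_i,
    weights w_i) from the first 2n moments m_0..m_{2n-1}, so that
    sum_i w_i * x_i**k = m_k for k = 0..2n-1."""
    m = np.asarray(moments, dtype=float)
    n = len(m) // 2
    # sigma[k, l] are intermediate modified moments used to build the
    # three-term recurrence coefficients a[k], b[k] of the orthogonal
    # polynomials associated with the (unknown) distribution.
    sigma = np.zeros((n + 1, 2 * n))
    sigma[1, :] = m
    a = np.zeros(n)
    b = np.zeros(n)
    a[0] = m[1] / m[0]
    for k in range(1, n):
        for l in range(k, 2 * n - k):
            sigma[k + 1, l] = (sigma[k, l + 1] - a[k - 1] * sigma[k, l]
                               - b[k - 1] * sigma[k - 1, l])
        a[k] = (sigma[k + 1, k + 1] / sigma[k + 1, k]
                - sigma[k, k] / sigma[k, k - 1])
        b[k] = sigma[k + 1, k] / sigma[k, k - 1]
    # Jacobi matrix: its eigenvalues are the nodes; the weights come
    # from the first component of each eigenvector (Golub-Welsch).
    J = np.diag(a) + np.diag(np.sqrt(b[1:]), 1) + np.diag(np.sqrt(b[1:]), -1)
    nodes, vecs = np.linalg.eigh(J)
    weights = m[0] * vecs[0, :] ** 2
    return nodes, weights
```

For example, the moments of a two-point distribution with equal masses at x = 1 and x = 2 are exactly inverted back to those nodes and weights.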
References
[1] MARCHISIO, D. L. & FOX, R. O. 2013 Computational Models for Polydisperse Particulate and Multiphase Systems, Cambridge University Press.
[2] MCGRAW, R. 1997 Description of aerosol dynamics by the quadrature method of moments. Aerosol Science and Technology 27, 255–265.
[3] DESJARDINS, O., FOX, R. O. & VILLEDIEU, P. 2008 A quadrature-based moment method for dilute fluid-particle flows. Journal of Computational Physics 227, 2514–2539.
[4] YUAN, C. & FOX, R. O. 2011 Conditional quadrature method of moments for kinetic equations. Journal of Computational Physics 230, 8216–8246.
[5] YUAN, C., LAURENT, F. & FOX, R. O. 2012 An extended quadrature method of moments for population balance equations. Journal of Aerosol Science 51, 1–23.
[6] VIKAS, V., WANG, Z. J., PASSALACQUA, A. & FOX, R. O. 2011 Realizable high-order finite-volume schemes for quadrature-based moment methods. Journal of Computational Physics 230, 5328–5352.
August 12, 2013  Lucy Nowell: ASCR: Funding/ Data/ Computer Science
Dr. Lucy Nowell is a Computer Scientist and Program Manager for the Advanced Scientific Computing Research (ASCR) program office in the Department of Energy's (DOE) Office of Science. While her primary focus is on scientific data management, analysis and visualization, her portfolio spans the spectrum of ASCR computer science interests, including supercomputer architecture, programming models, operating and runtime systems, and file systems and input/output research. Before moving to DOE in 2009, Dr. Nowell was a Chief Scientist in the Information Analytics Group at Pacific Northwest National Laboratory (PNNL). On detail from PNNL, she held a two-year assignment as a Program Director for the National Science Foundation's Office of Cyberinfrastructure, where her program responsibilities included Sustainable Digital Data Preservation and Access Network Partners (DataNet), Community-based Data Interoperability Networks (INTEROP), Software Development for Cyberinfrastructure (SDCI) and Strategic Technologies for Cyberinfrastructure (STCI). At PNNL, her research centered on applying her knowledge of visual design, perceptual psychology, human-computer interaction, and information storage and retrieval to problems of understanding and navigating in very large information spaces, including digital libraries. She holds several patents in information visualization technologies.
Dr. Nowell joined PNNL in August 1998 after a career as a professor at Lynchburg College in Virginia, where she taught a wide variety of courses in Computer Science and Theatre. She also headed the Theatre program and later chaired the Computer Science Department. While pursuing her Master of Science and Doctor of Philosophy degrees in Computer Science at Virginia, she worked as a Research Scientist in the Digital Libraries Research Laboratory and also interned with the Information Access team at IBM's T. J. Watson Research Laboratories in Hawthorne, NY. She also has a Master of Fine Arts degree in Drama from the University of New Orleans and the Master of Arts and Bachelor of Arts degrees in Theatre from the University of Alabama.
August 8, 2013  Carlos Maltzahn: Programmable Storage Systems
With the advent of open source parallel file systems, a new usage pattern has emerged: users isolate subsystems of parallel file systems and put them in contexts not foreseen by the original designers, e.g., an object-based storage back end gets a new RESTful front end to become a key-value store compliant with Amazon Web Services' S3, or a data placement function becomes a placement function for customer accounts. This trend shows a desire for the ability to use existing file system services and compose them to implement new services. We call this ability "programmable storage systems".
In this talk I will argue that designing programmability into storage systems has the following benefits: (1) we achieve greater separation of storage performance engineering from storage reliability engineering, making it possible to optimize storage systems in a wide variety of ways without risking years of investment in code hardening; (2) we create an environment that encourages people to build a new stack of storage system abstractions, both domain-specific and across domains, including sophisticated optimizers that rely on machine learning techniques; (3) we inform commercial parallel file system vendors on the design of low-level APIs for their products so that they match the versatility of open source storage systems without having to release their entire code into open source; and (4) we use this historical opportunity to leverage the tension between the versatility of open source storage systems and the reliability of proprietary systems to lead the community of storage system designers.
I will illustrate programmable storage with an overview of programming abstractions that we have found useful so far, and if time permits, talk about "scriptable storage systems" and the interesting new possibilities of truly data-centered software engineering it enables.
Bio: Carlos Maltzahn is an Associate Adjunct Professor at the Computer Science Department of the Jack Baskin School of Engineering, Director of the UCSC Systems Research Lab and Director of the UCSC/Los Alamos Institute for Scalable Scientific Data Management at the University of California at Santa Cruz. Carlos Maltzahn's current research interests include scalable file system data and metadata management, storage QoS, data management games, network intermediaries, information retrieval, and cooperation dynamics.
Carlos Maltzahn joined UC Santa Cruz in December 2004 after five years at Network Appliance. He received his Ph.D. in Computer Science from the University of Colorado at Boulder in 1999, his M.S. in Computer Science in 1997, and his Univ. Diplom Informatik from the University of Passau, Germany in 1991.
August 7, 2013  Tiffany M. Mintz: Toward Abstracting the Communication Intent in Applications to Improve Portability and Productivity
Programming with communication libraries such as the Message Passing Interface (MPI) obscures the high-level intent of the communication in an application and makes static communication analysis difficult. Compilers are unaware of communication libraries' specifics, which excludes communication patterns from any automated analysis and optimization. To overcome this, communication patterns can be expressed at higher levels of abstraction and incrementally added to existing MPI applications. In this work, we propose the use of directives to clearly express the communication intent of an application in a way that is not specific to a given communication library. Our communication directives allow programmers to express communication among processes in a portable way, giving hints to the compiler on regions of computation that can be overlapped with communication, and relaxing constraints on the ordering, completion and synchronization of the communication imposed by specific libraries such as MPI. The directives can then be translated by the compiler into message passing calls that efficiently implement the intended pattern, targeting multiple communication libraries. Thus far, we have used the directives to express point-to-point communication patterns in C, C++ and Fortran applications, and have translated them to MPI and SHMEM.
August 2, 2013  Alberto Salvadori: Multiscale and multiphysics modeling of Liion batteries: a computational homogenization approach
There is great interest in developing the next generation of lithium-ion batteries, with higher capacity and longer cycle life, to meet the significantly more demanding energy-storage requirements of existing and future inventories of power-generation and energy-management systems. Industry and academia are looking for alternative materials, and Si is one of the most promising candidates for the active material because it has the highest theoretical specific energy capacity. It has emerged that the very large mechanical stresses associated with huge volume changes during Li intercalation/deintercalation are responsible for poor cyclic behavior and quick fading of electrical performance. The present contribution aims to provide scientific contributions in this vibrant context.
The computational homogenization scheme is here tailored to model the coupling between the electrochemical and mechanical phenomena that coexist during battery charging and discharging cycles. At the macroscale, diffusion-advection equations model the electrochemistry of the whole cell, whereas the microscale models the multi-component porous electrode, the diffusion and intercalation of lithium in the active particles, and the swelling and fracturing of the latter. The scale transitions are formulated by tailoring the well-established first-order computational homogenization scheme for mechanical and thermal problems.
August 2, 2013  Michela Taufer: The effectiveness of applicationaware selfmanagement for scientific discovery in volunteer computing systems
July 24, 2013  Catalin Trenchea: Improving timestepping numerics for weakly dissipative systems
In this talk I will address the stability and accuracy of the Crank-Nicolson leap-frog (CNLF) time-stepping scheme, and propose a modification of Robert-Asselin time filters for numerical models of weakly diffusive evolution systems. This is motivated by a vast number of applications, e.g., the meteorological equations, and coupled systems with dominating skew-symmetric coupling (groundwater-surface water).
In contemporary numerical simulations of the atmosphere, evidence suggests that time-stepping errors may be a significant component of total model error, on both weather and climate timescales. After a brief review, I will suggest a simple but effective method for substantially improving the time-stepping numerics at no extra computational expense.
The most common time-stepping method is the leapfrog scheme combined with the Robert-Asselin (RA) filter. This method is used in many atmospheric models: ECHAM, MAECHAM, MM5, CAM, MESO-NH, HIRLAM, KMCM, LIMA, SPEEDY, IGCM, PUMA, COSMO, FSU-GSM, FSU-NRSM, NCEP-GFS, NCEP-RSM, NSEAM, NOGAPS, RAMS, and CCSR/NIES-AGCM. Although the RA filter controls the time-splitting instability in these models (it successfully suppresses the spurious computational mode associated with the leapfrog time-stepping scheme), it also weakly suppresses the physical mode, introduces non-physical damping, and reduces accuracy.
This presentation proposes a simple modification to the RA filter (mRA) [Y. Li, CT 2013].
The modification is analyzed and compared with the RAW filter (Williams 2009, 2011).
The mRA increases the numerical accuracy to O(Δt^4) amplitude error and at least O(Δt^2) phase-speed error for the physical mode. The mRA filter requires the same storage as RAW, and one more storage factor than the RA filter. When used in conjunction with the leapfrog scheme, the RAW filter eliminates the non-physical damping and increases the amplitude accuracy by two orders, yielding third-order accuracy, with the phase accuracy remaining second-order. The mRA and RAW filters can easily be incorporated into existing models, typically via the insertion of just a single line of code. Better simulations are obtained at no extra computational expense.
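To make the mechanics concrete, here is a minimal sketch of leapfrog time stepping with the classical RA filter on a test oscillator. This illustrates the base scheme being discussed, not the mRA modification itself (whose definition is in the cited paper); the filter coefficient nu and the test problem are illustrative choices:

```python
import numpy as np

def leapfrog_ra(f, y0, dt, nsteps, nu=0.01):
    """Leapfrog time stepping for dy/dt = f(y) with the Robert-Asselin
    filter: after each new step, the middle time level is nudged by
    nu * (y[n+1] - 2*y[n] + y[n-1]), which damps the spurious
    computational mode of the leapfrog scheme."""
    y_prev = np.asarray(y0, dtype=float)
    y_curr = y_prev + dt * f(y_prev)   # start-up step (forward Euler)
    history = [y_prev.copy(), y_curr.copy()]
    for _ in range(nsteps - 1):
        y_next = y_prev + 2.0 * dt * f(y_curr)
        # RA filter applied to the middle time level
        y_curr = y_curr + nu * (y_next - 2.0 * y_curr + y_prev)
        y_prev, y_curr = y_curr, y_next
        history.append(y_curr.copy())
    return np.array(history)
```

On the harmonic oscillator (u' = v, v' = -u), the filtered scheme stays close to the exact solution cos(t) while the computational mode is kept in check.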
June 28, 2013  Yuri Melnikov: A surprising connection between Green's functions and the infinite product representation of elementary functions
Some standard as well as innovative approaches will be reviewed for the construction of Green's functions for elliptic PDEs. Based on these, a surprising technique is proposed for obtaining infinite product representations of some trigonometric, hyperbolic, and special functions. The technique compares different alternative expressions of Green's functions constructed by different methods. This allows us not only to obtain the classical Euler formulas but also to come up with a number of new representations.
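For context, the best-known example of the kind of infinite product representation in question is Euler's classical factorization of the sine function:

```latex
\sin(\pi x) \;=\; \pi x \prod_{n=1}^{\infty} \left( 1 - \frac{x^{2}}{n^{2}} \right)
```

The abstract's claim is that comparing alternative Green's function representations recovers such classical formulas and yields new ones of the same type.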
June 27, 2013  Kimmy Mu: Performance, accuracy and power tradeoff for scientific processes using workflow in high performance computing
Power is becoming more important in high-performance computing than ever before as we move toward exascale computing. A transition is necessary from the old approach, which considers only performance and accuracy, to a new one that accounts for performance, accuracy, and power. In high-performance computing, a workflow is composed of a large number of tasks, such as simulation, analysis, and visualization. However, there is no guidance to help users know which task allocations, and which placements of tasks onto nodes and clusters, are good for performance or power under accuracy requirements. In this presentation, I will talk about power optimization for reconfigurable embedded systems that dynamically choose kernels to run on hardware coprocessors in response to dynamic application behavior at runtime. Given the many commonalities with HPC, we will explore this method in high-performance computing for dynamic workflows, task placement, and related decisions, in terms of performance, power, and accuracy constraints.
June 26, 2013  Matthew Causley: A fast implicit Maxwell field solver for plasma simulations
June 25, 2013  Jeff Haack: Conservative Spectral Method for Solving the Boltzmann Equation
We present a conservative spectral scheme for Boltzmann collision operators. This formulation is derived from the weak form of the Boltzmann equation, which can represent the collisional term as a weighted convolution in Fourier space. The weights contain all of the information of the collision mechanics and can be precomputed. I will present some results for isotropic (in angle) interactions, such as hard spheres and Maxwell molecules. We have recently extended the method to take into account anisotropic scattering mechanisms arising from potential interactions between particles, and we use this method to compute the Boltzmann equation with screened Coulomb potentials. In particular, we study the rate of convergence of the Fourier transform for the Boltzmann collision operator in the grazing collisions limit to the Fourier transform for the limiting Landau collision operator. We show that the decay rate to equilibrium depends on the parameters associated with the collision cross section, and specifically study the differences between the classical Rutherford scattering angular cross section, which has logarithmic error, and an artificial one with a linear error. I will also present recent work extending this method to multispecies gases and gases with internal degrees of freedom, which introduces new challenges for conservation and adds inelastic collisions to the system.
June 17, 2013  Megan Cason: Analytic Utility Of Novel Threading Models In Distributed Graph Algorithms
Current analytic methods for judging distributed algorithms rely on communication abstractions that characterize performance assuming purely passive data movement and access. This assumption complicates the analysis of certain algorithms, such as graph analytics, which have behavior that is very dependent on data movement and modifying shared variables. This presentation will discuss an alternative model for analyzing theoretic scalability of distributed algorithms written with the possibility of active data movement and access. The mobile subjective model presented here confines all communication to 1) shared memory access and 2) executing thread state which can be relocated between processes, i.e., thread migration. Doing so enables a new type of scalability analysis, which calculates the number of thread relocations required, and whether that communication is balanced across all processes in the system. This analysis also includes a model for contended shared data accesses, which is used to identify serialization points in an algorithm. This presentation will show the analysis for a common distributed graph algorithm, and illustrate how this model could be applied to a real world distributed runtime software stack.
June 14, 2013  Jeff Carver: Applying Software Engineering Principles to Computational Science
The increase in the importance of Computational Science software motivates the need to identify and understand which software engineering (SE) practices are appropriate. Because of the uniqueness of the computational science domain, existing SE tools and techniques developed for the business/IT community are often not efficient or effective. Appropriate SE solutions must account for the salient characteristics of the computational science development environment. To identify these solutions, members of the SE community must interact with members of the computational science community. This presentation will discuss the findings from a series of case studies of CSE projects and the results of an ongoing workshop series. First, a series of case studies of computational science projects was conducted as part of the DARPA High Productivity Computing Systems (HPCS) project. The main goal of these studies was to understand how SE principles were and were not being applied in computational science, along with some of the reasons why. The studies resulted in nine lessons learned about computational science software that are important to consider moving forward. Second, the Software Engineering for Computational Science and Engineering workshop brings together software engineers and computational scientists. The outcomes of this workshop series provide interesting insight into potential future trends.
June 12, 2013  HansWerner van Wyk: Multilevel Quadrature Methods
Stochastic sampling methods are arguably the most direct and least intrusive means of incorporating parametric uncertainty into numerical simulations of partial differential equations with random inputs. However, to achieve an overall error within a desired tolerance, a large number of sample simulations may be required (to control the sampling error), each of which may need to be run at a high level of spatial fidelity (to control the spatial error). Multilevel methods aim to achieve the same accuracy as traditional sampling methods, but at a reduced computational cost, through the use of a hierarchy of spatial discretization models. Multilevel algorithms coordinate the number of samples needed at each discretization level by minimizing the computational cost subject to a given error tolerance. They can be applied to a variety of sampling schemes, exploit nesting when available, can be implemented in parallel, and can be used to inform adaptive spatial refinement strategies. We present an introduction to multilevel quadrature in the context of stochastic collocation methods, and demonstrate its effectiveness theoretically and by means of numerical examples.
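The multilevel idea described above can be sketched on a toy problem. In the snippet below (all names, the quantity of interest, and the per-level sample counts are illustrative, not from the talk), the telescoping sum E[P_L] = E[P_0] + sum_l E[P_l - P_{l-1}] is sampled with many cheap coarse samples and progressively fewer expensive fine ones:

```python
import numpy as np

def P_level(xi, l):
    """Level-l approximation of P(xi) = xi**3: midpoint rule with 2**l
    subintervals applied to the integral of 3*x**2 over [0, xi].
    The discretization error decays like 4**(-l)."""
    n = 2 ** l
    x = (np.arange(n) + 0.5) * (xi / n)
    return np.sum(3.0 * x ** 2) * (xi / n)

def mlmc_estimate(L, N0, rng):
    """Multilevel estimator of E[P(xi)] for xi ~ U(0,1): coarse-level
    mean plus correction means, with sample counts halved per level
    (a simple stand-in for the cost-optimal allocation)."""
    est = np.mean([P_level(rng.random(), 0) for _ in range(N0)])
    for l in range(1, L + 1):
        Nl = max(N0 // 2 ** l, 1)
        # same random input xi at both levels -> small correction variance
        xis = rng.random(Nl)
        est += np.mean([P_level(xi, l) - P_level(xi, l - 1) for xi in xis])
    return est
```

Here the exact answer is E[xi**3] = 1/4; most of the sampling work happens at level 0, where each evaluation is cheapest.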
June 7, 2013  Xuechen Zhang: Scibox: Cloud Facility for Sharing OnLine Data
Collaborative science demands global sharing of scientific data, but it cannot leverage universally accessible cloud-based infrastructures like DropBox, as those offer limited interfaces and inadequate levels of access bandwidth. In this talk, I will present Scibox, a cloud facility for sharing scientific data online. It uses standard cloud storage solutions, but offers a usage model in which high-end codes can write/read data to/from the cloud via the same ADIOS APIs they already use for their I/O actions, thereby naturally coupling data generation with subsequent data analytics. Extending current ADIOS I/O methods, with Scibox, data upload/download volumes are controlled via Data Reduction (DR) functions specified by end users and applied at the data source, before data is moved, with further gains in efficiency obtained by combining DR functions to move exactly what is needed by current data consumers.
June 6, 2013  Yuan Tian: Taming Scientific Big Data with Flexible Organizations for Exascale Computing
Fast-growing High Performance Computing systems enable scientists to simulate scientific processes of great complexity, consequently often producing complex data that are also exponentially increasing in size. However, growth within the computing infrastructure is significantly imbalanced: the dramatically increasing computing power is accompanied by a slowly improving storage system. Such discordant progress among computing power, storage, and data has led to a severe Input/Output (I/O) bottleneck that requires novel techniques to address big data challenges in the scientific domain.
This talk will identify the prevalent characteristics of scientific data and the storage system as a whole, and explore opportunities to drive I/O performance for petascale computing and prepare it for the exascale. To this end, a set of flexible data organization and management techniques are introduced and evaluated to address the aforementioned concerns. Four key techniques are designed to exploit the capability of the back-end storage system for processing and storing scientific big data with fast and scalable I/O performance for visualization: space-filling-curve-based data reorganization, system-aware chunking, spatial and temporal aggregation, and in-node staging with compression. The experimental results demonstrated a more than 60x speedup for a mission-critical climate application during data post-processing.
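As a concrete, illustrative instance of space-filling-curve-based reorganization (the talk does not specify which curve is used), a Z-order (Morton) index interleaves coordinate bits so that spatially nearby elements tend to land near each other in the linearized layout:

```python
def morton_encode(x, y, bits=16):
    """Z-order (Morton) index: interleave the bits of the non-negative
    integer coordinates (x, y).  Points that are close in 2D tend to be
    close in the resulting 1D ordering, which improves locality when
    data is reorganized on disk."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)        # x occupies even bits
        code |= ((y >> i) & 1) << (2 * i + 1)    # y occupies odd bits
    return code
```

For example, the four cells of a 2x2 block map to the consecutive indices 0, 1, 2, 3, which is exactly the locality property such reorderings exploit.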
May 31, 2013  Pablo Seleson: Multiscale Material Modeling with Peridynamics
Multiscale modeling has been recognized in recent years as an important research field to achieve feasible and accurate predictions of complex systems. Peridynamics, a nonlocal reformulation of continuum mechanics based on integral equations, is able to resolve microscale phenomena at the continuum level. As a nonlocal model, peridynamics possesses a length scale which can be controlled for multiscale modeling. For instance, classical elasticity has been presented as a limiting case of a peridynamic model. In this talk, I will introduce the peridynamics theory and show analytical and numerical connections of peridynamics to molecular dynamics and classical elasticity. I will also present multiscale methods to concurrently couple peridynamics and classical elasticity, demonstrating the capabilities of peridynamics towards multiscale material modeling.
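The nonlocal-to-local connection mentioned above can be illustrated with a small sketch. The 1D bond-based kernel and scaling below are illustrative choices, not from the talk; the point is that a peridynamic integral operator with horizon delta recovers the classical second derivative for smooth fields:

```python
import numpy as np

def peridynamic_laplacian(u, x, delta, n=100):
    """1D bond-based peridynamic operator with horizon delta:
        L_delta u(x) = (2/delta**2) * integral_{-delta}^{delta}
                       (u(x+s) - u(x)) / |s| ds,
    discretized with the midpoint rule (even n, so no node hits s = 0).
    For smooth u it converges to u''(x) as delta -> 0, illustrating the
    classical (local) limit of the nonlocal model."""
    h = 2.0 * delta / n
    s = -delta + (np.arange(n) + 0.5) * h   # symmetric midpoints
    return (2.0 / delta**2) * np.sum((u(x + s) - u(x)) / np.abs(s)) * h
```

For u(x) = x^2 the operator returns exactly u'' = 2, and for u = sin it matches -sin(x) up to O(delta^2).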
Dr. Seleson is a Postdoctoral Fellow in the Institute for Computational Engineering and Sciences at The University of Texas at Austin. He obtained his Ph.D. in Computational Science from Florida State University in 2010. He holds an M.S. degree in Physics from the Hebrew University of Jerusalem (2006), and a double B.S. degree in Physics and Philosophy, also from the Hebrew University of Jerusalem (2002).
May 29, 2013  Ryan McMahan: The Effects of System Fidelity for Virtual Reality Applications
Virtual reality (VR) has developed from Ivan Sutherland's vision of an "ultimate display" to a realized field of advanced technologies. Despite evidence supporting the use of VR for various benefits, the level of system fidelity required for such benefits is often unknown. Modern VR systems range from high-fidelity simulators that incorporate many technologies to lower-fidelity, desktop-based virtual environments. In order to identify the level of system fidelity required for certain beneficial uses, research has been conducted to better understand the effects of system fidelity on the user. In this talk, a series of experiments evaluating the effects of interaction fidelity and display fidelity will be presented. Future directions of system fidelity research will also be discussed.
Dr. Ryan P. McMahan is an Assistant Professor of Computer Science at the University of Texas at Dallas, where his research focuses on the effects of system fidelity for virtual reality (VR) applications. Using an immersive VR system comprised of a wireless head-mounted display (HMD), a real-time motion tracking system, and Wii Remotes as 3D input devices, his research determines the effects of system fidelity by varying components such as stereoscopy, field of view, and degrees of freedom for interactions. Currently, he is using this methodology to investigate the effects of fidelity on learning for VR training applications. Dr. McMahan received his Ph.D. in Computer Science in 2011 from Virginia Tech, where he also received his B.S. and M.S. in Computer Science in 2004 and 2007.
May 28, 2013  Adrian Sandu: Data Assimilation and the Adaptive Solution of Inverse Problems
The task of providing an optimal analysis of the state of the atmosphere requires the development of novel computational tools that facilitate an efficient integration of observational data into models. In this talk, we will introduce variational and statistical estimation approaches to data assimilation. We will discuss important computational aspects including the construction of efficient models for background errors, the construction and analysis of discrete adjoint models, new approaches to estimate the information content of observations, and hybrid variational-ensemble approaches to assimilation. We will also present some recent results on the solution of inverse problems using space and time adaptivity, and a priori and a posteriori error estimates for the optimal solution.
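As background for the variational estimation approach mentioned above, the core analysis update, which is standard in the data-assimilation literature and sketched here in a deliberately toy form, combines a background state xb with observations y:

```python
import numpy as np

def analysis_update(xb, y, B, R, H):
    """Best linear unbiased analysis used in variational data
    assimilation: the minimizer of the cost function
        J(x) = (x - xb)^T B^{-1} (x - xb) + (y - Hx)^T R^{-1} (y - Hx)
    is  xa = xb + K (y - H xb)  with gain  K = B H^T (H B H^T + R)^{-1}.
    B and R are the background- and observation-error covariances,
    and H is the observation operator."""
    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
    return xb + K @ (y - H @ xb)
```

With equal background and observation uncertainty (B = R = I) and direct observations (H = I), the analysis is simply the average of background and observations, as expected.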
May 24, 2013  Satoshi Matsuoka: The Futures of Tsubame Supercomputer and the Japanese HPCI Towards Exascale
HPCI is the Japanese High Performance Computing Infrastructure, which encompasses the national operations of major supercomputers such as the K computer and Tsubame2.0, much like XSEDE in the United States and PRACE in Europe. Recently it was announced that the Japanese Ministry of Education, Culture, Sports, Science and Technology intends to initiate a project towards an exascale supercomputer to be deployed around 2020. However, the workshop report that recommended the project also calls for a comprehensive infrastructure in which a flagship machine is supplemented with leadership machines that complement the abilities of the flagship. Although it is still early, I will attempt to discuss the current status of Tsubame2.0's evolution to 2.5 and 3.0 in this context, as well as the activities in Japan to initiate an exascale effort, with collaborative elements with US Department of Energy partners in system software development.
May 17, 2013  Jon Mietling and Tony McCrary: Bling3D: a new game development toolset from l33t Labs
Bling3D is a forthcoming game development toolset from l33t labs.
A fusion of Eclipse 4 with game development technologies, Bling allows both programmers and designers to create compelling interactive experiences from within one powerful tool.
In this talk, you will be introduced to some of Bling's exciting features, including:
- GPU Powered UI: A revolutionary new user interface for Eclipse, which uses shader programs to render widgets directly on the GPU.
- BYOE (Bring Your Own Engine): Bling is designed as a universal tools platform for game technologies. You can use our game engine or integrate your own!
- Ultimate Toolset: Use the power of Bling's interface and Eclipse's extensibility to create mind-blowing tools and plugins.
- Designers Love It: Intuitive visual tools that allow you to create new worlds and artificial realities with ease.
- Transform Your Assets: Easily create new ways to process raw assets (geometry, images, etc.) into materials suitable for runtime use.
Jon Mietling and Tony McCrary are representatives of l33t labs LLC, a technology startup from the Detroit, Michigan region.
May 10, 2013 - Xiao Chen: A Modular Uncertainty Quantification Framework for Multiphysics Systems
This talk presents a modular uncertainty quantification (UQ) methodology for multiphysics applications in which each physics module can be independently embedded with its internal UQ method (intrusive or non-intrusive). This methodology offers the advantage of "plug-and-play" flexibility (i.e., UQ enhancements to one module do not require updates to the other modules) without losing the "global" uncertainty propagation property. (This means that, by performing UQ in this modular manner, all inter-module uncertainty and sensitivity information is preserved.) In addition, using this methodology one can also track the evolution of global uncertainties and sensitivities at the grid-point level, which may be useful for model improvement. We demonstrate the utility of such a framework for error management and Bayesian inference on a practical application involving multispecies flow and reactive transport in randomly heterogeneous porous media.
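As a toy illustration of the modular, non-intrusive idea, the sketch below propagates two uncertain inputs through two hypothetical physics modules by Monte Carlo while recording each module's intermediate output, so per-module uncertainty information survives the coupling. The module names, the simple physics, and the input distributions are all invented for illustration and are not the speaker's actual framework.

```python
import random
from statistics import mean

def module_flow(k):
    # Hypothetical physics module 1: flow speed from an uncertain permeability k.
    return 2.0 * k

def module_transport(v, rate):
    # Hypothetical physics module 2: concentration from the flow speed and an
    # uncertain reaction rate; it only sees its own inputs ("plug-and-play").
    return v / (1.0 + rate)

def propagate(n_samples=10000, seed=0):
    """Non-intrusively propagate uncertainty through both modules while
    recording intermediate (per-module) outputs, so inter-module
    uncertainty and sensitivity information is preserved."""
    rng = random.Random(seed)
    flows, concs = [], []
    for _ in range(n_samples):
        k = rng.gauss(1.0, 0.1)        # uncertain permeability
        rate = rng.gauss(0.5, 0.05)    # uncertain reaction rate
        v = module_flow(k)             # module 1 output, with its uncertainty
        c = module_transport(v, rate)  # module 2 consumes module 1's output
        flows.append(v)
        concs.append(c)
    return mean(flows), mean(concs)

v_mean, c_mean = propagate()
```

Because each module is sampled rather than modified, swapping in a more refined UQ method for one module would not require touching the other.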
May 2, 2013 - Kenley Pelzer: Quantum Biology: Elucidating Design Principles from Photosynthesis
Recent experiments suggest that quantum mechanical effects may play a role in the efficiency of photosynthetic light harvesting. However, much controversy exists about the interpretation of these experiments, in which light-harvesting complexes are excited by a femtosecond laser pulse. The coherence of such laser pulses raises the important question of whether these quantum mechanical effects are significant in biological systems excited by incoherent light from the sun. In our work, we apply frequency-domain Green's function analysis to model a light-harvesting complex excited by incoherent light. By modeling incoherent excitation, we demonstrate that the evidence of long-lived quantum mechanical effects is not purely an artifact of peculiarities of the spectroscopy. These results provide a new perspective on the role of noisy biological environments in promoting or destroying quantum transport in photosynthesis.
April 23, 2013 - Kirk W. Cameron: Power-Performance Modeling, Analyses and Challenges
The power consumption of supercomputers ultimately limits their performance. The current challenge is not whether we can build an exaflop system by 2018, but whether we can do it in less than 20 megawatts. The SCAPE Laboratory at Virginia Tech has been studying the tradeoffs between performance and power for over a decade. We've developed an extensive tool chain for monitoring and managing power and performance in supercomputers. We will discuss our power-performance modeling efforts and the implications of our findings for exascale systems, as well as some research directions ripe for innovation.
April 23, 2013 - Jordan Deyton: Tor Bridge Distribution Powered by Threshold RSA
Since its inception, Tor has offered anonymity for internet users around the world. Tor now offers bridges to help users evade internet censorship, but the primary distribution schemes that provide bridges to users in need have come under attack. This talk explores how threshold RSA can help strengthen Tor's infrastructure while also enabling more powerful bridge distribution schemes. We implement a basic threshold RSA signature system for the bridge authority and a reputation-based social network design for bridge distribution. Experimental results show that requests from honest users can be answered quickly while maintaining both the secrecy and the anonymity of registered clients and bridges.
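Threshold RSA itself involves RSA-specific mathematics, but the core idea, splitting a signing capability so that any k of n parties suffice, can be illustrated with Shamir secret sharing over a prime field. This is a generic sketch of the threshold concept, not the talk's implementation; the field size and all names are illustrative.

```python
import random

P = 2**61 - 1  # a Mersenne prime defining the field; illustrative, not an RSA modulus

def make_shares(secret, k, n, seed=42):
    """Split `secret` into n shares, any k of which reconstruct it
    (Shamir's scheme; threshold RSA distributes signing power similarly)."""
    rng = random.Random(seed)
    # Random polynomial of degree k-1 with constant term = secret.
    coeffs = [secret] + [rng.randrange(P) for _ in range(k - 1)]
    def f(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation mod P
            acc = (acc * x + c) % P
        return acc
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation at x = 0 over the prime field recovers the secret.
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % P
                den = (den * (xi - xj)) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P  # modular inverse
    return secret

shares = make_shares(123456789, k=3, n=5)
recovered = reconstruct(shares[:3])
```

Any three of the five shares recover the secret, while two reveal nothing; a threshold signature scheme applies the same k-of-n principle to the signing key of the bridge authority.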
April 19, 2013 - Maria Avramova and Kostadin Ivanov: OECD LWR UAM and PSBT/BFBT benchmarks and their relation to Advanced LWR Simulations
From 1987 to 1995, the Nuclear Power Engineering Corporation (NUPEC) in Japan performed a series of void measurement tests using full-size mock-ups for both BWRs and PWRs. Void fraction measurements and departure from nucleate boiling (DNB) tests were performed at NUPEC under steady-state and transient conditions. The workshop will provide an overview of the OECD/NEA/NRC PWR Subchannel and Bundle Tests (PSBT) and OECD/NEA/NRC BWR Full-size Fine-mesh Bundle Tests (BFBT) benchmarks based on the NUPEC data. The benchmarks were designed to provide a data set for evaluating the abilities of existing subchannel, system, and computational fluid dynamics (CFD) thermal-hydraulics codes to predict void distribution and DNB in LWRs under steady-state and transient conditions. The first part of the seminar summarizes the PSBT and BFBT benchmark databases, specifications, and definitions of the benchmark exercises, presents a comparative analysis of the obtained results, and makes the case for how these benchmarks can be used for verification, validation and uncertainty quantification of thermal-hydraulic tools developed for advanced LWR simulations.
The second part of the seminar will provide an overview of the OECD/NEA benchmark for LWR Uncertainty Analysis in Modeling (UAM), with emphasis on the exercises of Phases I and II of the benchmark and a discussion of Phase III, which is directly related to coupled multiphysics advanced LWR simulations. A series of well-defined problems with complete sets of input specifications and reference experimental data will be introduced, with the objective of determining the uncertainty in LWR calculations at all stages of a coupled reactor physics/thermal-hydraulics calculation. The full chain of uncertainty propagation will be discussed, starting from basic data and engineering uncertainties, across different scales (multi-scale) and physics phenomena (multi-physics), along with how this propagation is tested on a number of benchmark exercises. The input, output and assumptions for each exercise will be given, and the procedures for calculating the output and propagated uncertainties in each step will be described, supplemented by results from benchmark participants.
Bio of Dr. Maria Avramova
Dr. Maria Avramova is an Assistant Professor in the Mechanical and Nuclear Engineering Department at the Pennsylvania State University. She is currently the Director of the Reactor Dynamics and Fuel Management Group (RDFMG). Her expertise and experience are in the area of developing methods and computer codes for multidimensional reactor core analysis. Her background includes development, verification, and validation of thermal-hydraulics subchannel, porous media, and CFD models and codes for reactor core design, transient, and safety computational analysis. She has led and coordinated the OECD/NRC BFBT and PSBT benchmarks and is currently coordinating Phase II of the OECD LWR UAM benchmark. Her latest research efforts have focused on high-fidelity multiphysics simulations (involving coupling of reactor physics, thermal-hydraulics and fuel performance models) as well as on uncertainty and sensitivity analysis of reactor design and safety calculations. Dr. Avramova has published over 15 refereed journal papers and over 40 refereed conference proceedings articles.
Bio of Dr. Kostadin Ivanov
Dr. Kostadin Ivanov is a Distinguished Professor in the Mechanical and Nuclear Engineering Department at the Pennsylvania State University. He is currently the Graduate Coordinator of the Nuclear Engineering Program. His research developments include computational methods, numerical algorithms and iterative techniques, nuclear fuel management and reloading optimization techniques, reactor kinetics and core dynamics methods, cross-section generation and modeling algorithms for multidimensional steady-state and transient reactor calculations, and coupling three-dimensional (3D) kinetics models with thermal-hydraulic codes. He has also led the development of multidimensional neutronics, in-core fuel management and coupled 3D kinetics/thermal-hydraulic computer code benchmarks, multidimensional reactor transient and safety analysis methodologies, integrated analysis of safety-related parameters, system transient modeling of power plants, and in-core fuel management analyses.
Examples of such benchmarks are the OECD/NRC PWR MSLB benchmark, the OECD/NRC BWR TT benchmark and the OECD/DOE/CEA VVER-1000 CT benchmark. He is currently the chair and coordinator of the Scientific Board and Technical Program Committee of the OECD LWR UAM benchmark.
April 18, 2013 - Sparsh Mittal: MASTER: A Technique for Improving Energy Efficiency of Caches in Multicore Processors
The large power consumption of modern processors has been identified as the most severe constraint on scaling their performance. Further, in recent CMOS technology generations, leakage energy has been increasing dramatically, and hence the leakage energy consumption of large last-level caches (LLCs) has become a significant source of processor power consumption.
This talk first highlights the need for power management in the LLCs of modern multicore processors and then presents MASTER, a microarchitectural cache leakage energy saving technique using dynamic cache reconfiguration. MASTER uses dynamic profiling of LLCs to predict the energy consumption of running programs at multiple LLC sizes. Using these estimates, suitable cache quotas are allocated to different programs via a cache-coloring scheme, and the unused LLC space is turned off to save energy. The implementation overhead of MASTER is small; even for 4-core systems, its overhead is only 0.8% of the L2 cache size. Simulations have been performed using an out-of-order x86-64 simulator with 2-core and 4-core multiprogrammed workloads from the SPEC2006 suite. Further, MASTER has been compared with two energy saving techniques, namely decay cache and way-adaptable cache. The results show that MASTER gives the largest energy savings and does not harm performance or cause unfairness.
Finally, this talk briefly shows an extension of MASTER for multicore QoS systems. Simulation results confirm that a large amount of energy is saved while meeting the QoS requirement of most of the workloads.
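The selection step at the heart of such a reconfiguration scheme can be sketched as follows. The sizes and energy numbers below are made up for illustration; the actual MASTER technique derives its per-size estimates from dynamic hardware profiling rather than a fixed table.

```python
def pick_llc_config(energy_estimates):
    """Given profiled estimates {llc_size_kb: predicted_energy_joules}
    for one program, choose the LLC quota minimizing predicted energy.
    The unused portion of the full cache would then be power-gated."""
    best_size = min(energy_estimates, key=energy_estimates.get)
    full_size = max(energy_estimates)  # largest size = the full LLC
    turned_off_fraction = 1.0 - best_size / full_size
    return best_size, turned_off_fraction

# Hypothetical profiled estimates (joules) at several candidate LLC quotas:
estimates = {512: 3.1, 1024: 2.4, 2048: 2.6, 4096: 3.0}
size, off = pick_llc_config(estimates)
```

Here the 1024 KB quota minimizes predicted energy, so three quarters of the full 4096 KB cache could be turned off for this program.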
April 17, 2013 - Okwan Kwon: Automatic Scaling of OpenMP Applications Beyond Shared Memory
We present the first fully automated compiler-runtime system that successfully translates and executes OpenMP shared-address-space programs on laboratory-size clusters, for the complete set of regular, repetitive applications in the NAS Parallel Benchmarks. We introduce a hybrid compiler-runtime translation scheme. This scheme features a novel runtime data flow analysis and compiler techniques for improving data affinity and reducing communication costs. We present and discuss the performance of our translated programs, and compare them with the performance of the MPI, HPF and UPC versions of the benchmarks. The results show that our translated programs achieve, on average, 75% of the performance of the hand-coded MPI programs.
April 17, 2013 - Michael S. Murillo: Molecular Dynamics Simulations of Charged Particle Transport in High Energy-Density Matter
High energy-density matter is now routinely produced at large laser facilities. Producing fusion energy at such facilities challenges our ability to model collisional plasma processes that transport energy among the plasma species and across spatial scales. While the most accurate computational method for describing collisional processes is molecular dynamics, there are numerous challenges associated with using molecular dynamics to model very hot plasmas. However, recent advances in high performance computing have allowed us to develop methods for simulating a wide variety of processes in hot, dense plasmas. I will review these developments and describe our recent results on simulating fast particle stopping in dense plasmas. Using the simulation results, implications for theoretical modeling of charged-particle stopping will be given.
April 12, 2013 - Vivek K. Pallipuram: Exploring Multiple Levels of Performance Modeling for Heterogeneous Systems
One of the major challenges faced by the High-Performance Computing (HPC) community today is user-friendly and accurate heterogeneous performance modeling. Although performance prediction models exist to fine-tune applications, they are seldom easy to use and do not address multiple levels of design space abstraction. Our research aims to bridge the gap between reliable performance model selection and user-friendly analysis. We propose a straightforward and accurate multi-level performance modeling suite for multi-GPGPU systems that addresses multiple levels of design space abstraction. The suite primarily targets synchronous iterative algorithms (SIAs) using our synchronous iterative GPGPU execution (SIGE) model and addresses two levels of design space abstraction: 1) low-level, where partial details of the implementation are present along with system specifications, and 2) high-level, where implementation details are minimal and only high-level system specifications are known. The low-level abstraction of the modeling suite employs statistical techniques for runtime prediction, whereas the high-level abstraction utilizes existing analytical and quantitative modeling tools to predict the application runtime. Our initial validation efforts for the low-level abstraction yield high runtime prediction accuracy, with less than a 10% error rate for several tested GPGPU cluster configurations and case studies. The development of high-level abstraction models is underway. The end goal of our research is to offer the scientific community a reliable and user-friendly performance prediction framework that allows it to optimally select a performance prediction strategy for the given design goals and system architecture characteristics.
April 11, 2013 - Jeff Young: Commodity Global Address Spaces - How Can We Scale Out Accelerator and Memory Performance for Tomorrow's Clusters?
Current Top 500 systems like Titan, Stampede, and Tianhe-1A have started to embrace the use of off-chip accelerators, such as GPUs and x86 coprocessors, to dramatically improve their overall performance and efficiency numbers. At the same time, these systems also make very specific assumptions about the availability of highly optimized interconnects and software stacks that are used to mitigate the effects of running large applications across multiple nodes and their accelerators. This talk focuses on the gap in networking between high-performance computing clusters and data centers and proposes that future clusters should be built around commodity-based networks and managed global address spaces to improve the performance of data movement between host memory and accelerator memory. This thesis is supported by previous research into converged commodity interconnects and ongoing research on the Oncilla managed GAS runtime to support aggregated memory for data warehousing applications. In addition, we will speculate on how commodity-based networks and memory management for clusters of accelerators might be affected by the advent of 3D stacking and fused CPU/GPU architectures.
April 9, 2013 - Cong Liu: Towards Efficient Real-Time Multicore Computing Systems
Current trends in multicore computing are towards building more powerful, intelligent, yet space- and power-efficient systems. A key requirement in correctly building such intelligent systems is to ensure real-time performance, i.e., "make the right move at the right time in a predictable manner." Current research on real-time multicore computing has been limited to simple systems for which complex application runtime behaviors are ignored; this limits the practical applicability of such research. In practice, complex but realistic application runtime behaviors often exist, such as I/O operations, data communications, parallel execution segments, critical sections, etc. Such runtime behaviors are currently dealt with by overprovisioning systems, which is an economically wasteful practice. I will present predictable real-time multicore computing system design, analysis, and implementation methods that can efficiently support common types of application runtime behaviors. I will show that the proposed methods are able to avoid overprovisioning systems and to reduce the number of needed hardware components to the extent possible while providing timing correctness guarantees.
In the second part of the talk, I will present energy-efficient workload mapping techniques for heterogeneous multicore CPU/GPU systems. Through both algorithmic analysis and prototype system implementation, I will show that the proposed techniques are able to achieve better energy efficiency while guaranteeing response time performance.
April 9, 2013 - Frank Mueller: On Determining a Viable Path to Resilience at Exascale
Exascale computing is projected to feature billion-core parallelism. At such large processor counts, faults will become more commonplace. Current techniques to tolerate faults focus on reactive schemes for recovery and generally rely on a simple checkpoint/restart mechanism. Yet they have a number of shortcomings. (1) They do not scale and require complete job restarts. (2) Projections indicate that the mean time between failures is approaching the overhead required for checkpointing. (3) Existing approaches are application-centric, which increases the burden on application programmers and reduces portability.
We discuss a number of techniques, and their levels of maturity (or lack thereof), for addressing these problems. These include (a) scalable network overlays, (b) on-the-fly process recovery, (c) proactive process-level fault tolerance, (d) redundant execution, (e) the effect of silent data corruptions (SDCs) on IEEE floating-point arithmetic and (f) resilience modeling. In combination, these methods aim to pave the path to exascale computing.
April 5, 2013 - Sarat Sreepathi: Optimus: A Parallel Metaheuristic Optimization Framework with Environmental Engineering Applications
Optimus (Optimization Methods for Universal Simulators) is a parallel optimization framework for coupling computational intelligence methods with a target scientific application. Optimus includes a parallel middleware component, PRIME (Parallel Reconfigurable Iterative Middleware Engine), for scalable deployment on emergent supercomputing architectures. PRIME provides a lightweight communication layer to facilitate periodic inter-optimizer data exchanges. A parallel search method, COMSO (Cooperative Multi-Swarm Optimization), was designed and tested on various high-dimensional mathematical benchmark problems. Additionally, this work presents a novel technique, TAPSO (Topology-Aware Particle Swarm Optimization), for network-based optimization problems. Empirical studies demonstrate that TAPSO achieves better convergence than standard PSO for Water Distribution Systems (WDS) applications. A scalability analysis of Optimus was performed on the Cray XK6 supercomputer (Jaguar) at the Oak Ridge Leadership Computing Facility for the leak detection problem in WDS. For a weak scaling scenario, we achieved 84.82% of the baseline performance at 200,000 cores relative to 1,000 cores.
March 20, 2013 - J.W. Banks: Stable Partitioned Solvers for Compressible Fluid-Structure Interaction Problems
In this talk, we discuss recent work concerning the development and analysis of stable, partitioned solvers for fluid-structure interaction problems. In a partitioned approach, the solvers for each fluid or solid domain are isolated from each other and coupled only through the interface. This is in contrast to fully coupled monolithic schemes, where the entire system is advanced by a single unified solver, typically by an implicit method. Added-mass instabilities, common to partitioned schemes, are addressed through the use of a newly developed interface projection technique. The overall approach is based on imposing the exact solution to local fluid-solid Riemann problems directly in the numerical method. Stability of the FSI coupling is discussed using normal-mode stability theory, and the new scheme is shown to be stable for a wide range of material parameters. For the rigid body case, the approach is shown to be stable even for bodies with no mass or rotational inertia. This difficult limiting case exposes interesting subtleties concerning the notion of added mass in fluid-structure problems at the continuous level.
March 13, 2013 - Travis Thompson: Navier-Stokes Equations to Describe the Motion of Fluid Substances
The Navier-Stokes equations describe the motion of fluid substances; the equations are widely utilized to model many physical phenomena such as weather patterns, ocean currents, turbulent fluid flow and magnetohydrodynamics. Despite their wide utilization, a comprehensive theoretical understanding remains an open question; the equations offer a venue for challenges at the forefront of both theoretical and computational knowledge. My work at Texas A&M has focused primarily on two topics: aspects of hyperbolic conservation laws, specifically mass conservation for incompressible Navier-Stokes, and computational investigation of an LES model based on a new eddy viscosity; both appeal to highly parallel scientific computing, albeit in differing ways.
With respect to hyperbolic conservation laws: on the computational side, I have implemented a one-step artificial compression term in a numerical code which counteracts an entropy-viscosity regularization term. This is an innovative approach; canonical methods for interface tracking are two-step or adaptive procedures. In addition, the implementation utilizes a splitting approach, originally designed for use in a highly parallel momentum equation variant, as an approximation operator in the time-stepping scheme; this approach imbues the algorithm with additional parallelism. On the theoretical side, a distinct approach towards the analysis of dispersion error, utilizing a commutator expression, has been investigated for particular finite element spaces; the approach offers a computational segue into investigating consistency error and moves away from the canonical, tedious, expansion-based methodology of analysis.
With respect to large eddy simulations (LES): computational investigations of an eddy-viscosity model based on the entropy viscosity of Guermond & Popov have been underway for the last six months; in collaboration with Dr. Larios, a postdoc here at Texas A&M, an analysis of the qualitative and statistical attributes of high Reynolds number, turbulent flow is being conducted. We will compare our results to the Smagorinsky-Lilly turbulence model and attempt to verify basic tenets of isotropic turbulence theory, namely the Kolmogorov -5/3 law and predictions regarding the uncorrelated nature of velocity structure functions.
March 1, 2013 - Bob Salko: Development, Improvement, and Validation of Reactor Thermal-Hydraulic Analysis Tools
As a result of the need for continual development, qualification, and application of computational tools for modeling nuclear systems, the Reactor Dynamics and Fuel Management Group (RDFMG) at the Pennsylvania State University has maintained an active involvement in this area. This presentation will highlight recent RDFMG work on thermal-hydraulic modeling tools. One such tool is the COolant Boiling in Rod Arrays - Two Fluids (COBRA-TF) computer code, capable of modeling the independent behavior of continuous liquid, vapor, and droplets using the subchannel methodology. Work has been done to expand the modeling capabilities from the in-vessel region only, for which COBRA-TF was developed, to the coolant-line region by developing a dedicated coolant-line-analysis package that serves as an add-on to COBRA-TF. Additional COBRA-TF work includes development of a preprocessing tool for faster, more user-friendly creation of COBRA-TF input decks, implementation of postprocessing capabilities for visualization of simulation results, and optimization of the source code for significant improvements in simulation speed and memory management. Of equal importance to these development activities is the validation of the resulting tools for their intended applications. The code's ability to capture rod-bundle thermal-hydraulic behavior under prototypical PWR operating conditions will be demonstrated through comparison of predicted and experimental results for the New Experimental Studies of Thermal-Hydraulics of Rod Bundles (NESTOR) tests. Due to the growing usage of Computational Fluid Dynamics (CFD) tools in this area, modeling results predicted by the STAR-CCM+ CFD tool will also be presented for these tests.
February 23, 2013 - Thomas L. Lewis: Finite Difference and Discontinuous Galerkin Numerical Methods for Fully Nonlinear Second Order PDEs with Applications to Stochastic Optimal Control
In this talk I will discuss a convergence framework for directly approximating the viscosity solutions of fully nonlinear second order PDE problems. The main focus will be the introduction of a set of sufficient conditions for constructing convergent finite difference (FD) methods. The conditions given are meant to be easier to realize and implement than those found in the current literature. The given FD methodology will then be shown to generalize to a class of discontinuous Galerkin (DG) methods. The proposed DG methods are high order and allow for increased flexibility when choosing a computational mesh. Numerical experiments will be presented to gauge the performance of the proposed DG methods. An overview of the PDE theory of viscosity solutions will also be given. The presented ideas are part of a larger project concerned with efficiently and accurately approximating the Hamilton-Jacobi-Bellman equation from stochastic optimal control.
February 22, 2013 - Charles K. Garrett: Numerical Integration of Matrix Riccati Differential Equations with Solution Singularities
A matrix Riccati differential equation (MRDE) is a quadratic ODE of the form
X' = A21 + A22 X - X A11 - X A12 X.
It is well known that MRDEs may have singularities in their solution. In this presentation, both the theory and practice of numerically integrating MRDEs past solution singularities will be analyzed. In particular, it will be shown how to create a black box numerical MRDE solver, which accurately solves an MRDE with or without singularities.
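As a minimal illustration, the scalar instance of the MRDE with A21 = 1, A12 = -1, A11 = A22 = 0 reduces to x' = 1 + x^2, whose solution x(t) = tan(t) has a singularity at t = pi/2. The plain RK4 integrator sketched below (a generic scheme, not the speaker's black-box solver) handles it comfortably away from the singularity but would break down near t = pi/2, which is exactly the situation the talk's singularity-aware methods address.

```python
import math

def mrde_rhs(x, a21=1.0, a22=0.0, a11=0.0, a12=-1.0):
    # Scalar instance of X' = A21 + A22 X - X A11 - X A12 X.
    # With a21 = 1, a12 = -1 this is x' = 1 + x^2, solution tan(t).
    return a21 + a22 * x - x * a11 - x * a12 * x

def rk4(f, x0, t_end, n_steps):
    """Classical fourth-order Runge-Kutta integration from t = 0 to t_end."""
    h = t_end / n_steps
    x = x0
    for _ in range(n_steps):
        k1 = f(x)
        k2 = f(x + 0.5 * h * k1)
        k3 = f(x + 0.5 * h * k2)
        k4 = f(x + h * k3)
        x += (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

# Integrate to t = 1, safely before the singularity at pi/2 ~ 1.5708.
x1 = rk4(mrde_rhs, 0.0, 1.0, 1000)
```

At t = 1 the RK4 result agrees with tan(1) to high accuracy; pushing t_end toward pi/2 makes the step size requirement blow up, motivating methods that integrate through the singularity instead.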
February 21, 2013 - Giacomo Dimarco: Asymptotic Preserving Implicit-Explicit Runge-Kutta Methods for Nonlinear Kinetic Equations
In this talk, we will discuss Implicit-Explicit (IMEX) Runge-Kutta methods which are particularly adapted to stiff kinetic equations of Boltzmann type. We will consider both the case of easily invertible collision operators and the challenging case of Boltzmann collision operators. We give sufficient conditions for such methods to be asymptotic preserving and asymptotically accurate. Their monotonicity properties are also studied. In the case of the Boltzmann operator, the methods are based on the introduction of a penalization technique for the collision integral. This reformulation of the collision operator makes it possible to construct penalized IMEX schemes which work uniformly for a wide range of relaxation times, avoiding the expensive implicit resolution of the collision operator. Finally, we show some numerical results which confirm the theoretical analysis.
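A first-order analogue of the IMEX idea can be sketched on a toy relaxation equation u' = forcing(t) + (u_eq - u)/eps: treating the stiff relaxation implicitly and the non-stiff forcing explicitly keeps the usable step size independent of eps. This is a generic illustrative sketch, not one of the speaker's IMEX Runge-Kutta schemes, and all names are invented.

```python
def imex_euler(u0, eps, h, n_steps, u_eq=1.0, forcing=lambda t: 0.0):
    """First-order IMEX Euler for u' = forcing(t) + (u_eq - u)/eps.
    The non-stiff forcing is treated explicitly, the stiff relaxation
    term implicitly; the linear implicit solve is done in closed form:
    (u_new - u)/h = forcing(t) + (u_eq - u_new)/eps."""
    u, t = u0, 0.0
    for _ in range(n_steps):
        u = (u + h * forcing(t) + (h / eps) * u_eq) / (1.0 + h / eps)
        t += h
    return u

# Stiff regime: eps = 1e-8 yet h = 0.1; an explicit scheme would need
# h on the order of eps, but the IMEX step relaxes stably to equilibrium.
u_final = imex_euler(u0=5.0, eps=1e-8, h=0.1, n_steps=10)
```

As eps goes to zero the update drives u directly to u_eq, which is the asymptotic-preserving behavior the talk's higher-order schemes are designed to guarantee.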
February 20, 2013 - Tom Berlijn: Effects of Disorder on the Electronic Structure of Functional Materials
Doping is one of the most powerful ways to tune the properties of functional materials such as thermoelectrics, photovoltaics and superconductors. Besides carriers and chemical pressure, the dopants insert disorder into the materials. In this talk I will present two case studies of doped Fe-based superconductors: Fe vacancies in KxFeySe2 [1] and Ru substitutions in Ba(Fe1-xRux)2As2 [2]. With the use of a recently developed first-principles method [3], nontrivial disorder effects are found that are not only interesting scientifically, but also have potential implications for materials technology. Open questions for further research will be discussed.
[1] TB, P. J. Hirschfeld, W. Ku, PRL 109 (2012)
[2] L. Wang, TB, C.-H. Lin, Y. Wang, P. J. Hirschfeld, W. Ku, PRL 110 (2013)
[3] TB, D. Volja, W. Ku, PRL 106 (2011)
February 19, 2013 - Joshua D. Carmichael: Seismic Monitoring of the Western Greenland Ice Sheet: Response to Early Lake Drainage
In 2006, the drainage of a supraglacial lake through hydrofracture on the Greenland Ice Sheet was directly observed for the first time. This event demonstrated that surface-to-bed hydrological connections can be established through 1 km of cold ice and thereby allow surficial forcing of a developed subglacial drainage system by surface meltwater. In a changing climate, supraglacial lakes on the Western Greenland Ice Sheet are expected to drain earlier each summer and form new lakes at higher elevations. The ice sheet's response to these earlier drainages in the near future is of glaciological concern. We address the response of the Western Greenland Ice Sheet to an observed early lake drainage using a synthesis of seismic and GPS monitoring near an actively draining lake. This experiment demonstrates that (1) seismic activity precedes the drainage event by several days and is likely coincident with crack coalescence, (2) seismic multiplet locations are coincident with the uplift of the ice during drainage, and (3) a diurnal seismic response of the ice sheet follows after the ice surface settles to its pre-drainage elevation a week later. These observations are consistent with a model in which the subglacial drainage system is likely distributed, highly pressurized, and of low hydraulic conductivity at drainage initiation. It also demonstrates that an early lake drainage likely reduces basal normal stress on week-order time scales by storing water subglacially. We conclude with recommendations for future long-range lake drainage detection.
February 18, 2013 - Mili Shah: Calculating a Symmetry Preserving Singular Value Decomposition
The symmetry preserving singular value decomposition (SPSVD) produces the best symmetric (low rank) approximation to a set of data. These symmetric approximations are characterized via an invariance under the action of a symmetry group on the set of data. The symmetry groups of interest consist of all the non-spherical symmetry groups in three dimensions. This set includes the rotational, reflectional, dihedral, and inversion symmetry groups. In order to calculate the best symmetric (low rank) approximation, the symmetry of the data set must be determined. Therefore, matrix representations for each of the non-spherical symmetry groups have been formulated. These new matrix representations lead directly to a novel reweighting iterative method to determine the symmetry of a given data set by solving a series of minimization problems. Once the symmetry of the data set is found, the best symmetric (low rank) approximation can be established by using the SPSVD. Applications of the SPSVD to protein dynamics problems as well as facial recognition will be presented.
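A toy version of the symmetrization step can illustrate the idea for the simplest group, a single reflection: averaging each point with its image under the group action projects the data onto the reflection-invariant configurations in the least-squares sense. This sketch assumes the correspondence of each point to its symmetric partner is the identity (as for labeled atoms in a symmetric protein); the full SPSVD generalizes this to arbitrary non-spherical groups and adds the low-rank truncation, neither of which is shown here.

```python
def reflect_z(p):
    # Group action: the reflection (x, y, z) -> (x, y, -z), a generator
    # of a two-element reflectional symmetry group.
    x, y, z = p
    return (x, y, -z)

def symmetrize(points):
    """Project a point set onto reflection-symmetric configurations by
    averaging each point with its image under the group action (the
    least-squares-closest symmetric configuration for this pairing)."""
    out = []
    for p in points:
        q = reflect_z(p)
        out.append(tuple((a + b) / 2.0 for a, b in zip(p, q)))
    return out

# Slightly asymmetric data; symmetrizing zeroes the z-components.
data = [(1.0, 2.0, 0.3), (0.5, -1.0, -0.2)]
sym = symmetrize(data)
```

Averaging over the group orbit is the group-theoretic projection onto the invariant subspace; the SPSVD combines such a projection with an SVD to obtain the best symmetric low-rank approximation.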
February 14, 2013 - Zheng (Cynthia) Gu: Efficient and Robust Message Passing Schemes for Remote Direct Memory Access (RDMA)-Enabled Clusters
While significant effort has been made in improving Message Passing Interface (MPI) performance, existing work has mainly focused on eliminating software overhead in the library and delivering raw network performance to applications. Current MPI implementations such as MPICH2, MVAPICH2, and Open MPI still suffer from performance issues such as unnecessary synchronizations, communication progress problems, and a lack of communication-computation overlap. The root cause of these problems is the mismatch between the communication protocols/algorithms and the communication scenarios. In my PhD research, I will develop efficient and robust message passing schemes for both point-to-point and collective communications on RDMA-enabled clusters. Unlike existing approaches for optimizing MPI performance, our approach will allow different communication protocols/algorithms for different communication scenarios. The idea is to use the most appropriate communication scheme for each communication so as to remove the mismatches, which will eliminate unnecessary synchronizations, improve communication progress, and maximize communication-computation overlap during a communication operation. This prospectus will describe the background of this research, present our preliminary research, and summarize the proposed future work.
February 8, 2013 – Taylor Patterson: Simulation of Complex Nonlinear Elastic Bodies Using Lattice Deformers
Lattice deformers are a popular option in computer graphics for modeling the behavior of elastic bodies as they avoid the need for conforming mesh generation, and their regular structure offers significant opportunities for performance optimizations. This talk will present work that expands the scope of current grid-based elastic deformers, adding support for a number of important simulation features. The approach to be described accommodates complex nonlinear, optionally anisotropic materials while using an economical one-point quadrature scheme. The formulation fully accommodates near-incompressibility by enforcing accurate nonlinear constraints, supports implicit integration for large time steps, and is not susceptible to locking or poor conditioning of the discrete equations. Additionally, this technique increases the solver accuracy by employing a novel high-order quadrature scheme on lattice cells overlapping with the embedded model boundary, which are treated at subcell precision. This accurate boundary treatment can be implemented at a minimal computational premium over the cost of a voxel-accurate discretization. Finally, this talk will present part of the expanding feature set of this approach that is currently under development.
February 6, 2013 – Makhan Virdi: Modeling High-Resolution Soil Moisture to Estimate Recharge Timing and Experiences with Geospatial Analyses
Estimating the time of groundwater recharge after a rainfall event is poorly understood because of its dependence on nonlinear soil characteristics and variability in antecedent soil conditions. Movement of water in variably saturated soil can be described by Richards' equation, a nonlinear partial differential equation that has no closed-form analytical solution and is difficult to approximate. To develop a simple recharge model using a minimum number of soil parameters, high-resolution soil moisture data from a soil column under controlled laboratory conditions were analysed to understand wetting front propagation at a finer temporal scale. Findings from a series of simulations with an existing finite element model, varying soil properties and depth to water table, were used to propose a simple model that uses only the most significant representative soil properties and the antecedent soil matrix state. In separate geospatial analyses, satellite imagery was used to determine landslide risk cost and to develop an algorithm for planning the safest and shortest routes in hilly areas susceptible to landslides; the effects of decadal climate extremes on lake-groundwater exchanges were studied; and the regional-scale effects of phosphate mining were studied using hydrological models and geospatial analysis of LiDAR-derived DEMs and watersheds.
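For reference, the Richards' equation mentioned above can be written in one common head-based form (with $z$ positive upward; conventions vary):

```latex
\frac{\partial \theta(h)}{\partial t}
  = \frac{\partial}{\partial z}\!\left[ K(h)\left( \frac{\partial h}{\partial z} + 1 \right) \right],
```

where $h$ is the pressure head, $\theta(h)$ the volumetric water content, and $K(h)$ the unsaturated hydraulic conductivity; the strong nonlinearity of $\theta(h)$ and $K(h)$ is what precludes a closed-form solution.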
February 5, 2013 – Roshan J. Vengazhiyil and C. F. Jeff Wu: Experimental Design, Model Calibration, and Uncertainty Quantification
We will start the talk with a newly developed space-filling design, called minimum energy design (MED). The key ideas in constructing the MED are the visualization of each design point as a charged particle inside a box and the minimization of the total potential energy of these particles. It is shown, through theoretical arguments and simulations, that under regularity conditions and a proper choice of the charge function, the MED can asymptotically generate any arbitrary probability density function. This new design technique has important applications in Bayesian computation and uncertainty quantification. The second part of the talk will focus on model calibration. The commonly used Kennedy and O'Hagan (KO) approach treats the computer model as a black box; therefore, the statistically calibrated models lack physical interpretability. We propose a new framework that opens up the black box and introduces statistical models inside the computer model. This approach leads to simpler models that are physically more interpretable. We will then present some theoretical results concerning the convergence properties of calibration parameter estimation in the KO formulation of the model calibration problem. The KO calibration is shown to be asymptotically inconsistent. A new approach, called $L_2$ distance calibration, is shown to be consistent and asymptotically efficient in estimating the calibration parameters.
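A toy sketch of the charged-particle idea behind the MED, assuming a uniform charge function and a simple greedy, point-by-point build over a fixed candidate set (the function name and this particular greedy variant are illustrative, not the authors' algorithm):

```python
import numpy as np

def greedy_med(candidates, n_points, charge):
    """Greedily pick n_points rows of `candidates`, at each step adding the
    candidate that adds the least potential energy sum_i q_i * q_j / d_ij."""
    q = charge(candidates)
    design = [0]  # arbitrary seed point
    for _ in range(n_points - 1):
        best, best_e = None, np.inf
        for j in range(len(candidates)):
            if j in design:
                continue
            # energy added by appending candidate j to the current design
            d = np.linalg.norm(candidates[design] - candidates[j], axis=1)
            e = np.sum(q[design] * q[j] / d)
            if e < best_e:
                best, best_e = j, e
        design.append(best)
    return np.array(design)

# Uniform charge on a 5x5 grid: the selected points spread apart,
# since nearby pairs contribute large 1/d energy terms.
grid = np.array([[x, y] for x in np.linspace(0, 1, 5)
                        for y in np.linspace(0, 1, 5)])
idx = greedy_med(grid, 4, charge=lambda pts: np.ones(len(pts)))
```

In the actual MED the charge depends on the target density (low charge where the density is high), which is what lets the design asymptotically reproduce an arbitrary probability density function.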
February 4, 2013 – Li-Shi Luo: Kinetic Methods for CFD
Computational fluid dynamics (CFD) is based on direct discretizations of the Navier-Stokes equations. The traditional approach of CFD is now being challenged as new multiscale and multiphysics problems have begun to emerge in many fields: in nanoscale systems, the scale separation assumption does not hold; macroscopic theory is therefore inadequate, yet microscopic theory may be impractical because it requires computational capabilities far beyond our present reach. Methods based on mesoscopic theories, which connect the microscopic and macroscopic descriptions of the dynamics, provide a promising approach. Besides their connection to microscopic physics, kinetic methods also have certain numerical advantages due to the linearity of the advection term in the Boltzmann equation. Dr. Luo will discuss two mesoscopic methods: the lattice Boltzmann equation and the gas-kinetic scheme, their mathematical theory, and their applications to simulating various complex flows. Examples include incompressible homogeneous isotropic turbulence, hypersonic flows, and microflows.
January 23, 2013 – Tarek Ali El Moselhy: New Tools for Uncertainty Quantification and Data Assimilation in Complex Systems
In this talk, Dr. Tarek Ali El Moselhy will present new tools for forward and inverse uncertainty quantification (UQ) and data assimilation.
In the context of forward UQ, Dr. Moselhy will briefly summarize a new scalable algorithm particularly suited for very high-dimensional stochastic elliptic and parabolic PDEs. The algorithm relies on computing a compact separated representation of the stochastic field of interest. The separated representation is computed iteratively and adaptively via a greedy optimization algorithm. The algorithm has been successfully applied to problems of flow and transport in stochastic porous media, handling “real world” levels of spatial complexity and providing orders-of-magnitude reductions in computational time compared to state-of-the-art methods.
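Schematically, a separated representation of this kind takes the form

```latex
u(x,\omega) \;\approx\; u_r(x,\omega) \;=\; \sum_{i=1}^{r} f_i(x)\, g_i(\omega),
```

where each new pair $(f_r, g_r)$ is obtained greedily by minimizing the residual of the stochastic PDE, so the deterministic and stochastic factors are built up one rank at a time rather than stored on a full tensor-product grid (this is the general shape of such representations; the talk's specific construction may differ in detail).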
In the context of inverse UQ, Dr. Moselhy will present a new algorithm for the Bayesian solution of inverse problems. The algorithm explores the posterior distribution by finding a transport map from a reference measure to the posterior measure, and therefore does not require any Markov chain Monte Carlo sampling. The map from the reference to the posterior is approximated using polynomial chaos expansions and is computed via stochastic optimization. Existence and uniqueness of the map are guaranteed by results from the optimal transport literature. The map approach is demonstrated on a variety of problems, ranging from inference of permeability fields in elliptic PDEs to benchmark high-dimensional spatial statistics problems such as inference in log-Gaussian Cox point processes.
In addition to its computational efficiency and parallelizability, advantages of the map approach include: providing clear convergence criteria and error measures, providing analytical expressions for posterior moments, evaluating at no additional computational cost the marginal likelihood/evidence (thus enabling model selection), the ability to generate independent, uniformly weighted posterior samples without additional model evaluations, and the ability to efficiently propagate posterior information to subsequent computational modules (thus enabling stochastic control).
In the context of data assimilation, Dr. Moselhy will present an optimal map algorithm for filtering of nonlinear chaotic dynamical systems. Such an algorithm is suited for a wide variety of applications including prediction of weather and climate. The main advantage of the algorithm is that it inherently avoids issues of sample impoverishment common to particle filters, since it explicitly represents the posterior as the push forward of a reference measure rather than with a set of samples.
December 13, 2012 – Russell Carden: Automating and Stabilizing the Discrete Empirical Interpolation Method for Nonlinear Model Reduction
The Discrete Empirical Interpolation Method (DEIM) is a technique for model reduction of nonlinear dynamical systems. It is based upon a modification of proper orthogonal decomposition that is designed to reduce the computational complexity of evaluating the reduced-order nonlinear term. The DEIM approach is based upon an interpolatory projection and only requires evaluation of a few selected components of the original nonlinear term. Thus, implementing the reduced-order nonlinear term requires a new code to be derived from the original code for evaluating the nonlinearity. Dr. Carden will describe a methodology for automatically deriving a code for the reduced-order nonlinearity directly from the original nonlinear code. Although DEIM has been effective on some very difficult problems, it can, under certain conditions, introduce instabilities in the reduced model. Dr. Carden will present a problem that has proved helpful in developing a method for stabilizing DEIM reduced models.
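The greedy index selection at the heart of DEIM can be sketched compactly; this is a minimal version of the standard published algorithm, assuming a POD basis `U` for the nonlinear term is already in hand:

```python
import numpy as np

def deim_indices(U):
    """Greedy DEIM point selection for an n x m basis U (columns = POD modes)."""
    n, m = U.shape
    # first index: largest-magnitude entry of the first basis vector
    p = [int(np.argmax(np.abs(U[:, 0])))]
    for j in range(1, m):
        # interpolate the new basis vector at the rows chosen so far ...
        c = np.linalg.solve(U[p, :j], U[p, j])
        # ... and pick the row where the interpolation residual is largest
        r = U[:, j] - U[:, :j] @ c
        p.append(int(np.argmax(np.abs(r))))
    return np.array(p)

# Example: selection for a random orthonormal basis.
rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.normal(size=(50, 5)))
p = deim_indices(Q)
```

The reduced nonlinearity is then evaluated only at the rows in `p`, which is what makes the online cost independent of the full dimension $n$.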
December 12, 2012 – Charlotte Kotas: Bringing Real-Time Array Signal Processing to the NVIDIA Tesla
Underwater acoustic detection of hostile targets at range requires increasingly computationally advanced algorithms as adversaries become quieter. This seminar will discuss the mathematics behind one such algorithm and some of the challenges associated with modifying it to work in a real-time networked environment. The algorithm was converted from a sequential MATLAB formulation to a parallel CUDA Fortran formulation designed to run on an NVIDIA Tesla C2050 processor. Speedups of greater than 50× were observed over comparable computational sections.
December 6, 2012 – Shuaiwen "Leon" Song: Power, Performance and Energy Models and Systems for Emergent Architectures
Massive parallelism combined with complex memory hierarchies and heterogeneity in high-performance computing (HPC) systems form a barrier to efficient application and architecture design. The performance achievements of the past must continue over the next decade to address the needs of scientific simulations. However, building an exascale system by 2022 that uses less than 20 megawatts will require significant innovations in power and performance efficiency. Prior to this work, the fundamental relationships between power and performance were not well understood. Our analytical modeling approach allows users to quantify the relationship between power and performance at scale by enabling study of the effects of machine and application dependent characteristics on system energy efficiency. Our model helps users isolate root causes of energy or performance inefficiencies and develop strategies for scaling systems to maintain or improve efficiency. I will also show how this methodology can be extended and applied to model power and performance in heterogeneous GPU-based architectures.
Shuaiwen "Leon" Song is a PhD candidate in the Computer Science department of Virginia Tech. His primary research interests fall broadly within the area of High Performance Computing (HPC) with a focus on power and performance analysis and modeling for large scale homogeneous and heterogeneous parallel architectures and runtime systems. He is a recipient of the 2011 Paul E. Torgersen Award for Graduate Student Research Excellence and in 2011 was an Institute for Scientific Computing Research (ISCR) Scholar at Lawrence Livermore National Laboratory. His work has been published in conferences and journals including IPDPS, IEEE Cluster, PACT, MASCOTS, IEEE TPDS, and IJHPCA.
December 6, 2012 – Miroslav Stoyanov: Gradient-Based Dimension Reduction Approach for Stochastic Partial Differential Equations
A dimension reduction approach is considered for uncertainty quantification, in which gradient information is used to partition the uncertainty domain into “active” and “passive” subspaces, where the “passive” subspace is characterized by near-zero variance of the quantity of interest. We present a way to project the model onto the low-dimensional “active” subspace and solve the resulting problem using conventional techniques. We derive rigorous error bounds for the projection algorithm and show convergence in the $L^1$ norm.
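A minimal sketch of the gradient-based partition described above, assuming samples of the gradient of the quantity of interest are available (the function name, variable names, and the eigenvalue cutoff are illustrative):

```python
import numpy as np

def split_subspaces(grad_samples, tol=1e-8):
    """Split R^d into "active"/"passive" parts from gradient samples (N x d)."""
    # C approximates E[grad f grad f^T]; near-zero eigenvalues mark
    # directions along which the quantity of interest barely varies.
    C = grad_samples.T @ grad_samples / len(grad_samples)
    w, V = np.linalg.eigh(C)              # eigenvalues in ascending order
    keep = w > tol * w.max()
    return V[:, keep], V[:, ~keep]        # active basis, passive basis

# f(x) = (a^T x)^2 varies only along a, so the active subspace is 1D.
rng = np.random.default_rng(1)
a = np.array([3.0, 0.0, 4.0])
X = rng.normal(size=(200, 3))
grads = (2.0 * (X @ a))[:, None] * a      # rows: gradient 2 (a^T x) a
active, passive = split_subspaces(grads)
```

The projected low-dimensional problem is then solved over the active coordinates only, with the passive directions frozen at nominal values.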
December 5, 2012 – Barbara Chapman: Enabling Exascale Programming: The Intranode Challenge
As we continue to debate the best way to program emerging generations of leadershipclass hardware, it is imperative that we do not ignore the more traditional paths.
Dr. Chapman's presentation considers some of the ways in which today's intranode programming models may help us migrate legacy application code.
December 5, 2012 – Andrew Christlieb: An Implicit Maxwell Solver Based on Method of Lines Transpose
Fast summation methods have been used successfully in a range of plasma applications. However, in the case of moving point charges, direct application of fast summation methods in the time domain requires the use of retarded potentials. In practice, this means that every time a point charge moves in a simulation, it leaves behind an image charge that becomes a source term for all time. Hence, the number of source points in the simulation grows at each time step in proportion to the number of particles being simulated.
In this talk, Dr. Christlieb will present a new approach to Maxwell's equations based on the method of lines transpose. The method starts by expressing Maxwell's equations in second-order form, and then the time operator is discretized. The resulting implicit system is then solved using integral methods. This process is known as the method of lines transpose. This approach pushes the time history into a volume integral, which does not grow in complexity with time. To solve the boundary integral efficiently, Dr. Christlieb will describe an ADI method, combined with an $O(N)$ solver for the 1D boundary integrals, that is competitive with explicit time stepping methods. Because the new method is implicit, this approach has no CFL restriction. Further, because the approach is based on an integral formulation, the new method easily accommodates complex geometry with no special modification. Dr. Christlieb will present preliminary results of this method applied to wave propagation and some basic Maxwell examples.
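As a hedged illustration for the 1D wave equation $u_{tt} = c^2 u_{xx}$ (the talk's scheme may use a different time discretization): discretizing time with a centered difference and treating the spatial term implicitly yields a modified Helmholtz problem at each step, whose free-space solution is an integral with an exponential kernel,

```latex
u^{n+1} - \frac{1}{\alpha^{2}}\,\partial_x^2 u^{n+1} = 2u^{n} - u^{n-1},
\qquad \alpha = \frac{1}{c\,\Delta t},
\qquad
u^{n+1}(x) = \frac{\alpha}{2}\int e^{-\alpha|x-y|}\,
  \bigl(2u^{n}(y) - u^{n-1}(y)\bigr)\,dy \;+\; \text{(boundary corrections)}.
```

The exponential kernel admits $O(N)$ fast evaluation by sweeping, and because the step is implicit there is no CFL restriction.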
November 27, 2012 – Charles Jackson: Metrics for Climate Model Validation
A “valid” model is a model that has been tested for its intended purpose. In the Bayesian formulation, the “log-likelihood” is a test statistic for selecting, weeding, or weighting climate model ensembles with observational data. This statistic has the potential to synthesize the physical and data constraints on quantities of interest. One of the thorny issues in formulating the log-likelihood is how one should account for biases, because not all biases affect predictions of quantities of interest. Dr. Jackson makes use of a 165-member ensemble of CAM3.1/slab ocean climate models with different parameter settings to think through the issues involved in predicting each model's sensitivity to greenhouse gas forcing given what can be observed from the base state. In particular, Dr. Jackson uses multivariate empirical orthogonal functions to decompose the differences that exist among this ensemble to discover which fields and regions matter to the model's sensitivity. What is found is that the differences that matter can be a small fraction of the total discrepancy. Moreover, weighting members of the ensemble using this knowledge does a relatively poor job of adjusting the ensemble mean toward the known answer. Dr. Jackson will discuss the implications of this result.
November 15, 2012 – Erich Foster: Finite Elements for the Quasi-Geostrophic Equations of the Ocean
Erich Foster will present a conforming finite element (FE) discretization of the pure stream function form of the quasi-geostrophic equations (QGE), a commonly used model for the large-scale wind-driven ocean circulation. The pure stream function form of the QGE is a fourth-order PDE and therefore requires a $C^1$ FE discretization to be conforming. Thus, the Argyris finite element, a $C^1$ FE with 21 degrees of freedom, was chosen for the FE discretization of the QGE. Optimal error estimates for the pure stream function form of the QGE will be presented. Although the QGE are a simplified model of the ocean, resolving all scales can be computationally expensive; numerical methods such as the two-level method are therefore indispensable for time-sensitive projects. A two-level method, and an optimal error estimate for a two-level method applied to the conforming FE discretization of the pure stream function form of the QGE, will be presented, and computational efficiency will be demonstrated.
October 25, 2012 – Shi Jin: Asymptotic-Preserving Schemes for the Boltzmann Equation and Related Problems with Stiff Sources
Dr. Shi Jin will propose a general framework for designing asymptotic-preserving schemes for the Boltzmann kinetic equation and related equations. Numerically solving these equations is challenging due to the nonlinear stiff collision (source) terms induced by small mean free path or relaxation time. Dr. Jin will propose to penalize the nonlinear collision term by a BGK-type relaxation term, which can be solved explicitly even when discretized implicitly in time. Moreover, the BGK-type relaxation operator helps to drive the density distribution toward the local Maxwellian, and thus naturally yields an asymptotic-preserving scheme in the Euler limit. The scheme so designed does not need any nonlinear iterative solver or the use of Wild sums. It is uniformly stable in terms of the (possibly small) Knudsen number, and can capture the macroscopic fluid dynamic (Euler) limit even if the small scale determined by the Knudsen number is not numerically resolved. Dr. Jin will show how this idea can be applied to other collision operators, such as the Landau-Fokker-Planck operator, the Uehling-Uhlenbeck model, and the kinetic-fluid model of disperse multiphase flows.
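Schematically, and assuming a first-order-in-time version for clarity (the talk's scheme may differ in detail), the penalization splits the stiff collision term $Q(f)/\varepsilon$ into an explicit remainder plus an implicit BGK part:

```latex
\frac{f^{n+1} - f^{n}}{\Delta t} + v\cdot\nabla_x f^{n}
 = \frac{Q(f^{n}) - \beta\,\bigl(M^{n} - f^{n}\bigr)}{\varepsilon}
 + \frac{\beta\,\bigl(M^{n+1} - f^{n+1}\bigr)}{\varepsilon},
```

where $M$ is the local Maxwellian and $\beta$ is chosen to dominate the stiffness of $Q$. Because $M - f$ carries no mass, momentum, or energy, the conserved moments (and hence $M^{n+1}$) can be updated explicitly, after which $f^{n+1}$ follows in closed form despite the implicit treatment.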
October 24, 2012 – Shi Jin: Semiclassical Computation of High Frequency Waves in Heterogeneous Media
Dr. Shi Jin will introduce semiclassical Eulerian methods that are efficient for computing high frequency waves through heterogeneous media. The method is based on the classical Liouville equation in phase space, with Hamiltonians that are discontinuous due to barriers or material interfaces. Dr. Jin will provide physically relevant interface conditions consistent with the correct transmissions and reflections, and then build these interface conditions into the numerical fluxes. This method allows the resolution of high frequency waves without numerically resolving the small wavelengths, and captures the correct transmissions and reflections at the interfaces. The method can also be extended to handle diffraction and quantum barriers. Dr. Jin will also discuss an Eulerian Gaussian beam formulation that can compute caustics more accurately.
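For reference, the underlying Liouville equation for a phase-space density $f(t,\mathbf{x},\mathbf{k})$ with Hamiltonian $H$ reads (here with the standard wave Hamiltonian; the talk's setting may generalize this):

```latex
\frac{\partial f}{\partial t}
 + \nabla_{\mathbf k} H \cdot \nabla_{\mathbf x} f
 - \nabla_{\mathbf x} H \cdot \nabla_{\mathbf k} f = 0,
\qquad H(\mathbf x, \mathbf k) = c(\mathbf x)\,\lvert \mathbf k \rvert,
```

with the local wave speed $c(\mathbf{x})$ discontinuous at barriers and material interfaces; the interface conditions described above replace the characteristics, which are undefined across those discontinuities.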
October 09, 2012 – Christian Ringhofer: Charged Particle Transport in Narrow Geometries under Strong Confinement
Kinetic transport in narrow tubes and thin plates, involving scattering of particles off a background, is modeled by classical and quantum mechanical subband-type macroscopic equations for the density of particles (ions). The result, on large time scales, is a diffusion equation with the projection of the (asymptotically conserved) energy tensor onto the confined directions as an additional free variable. Classical transport of ions through protein channels and quantum transport in thin films are discussed as example applications of this methodology.
October 05, 2012 – Amilcare Porporato: Stochastic Soil Moisture Dynamics: From Soil-Plant Biogeochemistry and Land-Atmosphere Interactions to Sustainable Use of Soil and Water
The soil-plant-atmosphere system is characterized by a large number of interacting processes with a high degree of unpredictability and nonlinearity. These elements of complexity, while making a full modeling effort extremely daunting, are also responsible for the emergence of characteristic behaviors. Dr. Porporato's group at Duke University models these processes by means of minimalist models, which describe the main deterministic components of the system and surrogate the high-dimensional ones (i.e., hydroclimatic variability, and rainfall in particular) with suitable stochastic terms. The solution of the stochastic soil water balance allows a probabilistic description of several ecohydrological processes, including ecosystem response and plant productivity, as well as soil organic matter and nutrient cycling dynamics. Dr. Porporato will also discuss how such an approach can be extended to include land-atmosphere feedbacks and their impact on convective precipitation. Dr. Porporato will conclude with a brief discussion of how these methods can be employed to address quantitatively the sustainable management of water and soil resources, including optimal irrigation and fertilization, phytoremediation, and soil salinization risk.