Parallel Exact Stochastic Simulator

- Initial Plan written by James M. McCollum
-Contributions by David Bauer, Nagiza Samatova, Gregory D. Peterson, and Trayvon D. Leslie
REFERENCES

TRAYVON D. LESLIE

Currently the applicability of Gillespie’s exact stochastic simulation algorithms to large scale genetic 1 regulatory network models, whole-cell models, and multi-cell models is limited by simulator performance. Although improvements to these algorithms have been made by Gibson and Bruck , the 2 simulation time required to model a single cell-cycle of E. Coli still remains on the order of years . 3 Other more efficient techniques exist for simulating complex biochemical models such as the use of ordinary differential equations, partial differential equations, approximate stochastic simulation , and 4 chemical langevin equations, but these techniques can ignore or corrupt the representation of noise (stochastic fluctuation), an emerging area of study that is believed to be a critical factor in analyzing and characterizing biochemical system behavior . To improve exact stochastic simulator performance, we 5 will attempt to parallelize Gillespie’s algorithms based on techniques that have been previous applied to parallel discrete event simulation . We will then attempt to model a large biologically-significant 6 problem on various supercomputers at Oak Ridge National Laboratory . This document describes an 7 initial plan for this research.

Prior Work
Most recently, a presentation by Tang concluded that attempting to parallelize stochastic simulation 8 does not result in improved performance, but this attempt involved only assigning one processor per reaction, which apparently resulted in large communication overhead. Schwehm also developed an 9 algorithm for parallelizing stochastic simulation specifically for spatial models, but diffusion is modeled at a high level and is non-exact. Arjunan, Takahashi, and Tomita expanded on this work to create a 10 multithreaded simulation algorithm for use on a shared memory machine, but this also included using a non-exact model for diffusion.

The First Implementation
The main goal of the first implementation will be to develop a simple and accurate parallel exact stochastic simulator. This simulator will be used as foundation for which all other implementations will be developed and will be used to verify that later implementations are correct. This simulator will also be used to identify performance bottlenecks and areas for improvement. A secondary goal of this simulator will be to achieve simulation times that outperform serial implementations of the algorithm.
For this implementation, Gillespie’s First Reaction Method will be used. Although this method has been shown to be the worst performing of the exact stochastic simulation algorithms , it is the simplest to 11 understand, easiest to code, and reasonable to verify that it is working properly.
Since we do not yet have a large biological model built to use for testing, we will use artificial models to test the performance of the simulator generated by an updated version of the existing model generator.
At startup, each processor will read a model file and a configuration file. The model file specifies the initial species populations, the reactions, and the reaction rates. The configuration file specifies what reactions will be processed by which processors. The assignment of reactions will be fixed throughout the run (reactions can not be dynamically reassigned to different processors during execution). Each node will then build a dependency graph to determine which other processors will be affected by the execution of their assigned reactions.
Each node will begin executing the First Reaction Method just as they would if they were running the sequential case. As the node executes, it will maintain a complete history of the reactions that they have executed and the time when those reactions occurred.
When a processor, P1, executes a reaction that effects the state of another processor, P2, P1 will add a message to P2's message buffer notifying the processor of the change. After placing the message in the buffer, P1 will continue executing as it would have in the sequential case.
If P2's current time is less than the message’s time, P2 will continue executing until it reaches the message and then handle the message by executing the state-changing reaction associated with the reaction.
If P2's current time is greater than the message’s time, P2 will rollback its executed reactions to the point at which the message’s reaction occurred, make the change, and begin executing from the message’s time point. As it rolls back, it must also send anti-messages to cancel the effects of messages it had previously sent to other processors. This process is closely related to Jefferson time warp and is described in chapter 6.4 of Reed and Fujimoto’s book.
Once all processors have reached the simulation end time, each processor will report its output for each species it was responsible for. The output will be collected into a single file.
Code will be added to log the actions of each processor. This information will be used to “playback” the events of the simulator, learn more about the performance of the simulator, and verify that the simulator is working properly.

Later Implementations
Through brainstorming sessions, we have developed several ideas for how to improve the performance of the initial implementation. Analyzing the performance of the initial implementation will help to determine the order in which we try these techniques. The ideas are listed below.
1. Use a faster technique than the First Reaction Method to execute each step of the algorithm such as the Next Reaction Method or the newly developed Adaptive Method.
2. Assign a few processors to strictly handle uniform and exponential random number generation.
3. Allow the software to choose the assignment of reactions to particular processors. This would replace the configuration file.
4. Create an adaptive control system which allows reactions to be reassigned to other processors as the simulation executes. Use this to try to minimize communication overhead.
5. Instead of storing the entire history of the simulation for each processor, use Global Virtual Time (GVT) algorithms to do fossil collection. Use algorithms for GVT maintenance that are specifically tuned for Shared-Memory Multi-processors.
6. Develop a real biological model for testing the simulator. Use the real model to help develop a better artificial model generator.
7. Develop methods for building the output model file as the simulation progresses.
8. When a processor is “rolled back” because it receives a message with a time earlier than its current time, save the random numbers that were generated and reuse them.
9. When a processor is “rolled back”, store the now invalid history so that if an anti-message arrives, the history can now be reused.
10. Develop an MPI version that will be portable to a larger set of machines.
11. If improving the performance of exact stochastic simulation seems impossible, look into parallelizing approximate stochastic simulation and other non-exact techniques.

Conclusion
Implementing the concepts proposed in the above document will hopefully show a significant performance improvement over using a sequential simulation algorithm. This work will also hopefully serve as another major step in increasing the applicability of exact stochastic simulation to biochemical system modeling.

BACK