- Initial Plan
written by James M. McCollum
-Contributions by David Bauer, Nagiza Samatova, Gregory D. Peterson, and Trayvon D. Leslie
REFERENCES
Currently the applicability of Gillespie’s exact stochastic simulation algorithms to large scale genetic 1 regulatory network models, whole-cell models, and multi-cell models is limited by simulator performance. Although improvements to these algorithms have been made by Gibson and Bruck , the 2 simulation time required to model a single cell-cycle of E. Coli still remains on the order of years . 3 Other more efficient techniques exist for simulating complex biochemical models such as the use of ordinary differential equations, partial differential equations, approximate stochastic simulation , and 4 chemical langevin equations, but these techniques can ignore or corrupt the representation of noise (stochastic fluctuation), an emerging area of study that is believed to be a critical factor in analyzing and characterizing biochemical system behavior . To improve exact stochastic simulator performance, we 5 will attempt to parallelize Gillespie’s algorithms based on techniques that have been previous applied to parallel discrete event simulation . We will then attempt to model a large biologically-significant 6 problem on various supercomputers at Oak Ridge National Laboratory . This document describes an 7 initial plan for this research.
Prior Work
Most recently, a presentation by Tang concluded that attempting to parallelize stochastic simulation 8
does not result in improved performance, but this attempt involved only assigning one processor per
reaction, which apparently resulted in large communication overhead. Schwehm also developed an 9
algorithm for parallelizing stochastic simulation specifically for spatial models, but diffusion is modeled at a high level and is non-exact. Arjunan, Takahashi, and Tomita expanded on this work to create a 10
multithreaded simulation algorithm for use on a shared memory machine, but this also included using a
non-exact model for diffusion.
The First Implementation
The main goal of the first implementation will be to develop a simple and accurate parallel exact
stochastic simulator. This simulator will be used as foundation for which all other implementations will
be developed and will be used to verify that later implementations are correct. This simulator will also
be used to identify performance bottlenecks and areas for improvement. A secondary goal of this
simulator will be to achieve simulation times that outperform serial implementations of the algorithm.
For this implementation, Gillespie’s First Reaction Method will be used. Although this method has been
shown to be the worst performing of the exact stochastic simulation algorithms , it is the simplest to 11
understand, easiest to code, and reasonable to verify that it is working properly.
Since we do not yet have a large biological model built to use for testing, we will use artificial models to
test the performance of the simulator generated by an updated version of the existing model generator.
At startup, each processor will read a model file and a configuration file. The model file specifies the
initial species populations, the reactions, and the reaction rates. The configuration file specifies what
reactions will be processed by which processors. The assignment of reactions will be fixed throughout
the run (reactions can not be dynamically reassigned to different processors during execution). Each
node will then build a dependency graph to determine which other processors will be affected by the
execution of their assigned reactions.
Each node will begin executing the First Reaction Method just as they would if they were running the
sequential case. As the node executes, it will maintain a complete history of the reactions that they have
executed and the time when those reactions occurred.
When a processor, P1, executes a reaction that effects the state of another processor, P2, P1 will add
a message to P2's message buffer notifying the processor of the change. After placing the message in
the buffer, P1 will continue executing as it would have in the sequential case.
If P2's current time is less than the message’s time, P2 will continue executing until it reaches the
message and then handle the message by executing the state-changing reaction associated with the
reaction.
If P2's current time is greater than the message’s time, P2 will rollback its executed reactions to the
point at which the message’s reaction occurred, make the change, and begin executing from the
message’s time point. As it rolls back, it must also send anti-messages to cancel the effects of
messages it had previously sent to other processors. This process is closely related to Jefferson time
warp and is described in chapter 6.4 of Reed and Fujimoto’s book.
Once all processors have reached the simulation end time, each processor will report its output for each
species it was responsible for. The output will be collected into a single file.
Code will be added to log the actions of each processor. This information will be used to “playback”
the events of the simulator, learn more about the performance of the simulator, and verify that the
simulator is working properly.
Later Implementations
Through brainstorming sessions, we have developed several ideas for how to improve the performance
of the initial implementation. Analyzing the performance of the initial implementation will help to
determine the order in which we try these techniques. The ideas are listed below.
1. Use a faster technique than the First Reaction Method to execute each step of the algorithm such as
the Next Reaction Method or the newly developed Adaptive Method.
2. Assign a few processors to strictly handle uniform and exponential random number generation.
3. Allow the software to choose the assignment of reactions to particular processors. This would
replace the configuration file.
4. Create an adaptive control system which allows reactions to be reassigned to other processors as
the simulation executes. Use this to try to minimize communication overhead.
5. Instead of storing the entire history of the simulation for each processor, use Global Virtual Time
(GVT) algorithms to do fossil collection. Use algorithms for GVT maintenance that are specifically
tuned for Shared-Memory Multi-processors.
6. Develop a real biological model for testing the simulator. Use the real model to help develop a better
artificial model generator.
7. Develop methods for building the output model file as the simulation progresses.
8. When a processor is “rolled back” because it receives a message with a time earlier than its current
time, save the random numbers that were generated and reuse them.
9. When a processor is “rolled back”, store the now invalid history so that if an anti-message arrives,
the history can now be reused.
10. Develop an MPI version that will be portable to a larger set of machines.
11. If improving the performance of exact stochastic simulation seems impossible, look into parallelizing
approximate stochastic simulation and other non-exact techniques.
Conclusion
Implementing the concepts proposed in the above document will hopefully show a significant
performance improvement over using a sequential simulation algorithm. This work will also hopefully
serve as another major step in increasing the applicability of exact stochastic simulation to biochemical
system modeling.