Major Difficulties (cont’d)
Runtime environment (2 PY)
- Most problems related to message passing
- Runtime utilities must recover from network errors
- Linux copy-on-write caused “lost” messages
- Problems show up as:
  - Failure to start a job
  - Unresponsive utilities: compute nodes go stale and the allocator stops responding
Interaction of Linux, Portals, and the utilities (60% rewrite, 30% debugging, 10% enhancement)
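The requirement that runtime utilities recover from network errors can be illustrated with a small sketch. It assumes a utility reaches a remote daemon through a blocking call that raises TimeoutError or ConnectionError when a message is lost. The names call_with_retry and FlakyNode are hypothetical and are not part of Portals or the original runtime utilities. The idea is that every request is bounded. After a fixed number of attempts with exponential backoff, the error is raised to the caller, which can then mark the node stale instead of hanging.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    request: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Invoke `request`, retrying on transient network errors (hypothetical helper).

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last error once `attempts` tries have failed, so the caller
    can mark the peer unreachable rather than block forever.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return request()
        except (TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise  # give up: let the caller treat the peer as stale
            sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise AssertionError("unreachable")


class FlakyNode:
    """Hypothetical stand-in for a remote daemon that drops its first few requests."""

    def __init__(self, failures: int) -> None:
        self.remaining = failures
        self.calls = 0

    def status(self) -> str:
        self.calls += 1
        if self.remaining > 0:
            self.remaining -= 1
            raise TimeoutError("no reply")
        return "ok"


if __name__ == "__main__":
    node = FlakyNode(failures=2)
    print(call_with_retry(node.status, sleep=lambda s: None))  # prints "ok"

    dead = FlakyNode(failures=10)
    try:
        call_with_retry(dead.status, attempts=3, sleep=lambda s: None)
    except TimeoutError:
        print("node marked stale")
```

Injecting the sleep function keeps the backoff testable without real delays. A production utility would also need to handle duplicate replies from a request that was retried after it had actually succeeded.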