The Cheetah system at Oak Ridge National Laboratory underwent an operating system upgrade in April 2002, moving from AIX 5.1c to 5.1d. The new version includes improved memory affinity and support for large pages (16MB instead of 4KB), both of which should improve performance for codes with significant memory traffic. The L3 configuration settings were also modified during this period, resulting in a performance improvement for some codes.
The initial installation of 5.1d contained a bug in the process migration logic that severely degraded performance when using all of the processors on a node. The Cheetah system has two different patches for this problem, lb2 and lb4. The first, lb2, is installed on all but one node, while the second, lb4, is installed on the node where large pages are enabled. The lb4 patch is unrelated to enabling large pages, but the two together are the configuration that we evaluate in these experiments.
The patches lb2 and lb4 are not significantly different. However, with lb2, explicitly binding processes to processors yields a mild performance improvement over not binding them. The lb4 patch adjusts the process migration logic in an attempt to bring unbound performance up to the level achieved with binding. In our limited experience, not binding processes with lb4 achieves better performance than binding them, and lb4 is expected to be the basis for an "official" IBM patch.
Enabling large pages requires that some percentage of system memory be explicitly allocated to large pages at boot time. An environment variable can then be set to indicate that an application code should use large pages. (This can also be specified at compile time.) For these experiments, 50% of the memory on a node with 32GB of memory was allocated to large pages (1024 16MB pages). The remaining memory used 4KB pages.
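The page counts quoted above follow from simple arithmetic, which can be checked with a short sketch. The function and variable names here are illustrative only and do not correspond to any AIX tool or setting; the sketch assumes binary units (1GB = 1024MB).

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def large_page_count(node_memory_bytes, percent_reserved, page_size_bytes):
    """Whole large pages that fit in the reserved share of node memory."""
    reserved_bytes = node_memory_bytes * percent_reserved // 100
    return reserved_bytes // page_size_bytes


node_memory = 32 * GB   # memory on the large-page node
large_page = 16 * MB    # large page size
small_page = 4 * KB     # default page size

# Half of the node's memory is set aside for large pages at boot time.
n_large = large_page_count(node_memory, 50, large_page)

# Whatever is left over is managed in default-size pages.
n_small = (node_memory - n_large * large_page) // small_page

# How many default pages one large page replaces.
pages_per_large_page = large_page // small_page

print(n_large, n_small, pages_per_large_page)
```

Under these assumptions the reserved half of the node holds 1024 large pages, matching the figure in the text, and each large page spans the same address range as 4096 of the 4KB pages.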
Our initial studies examined performance using the PSTSWM serial benchmark. In the experiments that follow, we compare performance between two or more of the following environments:
- 5.1C, original L3 cache configuration
- 5.1D, modified L3 cache configuration, lb2 patch
- 5.1D, modified L3 cache configuration, lb2 patch, process binding
- 5.1D, modified L3 cache configuration, lb4 patch
- 5.1D, modified L3 cache configuration, lb4 patch, process binding
- 5.1D, modified L3 cache configuration, lb4 patch, large pages