B. I/O Benchmarks

Each of the I/O benchmarks will be evaluated in the following manner. The method of running each benchmark and its input parameters are documented in the code.

For clarity, assume that the delivered system has S I/O servers, A-F. For most practical purposes, each server is independent of the others. A test system with one I/O server and a proportional number, R, of compute nodes should therefore be sufficient to simulate the load on the delivered system.

On the delivered system, the following will hold:
- Each compute node is pinned to a particular /stage server.
- RES, as we configure it, assigns exactly one job (process) to each core in the System. Processors are not oversubscribed.
- For a single test, an I/O benchmark accesses the same working directory on a particular I/O server.
- The start of the timed portion of each test will be synchronized to within one second.
- The Government requires a properly functioning NTP service on all nodes in the System.

Between individual benchmark runs, the files should be removed and recreated to avoid unintended caching effects. This applies to Benchmarks 1a, 1b, and 2. The input file for Benchmark 3 is large enough to exceed any reasonable cache, and recreating it for each run would be very time-consuming.

B.1 I/O Benchmark 1a (read_from_cnode_cache)

A 150 MB file is replicated on each of the S I/O servers. Time the operation of each CPU reading the same 150 MB replica 64 times. The replica must be deleted and recreated between successive runs of the benchmark.

B.2 I/O Benchmark 1b (mmap_file_in)

A 750 MB file is replicated on each of the S I/O servers. Time the operation of each CPU mapping the same 750 MB replica read-only with mmap() 16 times. In each iteration, the process touches each page sequentially and then munmap()s the segment. The replica must be deleted and recreated between successive runs of the benchmark.

B.3 I/O Benchmark 2 (read_from_ioserver_cache)

A 4 GB file is replicated on each of the S I/O servers.
Time the operation of each CPU reading N random (aligned) chunks of size M for these parameters:
2a) N = 18000, M = 4 KB
2b) N = 150, M = 1 MB
2c) N = 1, M = 128 MB
The replica must be deleted and recreated between successive runs of the benchmark. The expectation is that the majority of reads will be served from the I/O servers' cache. The timed result t for Benchmark 2 is the sum of the individual times for 2a, 2b, and 2c.

B.4 I/O Benchmark 3 (read_from_disk)

A 512 GB file is replicated on each of the S I/O servers. Time the operation of each CPU reading N random (aligned) chunks of size M for these parameters:
3a) N = 80, M = 1 MB
3b) N = 1, M = 96 MB
The expectation is that most reads will not be cached on the I/O servers. The timed result t for Benchmark 3 is the sum of the individual times for 3a and 3b. The replica need not be recreated between successive runs of the benchmark.

B.5 I/O Benchmark 4 (write_many_files)

Each CPU writes N independent files (with unique filenames) of size M for these parameters:
4a) N = 1200, M = 4 KB
4b) N = 60, M = 1 MB
4c) N = 1, M = 64 MB
If there are R compute nodes per I/O server, each benchmark part will create N*R files in the working directory. The timed result t for Benchmark 4 is the sum of the individual times for 4a, 4b, and 4c.

The makefile that builds the I/O benchmarks also has a rudimentary rule to run each program on a single CPU/node; this can be used initially to get the programs running. By default, the program spin-waits for 5 seconds before timing starts. A Perl script named sim_res_run.pl performs a similar run and also shows how to set a start time dynamically.

For a valid test, the vendor must start a benchmark on each node. Because vendors may have different methods for launching parallel jobs, the launch method is left to the tester's discretion. However, for a valid timing run, all programs must run simultaneously; the start time argument is available for synchronizing them.
A vendor may use RES to run the tests.
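The synchronized start described above can be sketched as follows. This is a minimal Python illustration, not the benchmark's actual code: it assumes the start time is passed to each process as seconds since the Unix epoch, and the helper names wait_until and run_timed are hypothetical. Because every process compares the shared value against its own wall clock, sub-second alignment across nodes relies on the NTP requirement stated earlier.

```python
import time


def wait_until(start_epoch: float) -> float:
    """Spin-wait until the wall clock reaches start_epoch (seconds since
    the Unix epoch); return the wall-clock time at which the wait ended."""
    while True:
        now = time.time()
        if now >= start_epoch:
            return now


def run_timed(start_epoch: float, body) -> float:
    """Hypothetical wrapper: wait for the shared start time, then time
    body() with a monotonic clock and return the elapsed seconds."""
    wait_until(start_epoch)
    t0 = time.perf_counter()
    body()
    return time.perf_counter() - t0
```

A launcher would compute a single value such as time.time() + 5 once and pass that same value to every process it starts, so that all processes leave the spin loop at nearly the same moment.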
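For Benchmarks 2 and 3, "random (aligned) chunks" can be read as offsets that are whole multiples of the chunk size M, each leaving room for a full chunk before end of file. The sketch below is a hedged Python illustration of one process's timed loop under that interpretation. The function names, the fixed seed, the 1024-based KB/MB units, and the use of os.pread (Unix only) are assumptions, not details of the supplied benchmark code.

```python
import os
import random
import time

KB = 1024          # assumption: KB/MB in the spec are 1024-based
MB = 1024 * KB

# (N, M) pairs from B.3 (Benchmark 2) and B.4 (Benchmark 3)
BENCH2 = [(18000, 4 * KB), (150, 1 * MB), (1, 128 * MB)]
BENCH3 = [(80, 1 * MB), (1, 96 * MB)]


def aligned_offsets(file_size, chunk, n, rng):
    """Return n random offsets that are multiples of chunk, each leaving
    room for a full chunk before end of file."""
    slots = file_size // chunk  # number of whole chunks in the file
    if slots == 0:
        raise ValueError("file is smaller than one chunk")
    return [rng.randrange(slots) * chunk for _ in range(n)]


def time_random_reads(path, n, chunk, seed=0):
    """Time n reads of `chunk` bytes at random aligned offsets in path.
    Offsets are chosen before the clock starts."""
    offsets = aligned_offsets(os.path.getsize(path), chunk, n,
                              random.Random(seed))
    fd = os.open(path, os.O_RDONLY)
    try:
        t0 = time.perf_counter()
        for off in offsets:
            data = os.pread(fd, chunk, off)
            if len(data) != chunk:
                raise OSError(f"short read at offset {off}")
        return time.perf_counter() - t0
    finally:
        os.close(fd)


def benchmark_total(path, params):
    """The timed result t: the sum of the per-part times."""
    return sum(time_random_reads(path, n, m) for n, m in params)
```

Under these assumptions, benchmark_total(path, BENCH2) would give one process's t for Benchmark 2, with path naming the 4 GB replica on its I/O server.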
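The per-iteration work in Benchmark 1b (map the replica read-only, touch each page in order, unmap) can be illustrated with Python's standard mmap module. This is a hedged sketch, not the benchmark's implementation: touching one byte per page is assumed to be enough to fault each page in, the function name touch_pages is hypothetical, and closing the mmap object stands in for the explicit munmap() call.

```python
import mmap
import os
import time


def touch_pages(path: str, iterations: int = 16) -> tuple:
    """Map path read-only, read one byte from each page in order, then
    unmap; repeat `iterations` times. Returns (elapsed_seconds, checksum)."""
    page = mmap.PAGESIZE
    checksum = 0
    t0 = time.perf_counter()
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        for _ in range(iterations):
            m = mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ)
            try:
                for off in range(0, size, page):
                    checksum += m[off]  # reading a byte faults the page in
            finally:
                m.close()  # releases the mapping (munmap)
    return time.perf_counter() - t0, checksum
```

The checksum simply keeps every page access observable; with the spec's parameters, touch_pages would be called on the 750 MB replica with iterations=16.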