Chang Johnny, Hood Robert, Jin Haoqiang
Computer architectures, benchmarking, performance evaluation
The Columbia system at the NASA Advanced Supercomputing (NAS) facility is a cluster of 20 SGI Altix nodes, each with 512 Itanium 2 processors and 1 terabyte (TB) of shared-access memory. Four of the nodes are organized as a 2048-processor capability computing platform connected by two low-latency interconnects: NUMALink4 (NL4) and InfiniBand (IB). To evaluate the scalability of Columbia with respect to both increased processor counts and increased problem sizes, we used seven of the NAS Parallel Benchmarks (NPB) and all three of the NAS multi-zone benchmarks. For NPB, we ran three problem classes: B, C, and D. To measure the impact of some architectural features, we compared Columbia results with results obtained on a Cray Opteron Cluster consisting of 64 nodes, each with 2 AMD Opteron processors and 2 gigabytes (GB) of memory, connected with Myrinet 2000. In these experiments, we measured performance degradation due to contention for the memory buses on the SGI Altix BX2 nodes. We also observed the advantage of SGI’s NL4 interconnect over Myrinet. Finally, we saw that computations spanning multiple BX2 nodes connected with NL4 performed well. Some computations did almost as well when the IB interconnect was used.
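
As a rough illustration of the configurations described above, the following sketch works out the aggregate processor and memory counts and shows one common way to express scalability as parallel efficiency. The class and function names are hypothetical, 1 TB is taken as 1024 GB for the arithmetic, and the efficiency formula is a standard definition rather than necessarily the metric used in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical container for the node counts quoted in the abstract."""
    name: str
    nodes: int
    procs_per_node: int
    mem_gb_per_node: int

    @property
    def total_procs(self) -> int:
        return self.nodes * self.procs_per_node

    @property
    def total_mem_gb(self) -> int:
        return self.nodes * self.mem_gb_per_node


# Figures from the abstract; 1 TB per Altix node is taken as 1024 GB (assumption).
columbia = Cluster("Columbia", nodes=20, procs_per_node=512, mem_gb_per_node=1024)
capability = Cluster("Columbia capability subsystem", nodes=4,
                     procs_per_node=512, mem_gb_per_node=1024)
opteron = Cluster("Cray Opteron Cluster", nodes=64, procs_per_node=2,
                  mem_gb_per_node=2)


def parallel_efficiency(t_base: float, p_base: int, t_p: float, p: int) -> float:
    """Efficiency of a run on p processors relative to a baseline run on
    p_base processors, for a fixed problem size (standard definition)."""
    return (t_base * p_base) / (t_p * p)


if __name__ == "__main__":
    for c in (columbia, capability, opteron):
        print(f"{c.name}: {c.total_procs} processors, {c.total_mem_gb} GB memory")
```

With this definition, a value of 1.0 means that doubling the processor count halves the run time; lower values would be consistent with effects such as the memory-bus contention or interconnect overhead mentioned above.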