Code tuning, process-placement, OpenMP scaling, memory contention, unaligned memory access
This paper describes four case studies of application performance enhancements on the Columbia supercomputer. The Columbia supercomputer is a cluster of twenty SGI Altix systems, each with 512 Itanium 2 processors and 1 terabyte of global shared memory,and is located at the NASA Advanced Supercomputing (NAS) facility in Moffett Field. The code optimization techniques described in the case studies include both implicit and explicit process-placement to pin processes on CPUs closest to the processes’ memory, removing memory contention in OpenMP applications, eliminating unaligned memory accesses, and system profiling. These techniques enabled approximately 2- to 20-fold improvements in application performance.