As hardware-coherent, distributed shared memory (DSM) multiprocessing becomes popular commercially, it is important to evaluate modern realizations to understand how they perform and scale for a range of interesting applications and to identify the nature of the key bottlenecks. This paper evaluates the SGI Origin2000 (the machine that perhaps has the most aggressive communication architecture of the recent cache-coherent offerings) and, in doing so, articulates a sound methodology for evaluating real systems. We examine data access and synchronization microbenchmarks; speedups for different application classes, problem sizes and scaling models; detailed interactions and time breakdowns using performance tools; and the impact of special hardware support. We find that overall the Origin appears to deliver on the promise of cache-coherent shared address space multiprocessing, at least at the 32-processor scale we examine. The machine is quite easy to program for performance and has fewer organizational problems than previous systems we have examined. However, some important trouble spots are also identified, especially related to contention that is apparently caused by engineering decisions to share resources among processors.
All Science Journal Classification (ASJC) codes
- Hardware and Architecture
- Computer Networks and Communications