Synchronization is an area of multiprocessor design with rich hardware-software interactions. It was studied extensively using microbenchmarks a decade ago. However, its performance implications are not well understood on modern systems or in real applications. We study the impact of synchronization primitives and algorithms on a modern, 64-processor, hardware-coherent shared address space multiprocessor: the SGI Origin 2000. In addition to presenting results on a modern system, we examine the key methodological issues in studying synchronization, for both microbenchmarks and applications. We find that although our machine's efficient hardware support for synchronization (Fetch&Op) usually improves lock and barrier microbenchmarks, it does not improve application performance compared with good software algorithms that use the processor-provided load-linked/store-conditional (LL-SC) instructions. This holds even in applications that spend a significant amount of time in synchronization operations. More elaborate hardware support is unlikely to yield a significant benefit either. From the applications' perspective, synchronization time is usually dominated by waiting caused by load imbalance or serialization, not by the overhead of the synchronization operations themselves, even in apparently balanced cases where that overhead might be expected to be substantial.
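The final point, that waiting dominates operation overhead, can be made concrete with a simple accounting model for a single barrier episode. The sketch below is a minimal illustration under stated assumptions, not the method used in the study: it assumes every process must wait until the last one arrives and then pays the same fixed overhead to complete the barrier. The names `BarrierBreakdown` and `breakdown` are hypothetical, and the arrival times are made-up numbers, not measurements.

```python
from dataclasses import dataclass


@dataclass
class BarrierBreakdown:
    """Per-process synchronization time at one barrier episode, split in two parts."""
    wait: list[float]   # idle time until the last process arrives (load imbalance)
    overhead: float     # assumed constant cost of the barrier operation itself

    def totals(self) -> list[float]:
        # Time each process spends "in synchronization" as a profiler would report it.
        return [w + self.overhead for w in self.wait]

    def wait_fraction(self) -> float:
        # Share of all synchronization time that is imbalance waiting, not overhead.
        total = sum(self.totals())
        return sum(self.wait) / total if total > 0 else 0.0


def breakdown(arrivals: list[float], overhead: float) -> BarrierBreakdown:
    """Split barrier time given each process's arrival time (same time units).

    Simplifying assumption (hypothetical model): no process can leave before the
    last arrival, and every process then pays the same fixed overhead.
    """
    last = max(arrivals)
    return BarrierBreakdown(wait=[last - a for a in arrivals], overhead=overhead)


# Hypothetical numbers (microseconds), not results from the study.
b = breakdown([100.0, 104.0, 110.0, 120.0], overhead=2.0)
print(b.wait)                       # [20.0, 16.0, 10.0, 0.0]
print(b.totals())                   # [22.0, 18.0, 12.0, 2.0]
print(round(b.wait_fraction(), 3))  # 0.852
```

In this toy example, halving the barrier overhead from 2.0 to 1.0 (as a faster primitive might) reduces total synchronization time only from 54 to 50, whereas removing the 46 units of imbalance waiting would eliminate most of it. This mirrors the abstract's argument for why faster primitives alone help applications little.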
Published: June 1999
Proceedings of the 1999 International Conference on Measurement and Modeling of Computer Systems, ACM SIGMETRICS '99, Atlanta, GA, USA
Duration: May 1, 1999 to May 4, 1999
All Science Journal Classification (ASJC) codes
- Hardware and Architecture
- Computer Networks and Communications