Many researchers have proposed interesting protocols for shared virtual memory (SVM) systems, and demonstrated performance improvements on parallel programs. However, there is still no clear understanding of the performance potential of SVM systems for different classes of applications. This paper begins to fill this gap, by studying the performance of a range of applications in detail and understanding it in light of application characteristics. We first develop a brief classification of the inherent data sharing patterns in the applications, and how they interact with system granularities to yield the communication patterns relevant to SVM systems. We then use detailed simulation to compare the performance of two SVM approaches - Lazy Released Consistency (LRC) and Automatic Update Release Consistency (AURC) - with each other and with an all-hardware CC-NUMA approach. We examine how performance is affected by problem size, machine size, key system parameters, and the use of less optimized program implementations. We find that SVM can indeed perform quite well for systems of at least up to 32 processors for several nontrivial applications. However, performance is much more variable across applications than on CC-NUMA systems, and the problem sizes needed to obtain good parallel performance are substantially larger. The hardware-assisted AURC system tends to perform significantly better than the all-software LRC under our system assumptions, particularly when realistic cache hierarchies are used.
|Number of pages
|Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA
|Published - 1996
|Proceedings of the 1996 23rd Annual International Symposium on Computer Architecture - Philadelphia, PA, USA
Duration: May 22 1996 → May 24 1996
All Science Journal Classification (ASJC) codes
- Hardware and Architecture