Parallel sorting on cache-coherent DSM multiprocessors

Hongzhang Shan, Jaswinder Pal Singh

Research output: Chapter in Book/Report/Conference proceedingConference contribution

5 Scopus citations


The performance of parallel sorting is not well understood on hardware cache-coherent shared address space (CC-SAS) multiprocessors, which increasingly dominate the market for tightly-coupled multiprocessing. We study two high-performance parallel sorting algorithms, radix and sample sorting, under three major programming models-a load-store CC-SAS, message passing, and the segmented SHMEM model-on a 64-processor SGI Origin2000. We observe surprisingly good speedups on this demanding application. The performance of radix sort is greatly affected by the programming model and particular implementation used. Sample sort exhibits more uniform performance across programming models on this platform, but it is usually not so good as that of the best radix sort for larger data sets if each is allowed to use the best programming model for itself. The best combination of algorithm and programming model is radix sorting under the SHMEM model for larger data sets and sample sorting under CC-SAS for smaller data sets.

Original languageEnglish (US)
Title of host publicationACM/IEEE SC 1999 Conference, SC 1999
PublisherInstitute of Electrical and Electronics Engineers Inc.
Number of pages1
ISBN (Electronic)1581130910, 9781581130911
StatePublished - 1999
Event1999 ACM/IEEE Conference on Supercomputing, SC 1999 - Portland, United States
Duration: Nov 13 1999Nov 19 1999

Publication series

NameACM/IEEE SC 1999 Conference, SC 1999


Other1999 ACM/IEEE Conference on Supercomputing, SC 1999
Country/TerritoryUnited States

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture


Dive into the research topics of 'Parallel sorting on cache-coherent DSM multiprocessors'. Together they form a unique fingerprint.

Cite this