Abstract
Subword parallelism has succeeded in accelerating many multimedia applications. Subword permutation instructions have been proposed to efficiently rearrange subwords within or among registers. Bit-level permutation instructions have also been proposed recently because of their importance in cryptography. However, some important algorithms, especially those with many conditional control dependencies such as sorting, have not taken advantage of subword-parallel instructions. In this paper, we show how one of the bit permutation instructions, GRP, can be used for fast sorting. In the process, we demonstrate the versatility of this permutation instruction for uses other than bit permutations. This versatility is important when considering the addition of a new instruction to a general-purpose processor. The results show that our sorting methods achieve a significant speedup even when compared with the fastest sorting algorithms. We also discuss the hardware implementation of the GRP instruction and compare its latency to a typical processor's cycle time.
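The abstract does not define GRP, so the following is only a minimal software sketch of its behavior, assuming the definition commonly given in the bit permutation literature: source bits whose control bit is 0 are gathered on the left, source bits whose control bit is 1 are gathered on the right, and relative order within each group is preserved. The function name `grp` and the `width` parameter are hypothetical names chosen for illustration, and this is not the paper's sorting method.

```python
def grp(src: int, ctl: int, width: int = 8) -> int:
    """Software model of GRP under the assumed semantics described above.

    Bits of `src` whose control bit in `ctl` is 0 go to the left group,
    bits whose control bit is 1 go to the right group. Order within each
    group is unchanged (a stable partition of the bits).
    """
    zeros = []  # source bits selected by control bit 0
    ones = []   # source bits selected by control bit 1
    for i in range(width - 1, -1, -1):  # scan from MSB (left) to LSB (right)
        bit = (src >> i) & 1
        if (ctl >> i) & 1:
            ones.append(bit)
        else:
            zeros.append(bit)
    result = 0
    for bit in zeros + ones:  # left group first, then right group
        result = (result << 1) | bit
    return result


# Partition an 8-bit source by an alternating control word.
print(format(grp(0b10110100, 0b01010101), "08b"))  # 11000110

# Using a word as its own control sorts its bits: all 0s, then all 1s.
print(format(grp(0b01010101, 0b01010101), "08b"))  # 00001111
```

The second call hints at why such an instruction is relevant to sorting: under these assumed semantics, GRP is a stable partition by a one-bit key, which is the basic step of a radix sort.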
Original language | English (US) |
---|---|
Pages | 234-241 |
Number of pages | 8 |
State | Published - 2002 |
Event | International Conference on Computer Design (ICCD'02) VLSI in Computers and Processors - Freiburg, Germany; Duration: Sep 16 2002 → Sep 18 2002 |
Other
Other | International Conference on Computer Design (ICCD'02) VLSI in Computers and Processors |
---|---|
Country/Territory | Germany |
City | Freiburg |
Period | 9/16/02 → 9/18/02 |
All Science Journal Classification (ASJC) codes
- Hardware and Architecture
- Electrical and Electronic Engineering