Abstract
Several bit permutation instructions, including GRP, OMFLIP, CROSS, and BFLY, have recently been proposed for performing arbitrary bit permutations efficiently. Previous work has shown that these instructions can accelerate a variety of applications, such as block ciphers and sorting algorithms. In this paper, we compare the implementation complexity of these instructions in terms of delay. We use logical effort, a process-technology-independent method, to estimate the delay of the bit permutation functional units. Our results show that, for 64-bit operations, the BFLY instruction is the fastest among these bit permutation instructions, the OMFLIP instruction is next, and the GRP instruction is the slowest.
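To illustrate the kind of estimate logical effort produces, the sketch below computes the minimum delay of a single logic path from the standard logical effort relations: path effort F = G·B·H and minimum delay D = N·F^(1/N) + P, in units of the process-dependent time constant τ. It is a minimal example only, not the paper's circuit models. The function name `min_path_delay`, its parameters, and the inverter-chain numbers are hypothetical and are not taken from the paper.

```python
from math import prod


def min_path_delay(logical_efforts, parasitics, branching=1.0, electrical_effort=1.0):
    """Minimum delay of an N-stage path, in units of tau (illustrative sketch).

    logical_efforts   -- logical effort g of each stage along the path
    parasitics        -- parasitic delay p of each stage along the path
    branching         -- path branching effort B (product of stage branching efforts)
    electrical_effort -- path electrical effort H = C_load / C_in
    """
    if len(logical_efforts) != len(parasitics):
        raise ValueError("one parasitic delay is needed per stage")
    n = len(logical_efforts)
    path_logical_effort = prod(logical_efforts)                       # G
    path_effort = path_logical_effort * branching * electrical_effort  # F = G * B * H
    best_stage_effort = path_effort ** (1.0 / n)                      # f = F^(1/N)
    return n * best_stage_effort + sum(parasitics)                    # D = N*f + P


# Hypothetical example: three inverters (g = 1, p = 1 by the usual
# normalization) driving a load 64 times the input capacitance.
delay = min_path_delay([1, 1, 1], [1, 1, 1], electrical_effort=64)
print(f"{delay:.2f} tau")
```

Because the result is expressed in multiples of τ rather than in seconds, the same calculation applies across fabrication processes, which is why the method is described as process-technology independent.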
Original language | English (US) |
---|---|
Pages (from-to) | 879-886 |
Number of pages | 8 |
Journal | Conference Record of the Asilomar Conference on Signals, Systems and Computers |
Volume | 1 |
State | Published - 2003 |
Event | Conference Record of the Thirty-Seventh Asilomar Conference on Signals, Systems and Computers - Pacific Grove, CA, United States; Duration: Nov 9 2003 → Nov 12 2003 |
All Science Journal Classification (ASJC) codes
- Signal Processing
- Computer Networks and Communications