Implementation complexity of bit permutation instructions

Zhijie Jerry Shi, Ruby B. Lee

Research output: Contribution to journal › Conference article › peer-review

16 Scopus citations


Several bit permutation instructions, including GRP, OMFLIP, CROSS, and BFLY, have been proposed recently for efficiently performing arbitrary bit permutations. Previous work has shown that these instructions can accelerate a variety of applications such as block ciphers and sorting algorithms. In this paper, we compare the implementation complexity of these instructions in terms of delay. We use logical effort, a process-technology-independent method, to estimate the delay of the bit permutation functional units. Our results show that for 64-bit operations, the BFLY instruction is the fastest among these bit permutation instructions; the OMFLIP instruction is next; and the GRP instruction is the slowest.
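For readers unfamiliar with logical effort, the following is a minimal sketch of how the method estimates path delay in technology-independent units of tau. It is not taken from the paper: the gate table uses the standard textbook values for static CMOS gates, and the names `GATES`, `path_delay`, and `min_path_delay` are hypothetical helpers introduced only for illustration. The circuits actually analyzed for GRP, OMFLIP, CROSS, and BFLY are not reproduced here.

```python
from fractions import Fraction
from math import prod

# Textbook logical effort g and parasitic delay p for static CMOS gates
# (assumed values for illustration; not the figures used in the paper).
GATES = {
    "inv": (Fraction(1), 1),
    "nand2": (Fraction(4, 3), 2),
    "nor2": (Fraction(5, 3), 2),
}


def path_delay(stages):
    """Delay of a path in units of tau: sum over stages of g*h + p.

    stages is a list of (gate_name, h) pairs, where h is the stage's
    electrical effort (load capacitance / input capacitance).
    """
    total = Fraction(0)
    for name, h in stages:
        g, p = GATES[name]
        total += g * Fraction(h) + p
    return total


def min_path_delay(gate_names, path_electrical_effort, branching_effort=1):
    """Minimum achievable delay N * F**(1/N) + P for a fixed gate sequence.

    F = G * B * H is the path effort: the product of the stage logical
    efforts (G), the branching effort (B), and the path electrical effort (H).
    """
    n = len(gate_names)
    g_path = prod(GATES[name][0] for name in gate_names)
    p_path = sum(GATES[name][1] for name in gate_names)
    f_path = float(g_path * branching_effort * path_electrical_effort)
    return n * f_path ** (1.0 / n) + p_path


if __name__ == "__main__":
    # A fanout-of-4 inverter: g*h + p = 1*4 + 1
    print(path_delay([("inv", 4)]))
    # Two inverters driving a load 16 times the input capacitance
    print(min_path_delay(["inv", "inv"], 16))
```

Because every quantity is expressed relative to an inverter's delay, an estimate of this kind can compare functional units without committing to a particular fabrication process, which is the property the abstract relies on.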

Original language: English (US)
Pages (from-to): 879-886
Number of pages: 8
Journal: Conference Record of the Asilomar Conference on Signals, Systems and Computers
State: Published - 2003
Event: Conference Record of the Thirty-Seventh Asilomar Conference on Signals, Systems and Computers - Pacific Grove, CA, United States
Duration: Nov 9 2003 to Nov 12 2003

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Computer Networks and Communications

