Symmetric-key block ciphers encrypt data, providing data confidentiality over the public Internet. For interoperability reasons, it is desirable to support a variety of symmetric-key ciphers efficiently. We show the basic operations performed by a variety of symmetric-key cryptography algorithms. Of these basic operations, only bit permutation is very slow using existing processors, followed by integer multiplication. New instructions have been proposed recently to accelerate bit permutations in general-purpose processors, reducing the instructions needed to achieve an arbitrary n-bit permutation from O(n) to O(log(n)). However, the serial data-dependency between these log(n) permutation instructions prevents them from being executed in fewer than log(n) cycles, even on superscalar processors. Since application specific instruction processors (ASIPs) have fewer constraints on maintaining standard processor datapath and control conventions, can we achieve even faster permutations? We propose six alternative ASIP approaches to achieve arbitrary 64 bit permutations in one or two cycles, using new BFLY and IBFLY instructions. This reduction to one or two cycles is achieved without increasing the cycle time. We compare the latencies of different permutation units in a technology independent way to estimate cycle time impact. We also compare the alternative ASIP architectures and their efficiency in performing arbitrary 64 bit permutations.