Abstract
We show that microSIMD architectures are more efficient for media processing than other parallel architectures, such as SIMD or MIMD parallel processors and VLIW or superscalar processors. We define alternative mappings of data onto subwords, and show that the index mapping is an ideal mapping for achieving maximal subword parallelism with minimal restructuring of the original serial loop code. We show an example where packed data loaded directly into registers from memory can be interpreted as index-mapped data rather than area-mapped data. This allows greater use of the subword parallelism provided by the microSIMD architecture, by exploiting data parallelism across loop iterations rather than within a loop. We also show how to convert rapidly between data mappings by using the Mix permutation instructions, first defined in the MAX-2 multimedia extensions for PA-RISC processors. We propose a new instruction, MixPair, which halves the cost of parallel Mix functional units while achieving maximum subword permutation performance.
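To make the Mix permutation concrete, the following is a minimal sketch that models it on 64-bit registers in Python. It assumes the usual description of the MAX-2 Mix operations: Mix left interleaves the even-indexed subwords of two source registers, and Mix right interleaves the odd-indexed ones, counting subwords from the left starting at 0. The function names (`mix`, `pack4`, `transpose4x4`) and the 4x4 transpose are illustrative assumptions and are not taken from the paper. The abstract does not define the area and index mappings or the semantics of MixPair, so neither is modelled here.

```python
REG_BITS = 64  # assumed register width for this sketch


def pack4(halfwords):
    """Pack four 16-bit values into one 64-bit integer, leftmost value first."""
    reg = 0
    for h in halfwords:
        reg = (reg << 16) | (h & 0xFFFF)
    return reg


def mix(a, b, width, right=False):
    """Model of a Mix operation on subwords of `width` bits.

    Subwords are numbered from the left (most significant end), starting at 0.
    Mix left  (right=False) interleaves the even-numbered subwords of a and b.
    Mix right (right=True)  interleaves the odd-numbered subwords of a and b.
    """
    n = REG_BITS // width
    mask = (1 << width) - 1
    sub_a = [(a >> (REG_BITS - width * (i + 1))) & mask for i in range(n)]
    sub_b = [(b >> (REG_BITS - width * (i + 1))) & mask for i in range(n)]
    result = 0
    for i in range(1 if right else 0, n, 2):
        # Each selected position contributes a's subword followed by b's.
        result = (result << width) | sub_a[i]
        result = (result << width) | sub_b[i]
    return result


def transpose4x4(rows):
    """Transpose a 4x4 matrix of 16-bit values held in four registers.

    Uses four halfword mixes followed by four word mixes (eight in total).
    """
    r0, r1, r2, r3 = rows
    t0 = mix(r0, r1, 16)               # [a0 b0 a2 b2]
    t1 = mix(r0, r1, 16, right=True)   # [a1 b1 a3 b3]
    t2 = mix(r2, r3, 16)               # [c0 d0 c2 d2]
    t3 = mix(r2, r3, 16, right=True)   # [c1 d1 c3 d3]
    return [
        mix(t0, t2, 32),               # column 0: [a0 b0 c0 d0]
        mix(t1, t3, 32),               # column 1: [a1 b1 c1 d1]
        mix(t0, t2, 32, right=True),   # column 2: [a2 b2 c2 d2]
        mix(t1, t3, 32, right=True),   # column 3: [a3 b3 c3 d3]
    ]
```

Calling `transpose4x4` on four packed rows returns the four packed columns, which shows how a short sequence of Mix operations can move subwords between registers without unpacking them.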
Original language | English (US) |
---|---|
Pages (from-to) | 34-46 |
Number of pages | 13 |
Journal | Proceedings of SPIE - The International Society for Optical Engineering |
Volume | 3655 |
State | Published - Jan 1 1999 |
Event | Proceedings of the 1999 Media Processors 1999 - San Jose, CA, USA; Duration: Jan 28 1999 → Jan 29 1999 |
All Science Journal Classification (ASJC) codes
- Electronic, Optical and Magnetic Materials
- Condensed Matter Physics
- Computer Science Applications
- Applied Mathematics
- Electrical and Electronic Engineering