Multimedia processing in software has been significantly accelerated by the addition of subword-parallel instructions to the instruction set architectures (ISAs) of modem microprocessors. While some of these multimedia instructions are simple and effective, others are very complex, requiring large, special-purpose functional units that are not practical for constrained environments such as handheld multimedia information appliances. For such environments, low-power and low-cost are as important as the high performance required for real-time multimedia processing and the general-purpose programmability required to support an ever growing range of applications. In this paper, we introduce PLX, a concise ISA that selects the most useful features from the first two generations of multimedia instructions added to microprocessors, and explores new ISA features for high-performance yet low-cost multimedia processing with small footprint processors. PLX is unique in that it is designed from scratch as a fully subword-parallel architecture with novel features like datapath scalability from 32-bit to 128-bit words, and a new definition of predication for reducing conditional branches. We illustrate the use of PLX's architectural features with four frequently used multimedia kernels: discrete cosine transform, pixel padding, clip test and median filter. Our performance results show that a 64-bit PLX implementation achieves significant speedups compared to a basic 64-bit RISC processor and to IA-32 processors with MMX and SSE multimedia extensions. PLX's datapath scalability feature often provides an additional 2x speedup in a cost-effective way.