Several application domains, including multimedia and network processing, are highly memory-intensive, making memory a bottleneck in the design of higher-performance, lower-power application-specific integrated circuits (ASICs). Design methodologies based on innovative architectures, namely distributed logic-memory architectures and computation-unit integrated memories, have been shown to improve circuit performance significantly. In this paper, these design methodologies are discussed and evaluated through the implementation of an ASIC for the JPEG still image compression standard. The implemented system is capable of stand-alone image compression and has been synthesized using the TSMC 0.13 μm, 1.20 V, eight-layer metal CMOS process. For a 128 × 128 input image, a four-way distributed implementation achieves an execution time of 2.23 ms (a speed-up of 2.87X) and reduces the energy-delay product by 2.35X, at the cost of a 51.4% chip area overhead. Design metrics of various other implementations are also compared.