TY - GEN
T1 - Eliminating memory bottlenecks for a JPEG encoder through distributed logic-memory architecture and computation-unit integrated memory
AU - Huang, Chao
AU - Ravi, Srivaths
AU - Raghunathan, Anand
AU - Jha, Niraj K.
PY - 2005
Y1 - 2005
N2 - Several application domains, including multimedia and network processing, are highly memory intensive, making memory a bottleneck to designing higher performance and lower power application-specific integrated circuits (ASICs). Design methodologies based on innovative architectures, namely distributed logic-memory architectures and computation-unit integrated memories, have been shown to improve circuit performance significantly. In this paper, these design methodologies are discussed and evaluated through the implementation of an ASIC for the JPEG still image compression standard. The implemented system is capable of stand-alone image compression, and has been synthesized using the TSMC 0.13 μm, 1.20 V, eight-layer metal CMOS process. A four-way distributed implementation can achieve an execution time of 2.23 ms (a speed-up of 2.87X) for a 128 × 128 input image at the cost of chip area overhead of 51.4% while the energy-delay product is reduced by 2.35X. Design metrics of various other implementations are also compared.
AB - Several application domains, including multimedia and network processing, are highly memory intensive, making memory a bottleneck to designing higher performance and lower power application-specific integrated circuits (ASICs). Design methodologies based on innovative architectures, namely distributed logic-memory architectures and computation-unit integrated memories, have been shown to improve circuit performance significantly. In this paper, these design methodologies are discussed and evaluated through the implementation of an ASIC for the JPEG still image compression standard. The implemented system is capable of stand-alone image compression, and has been synthesized using the TSMC 0.13 μm, 1.20 V, eight-layer metal CMOS process. A four-way distributed implementation can achieve an execution time of 2.23 ms (a speed-up of 2.87X) for a 128 × 128 input image at the cost of chip area overhead of 51.4% while the energy-delay product is reduced by 2.35X. Design metrics of various other implementations are also compared.
UR - http://www.scopus.com/inward/record.url?scp=33847095113&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33847095113&partnerID=8YFLogxK
U2 - 10.1109/CICC.2005.1568651
DO - 10.1109/CICC.2005.1568651
M3 - Conference contribution
AN - SCOPUS:33847095113
SN - 0780390237
SN - 9780780390232
T3 - Proceedings of the Custom Integrated Circuits Conference
SP - 239
EP - 242
BT - Proceedings of the IEEE 2005 Custom Integrated Circuits Conference
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - IEEE 2005 Custom Integrated Circuits Conference
Y2 - 18 September 2005 through 21 September 2005
ER -