TY - GEN
T1 - IDEAL: Image DEnoising AcceLerator
T2 - 50th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2017
AU - Mahmoud, Mostafa
AU - Zheng, Bojian
AU - Lascorz, Alberto Delmás
AU - Heide, Felix
AU - Assouline, Jonathan
AU - Boucher, Paul
AU - Onzon, Emmanuel
AU - Moshovos, Andreas
N1 - Publisher Copyright:
© 2017 Association for Computing Machinery.
PY - 2017/10/14
Y1 - 2017/10/14
AB - Computational imaging pipelines (CIPs) convert the raw output of imaging sensors into the high-quality images that are used for further processing. This work studies how Block-Matching and 3D filtering (BM3D), a state-of-the-art denoising algorithm, can be implemented to meet the demands of user-interactive (UI) applications. Denoising is the most computationally demanding stage of a CIP, taking more than 95% of the execution time in a highly optimized software implementation [29]. We analyze the performance and energy consumption of optimized software implementations on three commodity platforms and find that their performance is inadequate. Accordingly, we consider two alternatives: a dedicated accelerator, and running recently proposed Neural Network (NN)-based approximations of BM3D [9, 27] on an NN accelerator. We develop Image DEnoising AcceLerator (IDEAL), a hardware BM3D accelerator that incorporates the following techniques: 1) a novel software-hardware optimization, Matches Reuse (MR), that exploits typical image content to reduce the computations needed by BM3D; 2) prefetching and judicious use of on-chip buffering to minimize execution stalls and off-chip bandwidth consumption; 3) a careful arrangement of specialized computing blocks; and 4) data type precision tuning. Over a dataset of images with resolutions ranging from 8 megapixels (MP) up to 42 MP, IDEAL is 11,352× and 591× faster than software implementations on a high-end general-purpose processor (CPU) and a graphics processor (GPU), respectively, with orders of magnitude better energy efficiency. Even when the NN approximations of BM3D are run on DaDianNao [14], a high-end hardware NN accelerator, IDEAL is 5.4× faster and 3.95× more energy efficient.
KW - Accelerator
KW - Computational imaging
KW - Image denoising
KW - Neural networks
UR - http://www.scopus.com/inward/record.url?scp=85034035105&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85034035105&partnerID=8YFLogxK
U2 - 10.1145/3123939.3123941
DO - 10.1145/3123939.3123941
M3 - Conference contribution
AN - SCOPUS:85034035105
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 82
EP - 95
BT - MICRO 2017 - 50th Annual IEEE/ACM International Symposium on Microarchitecture Proceedings
PB - IEEE Computer Society
Y2 - 14 October 2017 through 18 October 2017
ER -