Optimizing N-dimensional, Winograd-based convolution for manycore CPUs

Zhen Jia, Aleksandar Zlateski, Fredo Durand, Kai Li

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

1 Scopus citation

Abstract

Recent work on Winograd-based convolution allows for a great reduction of computational complexity, but existing implementations are limited to 2D data and a single kernel size of 3 by 3. They achieve only slightly better, and often worse, performance than better-optimized direct convolution implementations. We propose and implement an algorithm for N-dimensional Winograd-based convolution that allows arbitrary kernel sizes and is optimized for manycore CPUs. Our algorithm achieves high hardware utilization through a series of optimizations. Our experiments show that on modern ConvNets, our optimized implementation is on average more than 3×, and sometimes 8×, faster than other state-of-the-art CPU implementations on Intel Xeon Phi manycore processors. Moreover, our implementation on the Xeon Phi achieves competitive performance for 2D ConvNets and superior performance for 3D ConvNets, compared with the best GPU implementations.
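
As a rough illustration of where the reduction in computational complexity mentioned in the abstract comes from, the following Python sketch applies the standard 1D Winograd minimal filtering scheme F(2, 3), which produces two outputs of a 3-tap filter with 4 data-filter multiplications instead of 6. It is not the authors' N-dimensional implementation; the function names `direct_f23` and `winograd_f23` are hypothetical and chosen only for this example.

```python
def direct_f23(d, g):
    """Direct sliding-window filter: 2 outputs, 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Winograd F(2, 3): the same 2 outputs with 4 data-filter multiplications."""
    # Filter transform; depends only on g, so it can be precomputed once.
    u = [g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2]]
    # Input transform: additions and subtractions only.
    v = [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]
    # Element-wise products: the only multiplications between data and filter.
    m = [ui * vi for ui, vi in zip(u, v)]
    # Output transform: additions and subtractions only.
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]


d = [1.0, 2.0, 3.0, 4.0]  # one tile of 4 input samples
g = [1.0, 0.0, -1.0]      # 3-tap filter
print(direct_f23(d, g))
print(winograd_f23(d, g))
```

Both calls return the same pair of outputs. The saving comes from moving work into the additive transforms, and the filter transform can be reused across every input tile.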

Original language: English (US)
Title of host publication: ACM SIGPLAN Notices
Publisher: Association for Computing Machinery
Pages: 109-123
Number of pages: 15
Volume: 53
Edition: 1
ISBN (Electronic): 9781450349116
DOIs: https://doi.org/10.1145/3178487.3178496
State: Published - Feb 10 2018

All Science Journal Classification (ASJC) codes

  • Computer Science (all)

Keywords

  • convolution
  • parallelization
  • vectorization
  • winograd

  • Cite this

    Jia, Z., Zlateski, A., Durand, F., & Li, K. (2018). Optimizing N-dimensional, Winograd-based convolution for manycore CPUs. In ACM SIGPLAN Notices (1 ed., Vol. 53, pp. 109-123). Association for Computing Machinery. https://doi.org/10.1145/3178487.3178496