Multi-Dimensional Vision Transformer Compression via Dependency Guided Gaussian Process Search

Zejiang Hou, Sun Yuan Kung

Research output: Chapter in Book/Report/Conference proceedingConference contribution

8 Scopus citations

Abstract

Vision transformers (ViT) have recently attracted considerable attentions, but the huge computational cost remains an issue for practical deployment. Previous ViT pruning methods tend to prune the model along one dimension solely, which may suffer from excessive reduction and lead to sub-optimal model quality. In contrast, we advocate a multi-dimensional ViT compression paradigm, and propose to harness the redundancy reduction from attention head, neuron and sequence dimensions jointly. Firstly, we propose a statistical dependence based pruning criterion that is generalizable to different dimensions for identifying the deleterious components. Moreover, we cast the multidimensional ViT compression as an optimization problem, objective of which is to learn an optimal pruning policy across the three dimensions while maximizing the compressed model's accuracy under a computational budget. The problem is solved by an adapted Gaussian process search with expected improvement. Experimental results show that our method effectively reduces the computational cost of various ViT models. For example, our method reduces 40% FLOPs without top-1 accuracy loss for DeiT and T2T-ViT models on the ImageNet dataset, outperforming previous state-of-the-art ViT pruning methods.

Original languageEnglish (US)
Title of host publicationProceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022
PublisherIEEE Computer Society
Pages3668-3677
Number of pages10
ISBN (Electronic)9781665487399
DOIs
StatePublished - 2022
Event2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022 - New Orleans, United States
Duration: Jun 19 2022Jun 20 2022

Publication series

NameIEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
Volume2022-June
ISSN (Print)2160-7508
ISSN (Electronic)2160-7516

Conference

Conference2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022
Country/TerritoryUnited States
CityNew Orleans
Period6/19/226/20/22

All Science Journal Classification (ASJC) codes

  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering

Fingerprint

Dive into the research topics of 'Multi-Dimensional Vision Transformer Compression via Dependency Guided Gaussian Process Search'. Together they form a unique fingerprint.

Cite this