Skip to main navigation Skip to search Skip to main content

Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space

  • Arnav Chavan
  • , Zhiqiang Shen
  • , Zhuang Liu
  • , Zechun Liu
  • , Kwang Ting Cheng
  • , Eric Xing

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

This paper explores the feasibility of finding an optimal sub-model from a vision transformer and introduces a pure vision transformer slimming (ViT-Slim) framework. It can search a sub-structure from the original model end-to-end across multiple dimensions, including the input tokens, MHSA and MLP modules with state-of-the-art performance. Our method is based on a learnable and unified ℓ1sparsity constraint with pre-defined factors to reflect the global importance in the continuous searching space of different dimensions. The searching process is highly efficient through a single-shot training scheme. For instance, on DeiT-S, ViT-Slim only takes 43 GPU hours for the searching process, and the searched structure is flexible with diverse dimensionalities in different modules. Then, a budget threshold is employed according to the requirements of accuracy-FLOPs trade-off on running devices, and a retraining process is performed to obtain the final model. The extensive experiments show that our ViT-Slim can compress up to 40% of parameters and 40% FLOPs on various vision transformers while increasing the accuracy by 0.6% on ImageNet. We also demonstrate the advantage of our searched models on several downstream datasets. Our code is available at https://github.com/Arnav0400/ViT-Slim.

Original languageEnglish (US)
Title of host publicationProceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
PublisherIEEE Computer Society
Pages4921-4931
Number of pages11
ISBN (Electronic)9781665469463
DOIs
StatePublished - 2022
Externally publishedYes
Event2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 - New Orleans, United States
Duration: Jun 19 2022Jun 24 2022

Publication series

NameProceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume2022-June
ISSN (Print)1063-6919

Conference

Conference2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Country/TerritoryUnited States
CityNew Orleans
Period6/19/226/24/22

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Vision and Pattern Recognition

Keywords

  • Datasets and evaluation
  • Efficient learning and inferences
  • Recognition: detection
  • Representation learning
  • Vision applications and systems
  • categorization
  • retrieval

Fingerprint

Dive into the research topics of 'Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space'. Together they form a unique fingerprint.

Cite this