EdgeTran: Device-Aware Co-Search of Transformers for Efficient Inference on Mobile Edge Platforms

Shikhar Tuli, Niraj K. Jha

Research output: Contribution to journal › Article › peer-review

Abstract

Automated design of efficient transformer models has recently attracted significant attention from industry and academia. However, most works only focus on certain metrics while searching for the best-performing transformer architecture. Furthermore, running traditional, complex, and large transformer models on low-compute edge platforms is a challenging problem. In this work, we propose a framework, called ProTran, to profile the hardware performance measures for a design space of transformer architectures and a diverse set of edge devices. We use this profiler in conjunction with the proposed co-search technique to obtain the best-performing models that have high accuracy on the given task and minimize latency, energy consumption, and peak power draw to enable edge deployment. We refer to our framework for co-optimizing accuracy and hardware performance measures as EdgeTran. It searches for the best transformer model and edge device pair. Finally, we propose GPTran, a multi-stage block-level grow-and-prune post-processing step that further improves accuracy in a hardware-aware manner. The obtained transformer model is 2.8× smaller and has a 0.8% higher GLUE score than the baseline (BERT-Base). Inference with it on the selected edge device enables 15.0% lower latency, 10.0× lower energy, and 10.8× lower peak power draw compared to an off-the-shelf GPU.
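The co-search described in the abstract can be pictured as choosing the (model, device) pair that best trades accuracy against latency, energy, and peak power draw. The snippet below is only a minimal sketch of that idea under stated assumptions: it uses an exhaustive search with a weighted-sum score, which may differ from the search strategy and objective actually used in EdgeTran. The names `Measures`, `score`, `co_search`, and `toy_profile`, the weights, and all numbers are hypothetical and chosen purely for illustration; `toy_profile` stands in for a profiler such as ProTran.

```python
from itertools import product
from typing import Callable, NamedTuple, Sequence, Tuple


class Measures(NamedTuple):
    """Hypothetical record of one (model, device) evaluation."""
    accuracy: float      # task accuracy in [0, 1]; higher is better
    latency_s: float     # seconds per inference; lower is better
    energy_j: float      # joules per inference; lower is better
    peak_power_w: float  # watts; lower is better


def score(m: Measures, ref: Measures,
          weights: Tuple[float, float, float, float]) -> float:
    """Weighted sum of accuracy and normalized hardware savings.

    Each hardware measure is divided by a reference value (assumed to be
    the worst acceptable value) so that all terms lie on a similar scale.
    """
    w_acc, w_lat, w_en, w_pow = weights
    return (w_acc * m.accuracy
            + w_lat * (1.0 - m.latency_s / ref.latency_s)
            + w_en * (1.0 - m.energy_j / ref.energy_j)
            + w_pow * (1.0 - m.peak_power_w / ref.peak_power_w))


def co_search(models: Sequence[str], devices: Sequence[str],
              profile: Callable[[str, str], Measures], ref: Measures,
              weights: Tuple[float, float, float, float]) -> Tuple[str, str]:
    """Return the best-scoring (model, device) pair by exhaustive search."""
    return max(product(models, devices),
               key=lambda pair: score(profile(*pair), ref, weights))


# Made-up measurements, NOT results from the paper.
_TOY = {
    ("small", "device_a"): Measures(0.80, 0.20, 2.0, 4.0),
    ("small", "device_b"): Measures(0.80, 0.10, 6.0, 12.0),
    ("large", "device_a"): Measures(0.84, 0.60, 6.0, 5.0),
    ("large", "device_b"): Measures(0.84, 0.30, 18.0, 15.0),
}
REF = Measures(1.0, 1.0, 20.0, 20.0)  # hypothetical normalization constants


def toy_profile(model: str, device: str) -> Measures:
    """Stand-in for a hardware profiler: looks up the toy table."""
    return _TOY[(model, device)]


if __name__ == "__main__":
    best = co_search(["small", "large"], ["device_a", "device_b"],
                     toy_profile, REF, weights=(0.25, 0.25, 0.25, 0.25))
    print(best)
```

Changing `weights` shifts the trade-off: putting all weight on accuracy favors the larger toy model, while equal weights favor the pair with the lowest combined hardware cost. A realistic design space is far too large for exhaustive enumeration, so this loop only illustrates the objective, not a practical search procedure.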

Original language: English (US)
Pages (from-to): 7012-7029
Number of pages: 18
Journal: IEEE Transactions on Mobile Computing
Volume: 23
Issue number: 6
State: Published - Jun 1 2024

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Networks and Communications
  • Electrical and Electronic Engineering

Keywords

  • Embedded platforms
  • hardware-software co-design
  • machine learning
  • transformer design space
