
Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark

  • Yihua Zhang
  • Pingzhi Li
  • Junyuan Hong
  • Jiaxiang Li
  • Yimeng Zhang
  • Wenqing Zheng
  • Pin-Yu Chen
  • Jason D. Lee
  • Wotao Yin
  • Mingyi Hong
  • Zhangyang Wang
  • Sijia Liu
  • Tianlong Chen

Research output: Contribution to journal › Conference article › peer-review

Abstract

In the evolving landscape of natural language processing (NLP), fine-tuning pre-trained Large Language Models (LLMs) with first-order (FO) optimizers like SGD and Adam has become standard. Yet, as LLMs grow in size, the substantial memory overhead from back-propagation (BP) for FO gradient computation presents a significant challenge. Addressing this issue is crucial, especially for applications like on-device training where memory efficiency is paramount. This paper proposes a shift towards BP-free, zeroth-order (ZO) optimization as a solution for reducing memory costs during LLM fine-tuning, building on the initial concept introduced by Malladi et al. (2023). Unlike traditional ZO-SGD methods, our work expands the exploration to a wider array of ZO optimization techniques, through a comprehensive, first-of-its-kind benchmarking study across five LLM families (RoBERTa, OPT, LLaMA, Vicuna, Mistral), three task complexities, and five fine-tuning schemes. Our study unveils previously overlooked optimization principles, highlighting the importance of task alignment, the role of the forward gradient method, and the balance between algorithm complexity and fine-tuning performance. We further introduce novel enhancements to ZO optimization, including block-wise descent, hybrid training, and gradient sparsity. Our study offers a promising direction for achieving further memory-efficient LLM fine-tuning. Code to reproduce all our experiments is at https://github.com/ZO-Bench/ZO-LLM.
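The BP-free idea the abstract references estimates gradients from forward passes alone. A minimal sketch of a two-point (SPSA-style) ZO-SGD step on a toy quadratic objective, in the spirit of the ZO-SGD baseline; function names, hyperparameters, and the toy objective are illustrative, not from the paper's codebase:

```python
import numpy as np

def zo_sgd_step(loss_fn, theta, lr=0.05, mu=1e-4, rng=None):
    """One zeroth-order SGD step: approximate the gradient with two
    forward evaluations of the loss (no back-propagation), then descend."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(theta.shape)  # random Gaussian perturbation
    # Finite-difference estimate of the directional derivative along z
    g = (loss_fn(theta + mu * z) - loss_fn(theta - mu * z)) / (2 * mu)
    return theta - lr * g * z             # move along the sampled direction

# Toy objective with minimum at theta* = [1, -2] (stand-in for a
# fine-tuning loss; the real setting perturbs LLM weights instead)
target = np.array([1.0, -2.0])
loss = lambda t: float(np.sum((t - target) ** 2))

rng = np.random.default_rng(0)
theta = np.zeros(2)
for _ in range(2000):
    theta = zo_sgd_step(loss, theta, rng=rng)
```

Because each step needs only two forward passes and one random seed (the perturbation can be regenerated from the seed rather than stored), the memory footprint is essentially that of inference, which is the property the paper benchmarks at LLM scale.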

Original language: English (US)
Pages (from-to): 59173-59190
Number of pages: 18
Journal: Proceedings of Machine Learning Research
Volume: 235
State: Published - 2024
Event: 41st International Conference on Machine Learning, ICML 2024 - Vienna, Austria
Duration: Jul 21, 2024 - Jul 27, 2024

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
