Abstract
Low-rank adaptation (LoRA) has become the standard approach for parameter-efficient fine-tuning of large language models (LLMs), but our theoretical understanding of LoRA has been limited. In this work, we theoretically analyze LoRA fine-tuning in the neural tangent kernel (NTK) regime with N data points, showing: (i) full fine-tuning (without LoRA) admits a low-rank solution of rank r ≲ √N; (ii) using LoRA with rank r ≳ √N eliminates spurious local minima, allowing (stochastic) gradient descent to find the low-rank solutions; (iii) the low-rank solution found using LoRA generalizes well.
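To make the LoRA parameterization concrete, below is a minimal PyTorch-style sketch of a linear layer with a frozen pretrained weight plus a trainable rank-r update, with the rank chosen on the order of √N as the abstract suggests. The class name, initialization, and hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight W0 plus a trainable low-rank update B @ A of rank r (a sketch)."""
    def __init__(self, in_features: int, out_features: int, r: int, alpha: float = 1.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # pretrained weights stay frozen
        # Low-rank factors: A projects into the rank-r subspace, B projects back out.
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))  # zero init: update starts at 0
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Output of the frozen layer plus the scaled low-rank correction.
        return self.base(x) + self.scaling * (x @ self.A.T) @ self.B.T

# Illustrative usage: with N training points, pick r on the order of sqrt(N),
# matching the rank threshold in result (ii) of the abstract.
N = 10_000
r = int(N ** 0.5)  # about 100
layer = LoRALinear(in_features=768, out_features=768, r=r)
```

Only A and B receive gradients here, so the number of trainable parameters scales with r rather than with the full weight dimensions, which is the parameter-efficiency the abstract refers to.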
| Original language | English (US) |
|---|---|
| Pages (from-to) | 21306-21328 |
| Number of pages | 23 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 235 |
| State | Published - 2024 |
| Externally published | Yes |
| Event | 41st International Conference on Machine Learning, ICML 2024, Vienna, Austria (Jul 21 2024 → Jul 27 2024) |
All Science Journal Classification (ASJC) codes
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability