Abstract
We present LLEMMA, a large language model for mathematics. We continue pretraining Code Llama on Proof-Pile-2, a mixture of scientific papers, web data containing mathematics, and mathematical code, yielding LLEMMA. On the MATH benchmark, LLEMMA outperforms all known open base models, as well as the unreleased Minerva model suite, on an equi-parameter basis. Moreover, LLEMMA is capable of tool use and formal theorem proving without any further finetuning. We openly release all artifacts, including 7 billion and 34 billion parameter models, the Proof-Pile-2, and code to replicate our experiments.
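Since the weights are openly released, the models can be loaded with standard causal-LM tooling. Below is a minimal sketch using Hugging Face `transformers`; the hub identifier `EleutherAI/llemma_7b` and the example prompt are assumptions for illustration, not details taken from the abstract.

```python
# Minimal sketch: loading and prompting the released 7B model.
# The hub ID below is an assumption about where the weights are hosted;
# substitute the actual identifier from the release if it differs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/llemma_7b"  # assumed hub ID for the 7B release
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# LLEMMA is a base model, so it is prompted with a problem statement
# and left to continue the text rather than instructed conversationally.
prompt = "Problem: Compute the derivative of x^2 sin(x).\nSolution:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```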
| Original language | English (US) |
| --- | --- |
| State | Published - 2024 |
| Event | 12th International Conference on Learning Representations, ICLR 2024 - Hybrid, Vienna, Austria. Duration: May 7 2024 → May 11 2024 |
Conference
| Conference | 12th International Conference on Learning Representations, ICLR 2024 |
| --- | --- |
| Country/Territory | Austria |
| City | Hybrid, Vienna |
| Period | 5/7/24 → 5/11/24 |
All Science Journal Classification (ASJC) codes
- Language and Linguistics
- Computer Science Applications
- Education
- Linguistics and Language