Language Models as Science Tutors

  • Alexis Chevalier
  • Jiayi Geng
  • Alexander Wettig
  • Howard Chen
  • Sebastian Mizera
  • Toni Annala
  • Max Jameson Aragon
  • Arturo Rodríguez Fanlo
  • Simon Frieder
  • Simon Machado
  • Akshara Prabhakar
  • Ellie Thieu
  • Jiachen T. Wang
  • Zirui Wang
  • Xindi Wu
  • Mengzhou Xia
  • Wenhan Xia
  • Jiatong Yu
  • Jun Jie Zhu
  • Zhiyong Jason Ren
  • Sanjeev Arora
  • Danqi Chen

Research output: Contribution to journal › Conference article › peer-review


Abstract

NLP has recently made exciting progress toward training language models (LMs) with strong scientific problem-solving skills. However, model development has not focused on real-life use cases of LMs for science, including applications in education that require processing long scientific documents. To address this, we introduce TUTOREVAL and TUTORCHAT. TUTOREVAL is a diverse question-answering benchmark consisting of expert-written questions about long chapters from STEM textbooks. TUTOREVAL helps measure the real-life usability of LMs as scientific assistants, and it is the first benchmark combining long contexts, free-form generation, and multidisciplinary scientific knowledge. Moreover, we show that fine-tuning base models with existing dialogue datasets leads to poor performance on TUTOREVAL. Therefore, we create TUTORCHAT, a dataset of 80,000 long synthetic dialogues about textbooks. We use TUTORCHAT to fine-tune Llemma models with 7B and 34B parameters. These math-specialized LM tutors have a 32K-token context window, and they excel at TUTOREVAL while performing strongly on GSM8K and MATH. Our datasets build on open-source materials, and we release our models, data, and evaluations publicly.

Original language: English (US)
Pages (from-to): 8310-8335
Number of pages: 26
Journal: Proceedings of Machine Learning Research
Volume: 235
State: Published - 2024
Event: 41st International Conference on Machine Learning, ICML 2024 - Vienna, Austria
Duration: Jul 21 2024 to Jul 27 2024

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
