Single-dataset Experts for Multi-dataset Question Answering

Dan Friedman, Ben Dodge, Danqi Chen

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Scopus citations

Abstract

Many datasets have been created for training reading comprehension models, and a natural question is whether we can combine them to build models that (1) perform better on all of the training datasets and (2) generalize and transfer better to new datasets. Prior work has addressed this goal by training one network simultaneously on multiple datasets, which works well on average but is prone to over- or under-fitting different sub-distributions and might transfer worse compared to source models with more overlap with the target dataset. Our approach is to model multi-dataset question answering with an ensemble of single-dataset experts, by training a collection of lightweight, dataset-specific adapter modules (Houlsby et al., 2019) that share an underlying Transformer model. We find that these Multi-Adapter Dataset Experts (MADE) outperform all our baselines in terms of in-distribution accuracy, and simple methods based on parameter-averaging lead to better zero-shot generalization and few-shot transfer performance, offering a strong and versatile starting point for building new reading comprehension systems.

Original languageEnglish (US)
Title of host publicationEMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings
PublisherAssociation for Computational Linguistics (ACL)
Pages6128-6137
Number of pages10
ISBN (Electronic)9781955917094
StatePublished - 2021
Externally publishedYes
Event2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021 - Virtual, Punta Cana, Dominican Republic
Duration: Nov 7 2021Nov 11 2021

Publication series

NameEMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings

Conference

Conference2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021
Country/TerritoryDominican Republic
CityVirtual, Punta Cana
Period11/7/2111/11/21

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems

Fingerprint

Dive into the research topics of 'Single-dataset Experts for Multi-dataset Question Answering'. Together they form a unique fingerprint.

Cite this