A survey of available corpora for building data-driven dialogue systems: The journal version

Iulian Vlad Serban, Ryan Lowe, Peter Henderson, Laurent Charlin, Joelle Pineau

Research output: Contribution to journal, Article, peer-review

103 Scopus citations

Abstract

During the past decade, several areas of speech and language understanding have witnessed substantial breakthroughs from the use of data-driven models. In the area of dialogue systems, the trend is less obvious, and most practical systems are still built through significant engineering and expert knowledge. Nevertheless, several recent results suggest that data-driven approaches are feasible and quite promising. To facilitate research in this area, we have carried out a wide survey of publicly available datasets suitable for data-driven learning of dialogue systems. We discuss important characteristics of these datasets, how they can be used to learn various components of a dialogue system, and their other potential uses. We also examine methods for transfer learning between datasets and the use of external knowledge. Finally, we discuss appropriate choices of evaluation metrics for the learning objective.

Original language: English (US)
Pages (from-to): 1-49
Number of pages: 49
Journal: Dialogue and Discourse
Volume: 9
Issue number: 1
State: Published - 2018
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Communication
  • Linguistics and Language
