TY - JOUR
T1 - A survey of available corpora for building data-driven dialogue systems
T2 - The journal version
AU - Serban, Iulian Vlad
AU - Lowe, Ryan
AU - Henderson, Peter
AU - Charlin, Laurent
AU - Pineau, Joelle
N1 - Publisher Copyright: © 2018 Iulian Vlad Serban, Ryan Lowe, Peter Henderson, Laurent Charlin, Joelle Pineau.
PY - 2018
Y1 - 2018
AB - During the past decade, several areas of speech and language understanding have witnessed substantial breakthroughs from the use of data-driven models. In the area of dialogue systems, the trend is less obvious, and most practical systems are still built through significant engineering and expert knowledge. Nevertheless, several recent results suggest that data-driven approaches are feasible and quite promising. To facilitate research in this area, we have carried out a wide survey of publicly available datasets suitable for data-driven learning of dialogue systems. We discuss important characteristics of these datasets, how they can be used to learn various components of a dialogue system, and their other potential uses. We also examine methods for transfer learning between datasets and the use of external knowledge. Finally, we discuss appropriate choices of evaluation metrics for the learning objective.
UR - http://www.scopus.com/inward/record.url?scp=85049558859&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85049558859&partnerID=8YFLogxK
U2 - 10.5087/dad.2018.101
DO - 10.5087/dad.2018.101
M3 - Article
AN - SCOPUS:85049558859
SN - 2152-9620
VL - 9
SP - 1
EP - 49
JO - Dialogue and Discourse
JF - Dialogue and Discourse
IS - 1
ER -