TY - GEN
T1 - MoQA: Benchmarking Multi-Type Open-Domain Question Answering
T2 - 3rd Workshop on Document-grounded Dialogue and Conversational Question Answering, DialDoc 2023, co-located with ACL 2023
AU - Yen, Howard
AU - Gao, Tianyu
AU - Lee, Jinhyuk
AU - Chen, Danqi
N1 - Publisher Copyright:
© 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
AB - Previous research on open-domain question answering (QA) focuses mainly on questions with short answers. However, information-seeking QA often requires various formats of answers depending on the nature of the questions, e.g., why/how questions typically require a long answer. In this paper, we present MoQA, a benchmark for open-domain QA that requires building one system that can provide short, medium, long, and yes/no answers to different questions accordingly. MoQA builds upon Natural Questions (Kwiatkowski et al., 2019) with multiple types of questions and additional crowd-sourcing efforts to ensure high data quality. We adapt state-of-the-art models and reveal unique findings in multi-type open-domain QA: (1) For retriever-reader models, training one retriever on all types achieves the overall best performance, but it is challenging to train one reader model to output answers of different formats, or to train a question classifier to distinguish between types; (2) An end-to-end closed-book QA model trained on multiple types struggles with the task across the board; (3) State-of-the-art large language models such as the largest GPT-3 models (Brown et al., 2020; Ouyang et al., 2022) also lag behind open-book QA models. Our benchmark and analysis call for more effort to build versatile open-domain QA models in the future.
UR - http://www.scopus.com/inward/record.url?scp=85174833856&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85174833856&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85174833856
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 8
EP - 29
BT - DialDoc 2023 - Proceedings of the 3rd DialDoc Workshop on Document-Grounded Dialogue and Conversational Question Answering
A2 - Muresan, Smaranda
A2 - Chen, Vivian
A2 - Kennington, Casey
A2 - Vandyke, David
A2 - Dethlefs, Nina
A2 - Inoue, Koji
A2 - Ekstedt, Erik
A2 - Ultes, Stefan
PB - Association for Computational Linguistics (ACL)
Y2 - 13 July 2023
ER -