TY - GEN
T1 - Contrastive multi-document question generation
AU - Cho, Woon Sang
AU - Zhang, Yizhe
AU - Rao, Sudha
AU - Celikyilmaz, Asli
AU - Xiong, Chenyan
AU - Gao, Jianfeng
AU - Wang, Mengdi
AU - Dolan, Bill
N1 - Publisher Copyright:
© 2021 Association for Computational Linguistics
PY - 2021
Y1 - 2021
AB - Multi-document question generation focuses on generating a question that covers the common aspect of multiple documents. Such a model is useful in generating clarifying options. However, a naive model trained only on the targeted (“positive”) document set may generate overly generic questions that cover a larger scope than delineated by the document set. To address this challenge, we introduce a contrastive learning strategy in which, given “positive” and “negative” sets of documents, we generate a question that is closely related to the “positive” set but far away from the “negative” set. This setting allows generated questions to be more specific and more closely related to the target document set. To generate such specific questions, we propose the Multi-Source Coordinated Question Generator (MSCQG), a novel framework that includes a supervised learning (SL) stage and a reinforcement learning (RL) stage. In the SL stage, a single-document question generator is trained. In the RL stage, a coordinator model is trained to find optimal attention weights that align multiple single-document generators, by optimizing a reward designed to promote specificity of generated questions. We also develop an effective auxiliary objective, named Set-induced Contrastive Regularization (SCR), which improves the coordinator's contrastive learning during the RL stage. We show that our model significantly outperforms several strong baselines, as measured by automatic metrics and human evaluation. The source repository is publicly available at www.github.com/woonsangcho/contrast_qgen.
UR - http://www.scopus.com/inward/record.url?scp=85107263068&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85107263068&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85107263068
T3 - EACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference
SP - 12
EP - 30
BT - EACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference
PB - Association for Computational Linguistics (ACL)
T2 - 16th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2021
Y2 - 19 April 2021 through 23 April 2021
ER -