Optimizing Multidocument Summarization by Blending Reinforcement Learning Policies

Di Jia Su, Difei Su, John M. Mulvey, H. Vincent Poor

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

We consider extractive summarization within a cluster of related texts (multidocument summarization). Unlike single-document summarization, redundancy is particularly important because sentences across related documents may convey overlapping information. Sentence extraction in this setting is therefore difficult: one must determine which pieces of information are relevant while avoiding unnecessary repetition. To address this problem, we propose a novel reinforcement-learning-based method, Policy Blending with maximal marginal relevance and Reinforcement Learning (PoBRL), for multidocument summarization. PoBRL jointly optimizes the objectives necessary for a high-quality summary: importance, relevance, and length. Our strategy decouples this multiobjective optimization into subproblems that can be solved individually by reinforcement learning. Utilizing PoBRL, we then blend the learned policies to produce a summary that is a concise and complete representation of the original input. Our empirical analysis shows high performance on several multidocument datasets. Human evaluation also shows that our method produces high-quality output.
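The abstract gives no implementation details, so the following is only a minimal Python sketch of the general idea it describes: per-objective scores from separately learned policies are blended, and a maximal-marginal-relevance (MMR) style greedy selection trades the blended score off against redundancy under a length budget. All names and parameters here (blend_and_select, alpha, lam, the similarity matrix) are illustrative assumptions, not the authors' code.

    import numpy as np

    def blend_and_select(importance, relevance, sim, budget=3, alpha=0.5, lam=0.7):
        """Greedy MMR-style sentence selection over blended policy scores.

        importance, relevance: per-sentence scores from two learned policies
        sim: pairwise sentence-similarity matrix (e.g., TF-IDF cosine)
        budget: maximum number of sentences (the length objective)
        alpha: blend weight between the two policy scores (assumed)
        lam: MMR trade-off between blended score and redundancy (assumed)
        """
        blended = alpha * np.asarray(importance) + (1.0 - alpha) * np.asarray(relevance)
        selected, candidates = [], set(range(len(blended)))
        while candidates and len(selected) < budget:
            def mmr(i):
                # Penalize similarity to any already-selected sentence.
                redundancy = max((sim[i][j] for j in selected), default=0.0)
                return lam * blended[i] - (1.0 - lam) * redundancy
            best = max(candidates, key=mmr)
            selected.append(best)
            candidates.discard(best)
        return selected

    # Toy example: sentences 0 and 1 overlap heavily, so only one is kept.
    sim = np.array([[1.0, 0.9, 0.1, 0.0],
                    [0.9, 1.0, 0.2, 0.1],
                    [0.1, 0.2, 1.0, 0.3],
                    [0.0, 0.1, 0.3, 1.0]])
    print(blend_and_select([0.9, 0.8, 0.7, 0.2], [0.6, 0.9, 0.5, 0.4], sim, budget=2))
    # -> [1, 2]: sentence 0 is skipped as redundant with sentence 1

In this sketch the redundancy penalty is what distinguishes the multidocument setting: a high-scoring sentence is dropped if it repeats information already covered by a selected sentence from another document.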

Original language: English (US)
Pages (from-to): 416-427
Number of pages: 12
Journal: IEEE Transactions on Artificial Intelligence
Volume: 4
Issue number: 3
DOIs
State: Published - Jun 1 2023
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Science Applications

Keywords

  • Artificial intelligence
  • deep learning
  • deep reinforcement learning
  • document summarization
  • machine learning
  • natural language processing (NLP)
  • reinforcement learning (RL)
