TY - GEN
T1 - Document similarity for texts of varying lengths via hidden topics
AU - Gong, Hongyu
AU - Sakakini, Tarek
AU - Bhat, Suma
AU - Xiong, Jinjun
N1 - Publisher Copyright: © 2018 Association for Computational Linguistics
PY - 2018
Y1 - 2018
N2 - Measuring similarity between texts is an important task for several applications. Available approaches to measure document similarity are inadequate for document pairs that have non-comparable lengths, such as a long document and its summary. This is because of the lexical, contextual and the abstraction gaps between a long document of rich details and its concise summary of abstract information. In this paper, we present a document matching approach to bridge this gap, by comparing the texts in a common space of hidden topics. We evaluate the matching algorithm on two matching tasks and find that it consistently and widely outperforms strong baselines. We also highlight the benefits of the incorporation of domain knowledge to text matching.
UR - http://www.scopus.com/inward/record.url?scp=85063103097&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85063103097&partnerID=8YFLogxK
U2 - 10.18653/v1/p18-1218
DO - 10.18653/v1/p18-1218
M3 - Conference contribution
AN - SCOPUS:85063103097
T3 - ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
SP - 2341
EP - 2351
BT - ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
PB - Association for Computational Linguistics (ACL)
T2 - 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018
Y2 - 15 July 2018 through 20 July 2018
ER -