TY - GEN
T1 - FUSE
T2 - European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2020
AU - Zhu, Wanzheng
AU - Gong, Hongyu
AU - Shen, Jiaming
AU - Zhang, Chao
AU - Shang, Jingbo
AU - Bhat, Suma
AU - Han, Jiawei
N1 - Publisher Copyright: © 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
N2 - Set expansion aims to expand a small set of seed entities into a complete set of relevant entities. Most existing approaches assume the input seed set is unambiguous and completely ignore the multi-faceted semantics of seed entities. As a result, given the seed set {“Canon”, “Sony”, “Nikon”}, previous models return one mixed set of entities that are either Camera Brands or Japanese Companies. In this paper, we study the task of multi-faceted set expansion, which aims to capture all semantic facets in the seed set and return multiple sets of entities, one for each semantic facet. We propose an unsupervised framework, FUSE, which consists of three major components: (1) a facet discovery module, which identifies all semantic facets of each seed entity by extracting and clustering its skip-grams; (2) a facet fusion module, which discovers shared semantic facets of the entire seed set by an optimization formulation; and (3) an entity expansion module, which expands each semantic facet by utilizing a masked language model with pre-trained BERT models. Extensive experiments demonstrate that FUSE can accurately identify multiple semantic facets of the seed set and generate quality entities for each facet.
AB - Set expansion aims to expand a small set of seed entities into a complete set of relevant entities. Most existing approaches assume the input seed set is unambiguous and completely ignore the multi-faceted semantics of seed entities. As a result, given the seed set {“Canon”, “Sony”, “Nikon”}, previous models return one mixed set of entities that are either Camera Brands or Japanese Companies. In this paper, we study the task of multi-faceted set expansion, which aims to capture all semantic facets in the seed set and return multiple sets of entities, one for each semantic facet. We propose an unsupervised framework, FUSE, which consists of three major components: (1) a facet discovery module, which identifies all semantic facets of each seed entity by extracting and clustering its skip-grams; (2) a facet fusion module, which discovers shared semantic facets of the entire seed set by an optimization formulation; and (3) an entity expansion module, which expands each semantic facet by utilizing a masked language model with pre-trained BERT models. Extensive experiments demonstrate that FUSE can accurately identify multiple semantic facets of the seed set and generate quality entities for each facet.
KW - Multi-facetedness
KW - Set expansion
KW - Word sense disambiguation
UR - http://www.scopus.com/inward/record.url?scp=85103285326&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85103285326&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-67664-3_37
DO - 10.1007/978-3-030-67664-3_37
M3 - Conference contribution
AN - SCOPUS:85103285326
SN - 9783030676636
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 617
EP - 632
BT - Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2020, Proceedings
A2 - Hutter, Frank
A2 - Kersting, Kristian
A2 - Lijffijt, Jefrey
A2 - Valera, Isabel
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 14 September 2020 through 18 September 2020
ER -