FUSE: Multi-faceted Set Expansion by Coherent Clustering of Skip-Grams

Wanzheng Zhu, Hongyu Gong, Jiaming Shen, Chao Zhang, Jingbo Shang, Suma Bhat, Jiawei Han

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Set expansion aims to expand a small set of seed entities into a complete set of relevant entities. Most existing approaches assume the input seed set is unambiguous and completely ignore the multi-faceted semantics of seed entities. As a result, given the seed set {“Canon”, “Sony”, “Nikon”}, previous models return one mixed set of entities that are either Camera Brands or Japanese Companies. In this paper, we study the task of multi-faceted set expansion, which aims to capture all semantic facets in the seed set and return multiple sets of entities, one for each semantic facet. We propose an unsupervised framework, FUSE, which consists of three major components: (1) facet discovery module: identifies all semantic facets of each seed entity by extracting and clustering its skip-grams, and (2) facet fusion module: discovers shared semantic facets of the entire seed set by an optimization formulation, and (3) entity expansion module: expands each semantic facet by utilizing a masked language model with pre-trained BERT models. Extensive experiments demonstrate that FUSE can accurately identify multiple semantic facets of the seed set and generate quality entities for each facet.

Original languageEnglish (US)
Title of host publicationMachine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2020, Proceedings
EditorsFrank Hutter, Kristian Kersting, Jefrey Lijffijt, Isabel Valera
PublisherSpringer Science and Business Media Deutschland GmbH
Pages617-632
Number of pages16
ISBN (Print)9783030676636
DOIs
StatePublished - 2021
Externally publishedYes
EventEuropean Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2020 - Virtual, Online
Duration: Sep 14 2020Sep 18 2020

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume12459 LNAI
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

ConferenceEuropean Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2020
CityVirtual, Online
Period9/14/209/18/20

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science

Keywords

  • Multi-facetedness
  • Set expansion
  • Word sense disambiguation

Fingerprint

Dive into the research topics of 'FUSE: Multi-faceted Set Expansion by Coherent Clustering of Skip-Grams'. Together they form a unique fingerprint.

Cite this