Structured matching for phrase localization

Mingzhe Wang, Mahmoud Azab, Noriyuki Kojima, Rada Mihalcea, Jia Deng

Research output: Chapter in Book/Report/Conference proceedingConference contribution

31 Scopus citations

Abstract

In this paper we introduce a new approach to phrase localization: grounding phrases in sentences to image regions. We propose a structured matching of phrases and regions that encourages the semantic relations between phrases to agree with the visual relations between regions. We formulate structured matching as a discrete optimization problem and relax it to a linear program. We use neural networks to embed regions and phrases into vectors, which then define the similarities (matching weights) between regions and phrases. We integrate structured matching with neural networks to enable end-to-end training. Experiments on Flickr30K Entities demonstrate the empirical effectiveness of our approach.

Original languageEnglish (US)
Title of host publicationComputer Vision - 14th European Conference, ECCV 2016, Proceedings
EditorsBastian Leibe, Jiri Matas, Nicu Sebe, Max Welling
PublisherSpringer Verlag
Pages696-711
Number of pages16
ISBN (Print)9783319464831
DOIs
StatePublished - 2016
Externally publishedYes
Event14th European Conference on Computer Vision, ECCV 2016 - Amsterdam, Netherlands
Duration: Oct 8 2016Oct 16 2016

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume9912 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference14th European Conference on Computer Vision, ECCV 2016
Country/TerritoryNetherlands
CityAmsterdam
Period10/8/1610/16/16

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)

Keywords

  • Language
  • Vision

Fingerprint

Dive into the research topics of 'Structured matching for phrase localization'. Together they form a unique fingerprint.

Cite this