Semantic Relevance Learning for Video-Query Based Video Moment Retrieval

Shuwei Huo, Yuan Zhou, Ruolin Wang, Wei Xiang, Sun Yuan Kung

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

The task of video-query based video moment retrieval (VQ-VMR) is to localize the segment of a reference video that semantically matches a short query video. The task is challenging, particularly given the rapid expansion and massive growth of online video services. To retrieve the target moment accurately, we propose a new metric that effectively assesses the semantic relevance between the query video and segments of the reference video. We also develop a new VQ-VMR framework that discovers the intrinsic semantic relevance between a pair of input videos. It comprises two key components, a Fine-grained Feature Interaction (FFI) module and a Semantic Relevance Measurement (SRM) module, which together handle both the spatial and temporal dimensions of videos. First, the FFI module computes the semantic similarity between the videos at the local frame level, mainly considering spatial information. Subsequently, the SRM module learns the similarity between the videos from a global perspective, taking temporal information into account. Extensive experiments on two key datasets demonstrate noticeable improvements of the proposed approach over state-of-the-art methods.
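The two-stage idea in the abstract (local frame-level similarity, then a global, temporally aware relevance score used to pick the best segment) can be illustrated with a small training-free sketch. This is not the authors' FFI or SRM module, which are learned networks; it assumes per-frame feature vectors are already extracted, uses plain cosine similarity, and combines a hypothetical "local" term (best frame match) with a hypothetical order-aware "global" term using an assumed equal weighting `alpha=0.5`. All function names and parameters are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def frame_similarity(query: List[Vector], ref: List[Vector]) -> List[List[float]]:
    """Local, frame-level similarity matrix: sim[i][j] compares query frame i with reference frame j."""
    return [[cosine(q, r) for r in ref] for q in query]


def segment_score(sim: List[List[float]], start: int, end: int, alpha: float = 0.5) -> float:
    """Hypothetical relevance of reference segment [start, end) to the whole query."""
    q, length = len(sim), end - start
    # Local term: each query frame takes its best-matching frame inside the segment.
    local = sum(max(row[start:end]) for row in sim) / q
    # Order-aware global term: query frame i is aligned with the segment frame at the
    # same relative position, so segments whose content appears in a different
    # temporal order score lower.
    ordered = sum(sim[i][start + i * length // q] for i in range(q)) / q
    return alpha * local + (1.0 - alpha) * ordered


def retrieve(query: List[Vector], ref: List[Vector], lengths: List[int]) -> Tuple[int, int, float]:
    """Exhaustively score sliding-window candidates and return (start, end, score) of the best one."""
    sim = frame_similarity(query, ref)
    best = (0, 0, -math.inf)
    for length in lengths:
        if length < 1:
            continue  # empty segments cannot be scored
        for start in range(len(ref) - length + 1):
            score = segment_score(sim, start, start + length)
            if score > best[2]:
                best = (start, start + length, score)
    return best
```

For example, with two-dimensional toy features, `retrieve([[0, 1], [0, 1]], [[1, 0], [1, 0], [0, 1], [0, 1], [1, 0]], lengths=[2])` selects the segment spanning reference frames 2 to 4. Exhaustive sliding windows keep the sketch simple; a practical system would typically use learned features and a more efficient candidate-generation scheme.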

Original language: English (US)
Pages (from-to): 9290-9301
Number of pages: 12
Journal: IEEE Transactions on Multimedia
Volume: 25
DOIs
State: Published - 2023

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Electrical and Electronic Engineering
  • Media Technology
  • Computer Science Applications

Keywords

  • Video moment retrieval
  • fine-grained feature interaction
  • semantic relevance measurement
  • video query
