Semantic Relevance Learning for Video-Query Based Video Moment Retrieval

Shuwei Huo, Yuan Zhou, Ruolin Wang, Wei Xiang, Sun Yuan Kung

Research output: Contribution to journal › Article › peer-review

1 Scopus citation


The task of video-query based video moment retrieval (VQ-VMR) aims to localize the segment in a reference video that semantically matches a short query video. The rapid growth of online video services makes this task increasingly important, yet accurately retrieving the target moment remains challenging. To address this, we propose a new metric that effectively assesses the semantic relevance between the query video and segments of the reference video. We also develop a new VQ-VMR framework to discover the intrinsic semantic relevance between a pair of input videos. It comprises two key components: a Fine-grained Feature Interaction (FFI) module and a Semantic Relevance Measurement (SRM) module, which together handle both the spatial and temporal dimensions of videos. First, the FFI module computes the semantic similarity between videos at the local frame level, focusing on spatial information. Subsequently, the SRM module learns the similarity between videos from a global perspective, taking temporal information into account. Extensive experiments on two datasets demonstrate noticeable improvements of the proposed approach over state-of-the-art methods.
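The FFI module described above operates on frame-level spatial similarity, and the SRM module then scores candidate segments globally. A minimal sketch of this two-stage idea, assuming simple cosine similarity and mean pooling in place of the paper's learned modules (function names and the pooling scheme are illustrative, not the authors' implementation):

```python
import numpy as np

def frame_similarity_matrix(query_feats, ref_feats):
    """Cosine similarity between every query frame and every reference frame.

    query_feats: (Tq, D) array of per-frame features for the query video.
    ref_feats:   (Tr, D) array of per-frame features for the reference video.
    Returns a (Tq, Tr) similarity matrix with values in [-1, 1].
    """
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    r = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
    return q @ r.T

def best_segment(sim, seg_len):
    """Score each candidate segment of `seg_len` consecutive reference frames
    by its mean frame-level similarity and return the best start index.
    (Stands in for the learned global relevance measure of the SRM module.)"""
    scores = [sim[:, s:s + seg_len].mean()
              for s in range(sim.shape[1] - seg_len + 1)]
    return int(np.argmax(scores))
```

For example, planting a 4-frame query inside a 20-frame reference and sliding a 4-frame window over the similarity matrix recovers the planted start index. The real framework replaces both the raw cosine interaction and the mean-pooled score with learned components.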

Original language: English (US)
Pages (from-to): 9290-9301
Number of pages: 12
Journal: IEEE Transactions on Multimedia
State: Published - 2023

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Electrical and Electronic Engineering
  • Media Technology
  • Computer Science Applications


Keywords

  • Video moment retrieval
  • fine-grained feature interaction
  • semantic relevance measurement
  • video query


