TY - JOUR
T1 - Weakly-supervised content-based video moment retrieval using low-rank video representation
AU - Huo, Shuwei
AU - Zhou, Yuan
AU - Xiang, Wei
AU - Kung, Sun Yuan
N1 - Funding Information:
This work was supported by the National Key Research and Development Program of China (2020YFC1523204) and the National Natural Science Foundation of China (62171320 and U2006211).
Publisher Copyright:
© 2023 The Author(s)
PY - 2023/10/9
Y1 - 2023/10/9
AB - Content-based video moment retrieval (CVMR) aims to localize a consecutive sequence of frames in an untrimmed reference video, called the target moment, that semantically corresponds to a given query video. Current state-of-the-art CVMR methods are mainly developed using frame-level annotations, which are often expensive to collect. In this paper, we develop a weakly-supervised CVMR method that uses coarse-grained video-level annotations during training. Under weak supervision, video localizers require more discriminative frame-level video features. To this end, we propose a novel prior, termed the low-rank prior, based on the observation that the frame-level features of a video should have low-rank properties. We demonstrate that low-rank features are more discriminative and help localize action boundaries accurately. To produce low-rank features, we design a low-rank feature reconstruction (LFR) operator, built on a new differentiable matrix decomposition approach that generates a low-rank reconstruction of the input matrix. Based on the LFR operator, we develop a new weakly-supervised CVMR model that produces low-rank video representations and measures semantic consistency to find the segment of the reference video that semantically matches the query video. Extensive experiments demonstrate that our method consistently outperforms state-of-the-art weakly-supervised methods and even achieves performance competitive with fully-supervised baselines.
KW - Content-based video moment retrieval
KW - Low-rank prior
KW - Weakly supervised
UR - http://www.scopus.com/inward/record.url?scp=85166466015&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85166466015&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2023.110776
DO - 10.1016/j.knosys.2023.110776
M3 - Article
AN - SCOPUS:85166466015
SN - 0950-7051
VL - 277
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 110776
ER -