Temporal action localization by structured maximal sums

Zehuan Yuan, Jonathan C. Stroud, Tong Lu, Jia Deng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

37 Scopus citations

Abstract

We address the problem of temporal action localization in videos. We pose action localization as a structured prediction over arbitrary-length temporal windows, where each window is scored as the sum of frame-wise classification scores. Additionally, our model classifies the start, middle, and end of each action as separate components, allowing our system to explicitly model each action's temporal evolution and take advantage of informative temporal dependencies present in this structure. In this framework, we localize actions by searching for the structured maximal sum, a problem for which we develop a novel, provably efficient algorithmic solution. The frame-wise classification scores are computed using features from a deep Convolutional Neural Network (CNN), which are trained end-to-end to directly optimize for a novel structured objective. We evaluate our system on the THUMOS '14 action detection benchmark and achieve competitive performance.
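To make the idea of a structured maximal sum concrete, the following is a minimal sketch, not the authors' published algorithm. It assumes that each frame carries three scores (start, middle, end), that a window is split into three contiguous, non-empty segments in that order, and that the window score is the sum of each segment's matching scores. The function name `structured_max_sum` and the half-open boundary convention are hypothetical. Under these assumptions the best window can be found in linear time with a three-state, Kadane-style dynamic program.

```python
from typing import Optional, Sequence, Tuple

NEG_INF = float("-inf")


def structured_max_sum(
    start: Sequence[float], middle: Sequence[float], end: Sequence[float]
) -> Tuple[float, Optional[Tuple[int, int, int, int]]]:
    """Hypothetical sketch: return (score, (a, i, j, b)) maximizing
    sum(start[a:i]) + sum(middle[i:j]) + sum(end[j:b]) with a < i < j < b.
    Runs in O(T) time for T frames."""
    n = len(start)
    if len(middle) != n or len(end) != n or n < 3:
        raise ValueError("need three equal-length score lists with >= 3 frames")

    # Best partial structure ending at the previous frame, per phase,
    # plus the boundaries chosen so far (None means the phase is unreachable).
    s_score, s_bnd = NEG_INF, None  # in start phase:  (a,)
    m_score, m_bnd = NEG_INF, None  # in middle phase: (a, i)
    e_score, e_bnd = NEG_INF, None  # in end phase:    (a, i, j)
    best: Tuple[float, Optional[Tuple[int, int, int, int]]] = (NEG_INF, None)

    for t in range(n):
        # Start phase: extend the current start segment if it helps,
        # otherwise open a new window at frame t (plain Kadane step).
        if s_score > 0:
            ns, ns_bnd = s_score + start[t], s_bnd
        else:
            ns, ns_bnd = start[t], (t,)

        # Middle phase: switch from start (middle begins at t) or extend middle.
        nm, nm_bnd = NEG_INF, None
        if s_bnd is not None and s_score >= m_score:
            nm, nm_bnd = s_score + middle[t], s_bnd + (t,)
        elif m_bnd is not None:
            nm, nm_bnd = m_score + middle[t], m_bnd

        # End phase: switch from middle (end begins at t) or extend end.
        ne, ne_bnd = NEG_INF, None
        if m_bnd is not None and m_score >= e_score:
            ne, ne_bnd = m_score + end[t], m_bnd + (t,)
        elif e_bnd is not None:
            ne, ne_bnd = e_score + end[t], e_bnd

        # Commit all three phases together so each used previous-frame values.
        s_score, s_bnd = ns, ns_bnd
        m_score, m_bnd = nm, nm_bnd
        e_score, e_bnd = ne, ne_bnd

        # A complete window may end at frame t (exclusive bound t + 1).
        if e_bnd is not None and e_score > best[0]:
            best = (e_score, e_bnd + (t + 1,))
    return best
```

For example, with `start = [-1, 2, 0, -1, -1, -2]`, `middle = [-1, -1, 3, 3, -1, -2]` and `end = [-1, -1, -1, 0, 4, -2]`, the call returns `(12, (1, 2, 4, 5))`: frame 1 is the start, frames 2 to 3 the middle, and frame 4 the end. The single pass over frames is what makes the search linear; the real system may impose further constraints (such as segment lengths) that this sketch omits.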

Original language: English (US)
Title of host publication: Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3215-3223
Number of pages: 9
ISBN (Electronic): 9781538604571
DOI: 10.1109/CVPR.2017.342
State: Published - Nov 6 2017
Externally published: Yes
Event: 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 - Honolulu, United States
Duration: Jul 21 2017 to Jul 26 2017

Publication series

Name: Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Volume: 2017-January

Other

Other: 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Country: United States
City: Honolulu
Period: 7/21/17 to 7/26/17

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Computer Vision and Pattern Recognition


Cite this

Yuan, Z., Stroud, J. C., Lu, T., & Deng, J. (2017). Temporal action localization by structured maximal sums. In Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 (pp. 3215-3223). (Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017; Vol. 2017-January). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CVPR.2017.342