What Actions are Needed for Understanding Human Actions in Videos?

Gunnar A. Sigurdsson, Olga Russakovsky, Abhinav Gupta

Research output: Chapter in Book/Report/Conference proceedingConference contribution

77 Scopus citations

Abstract

What is the right way to reason about human activities? What directions forward are most promising? In this work, we analyze the current state of human activity understanding in videos. The goal of this paper is to examine datasets, evaluation metrics, algorithms, and potential future directions. We look at the qualitative attributes that define activities such as pose variability, brevity, and density. The experiments consider multiple state-of-the-art algorithms and multiple datasets. The results demonstrate that while there is inherent ambiguity in the temporal extent of activities, current datasets still permit effective benchmarking. We discover that fine-grained understanding of objects and pose when combined with temporal reasoning is likely to yield substantial improvements in algorithmic accuracy. We present the many kinds of information that will be needed to achieve substantial gains in activity understanding: objects, verbs, intent, and sequential reasoning. The software and additional information will be made available to provide other researchers detailed diagnostics to understand their own algorithms.

Original languageEnglish (US)
Title of host publicationProceedings - 2017 IEEE International Conference on Computer Vision, ICCV 2017
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages2156-2165
Number of pages10
ISBN (Electronic)9781538610329
DOIs
StatePublished - Dec 22 2017
Externally publishedYes
Event16th IEEE International Conference on Computer Vision, ICCV 2017 - Venice, Italy
Duration: Oct 22 2017Oct 29 2017

Publication series

NameProceedings of the IEEE International Conference on Computer Vision
Volume2017-October
ISSN (Print)1550-5499

Other

Other16th IEEE International Conference on Computer Vision, ICCV 2017
Country/TerritoryItaly
CityVenice
Period10/22/1710/29/17

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Vision and Pattern Recognition

Fingerprint

Dive into the research topics of 'What Actions are Needed for Understanding Human Actions in Videos?'. Together they form a unique fingerprint.

Cite this