What's in a question: Using visual questions as a form of supervision

Siddha Ganju, Olga Russakovsky, Abhinav Gupta

Research output: Chapter in Book/Report/Conference proceedingConference contribution

5 Scopus citations

Abstract

Collecting fully annotated image datasets is challenging and expensive. Many types of weak supervision have been explored: weak manual annotations, web search results, temporal continuity, ambient sound and others. We focus on one particular unexplored mode: visual questions that are asked about images. The key observation that inspires our work is that the question itself provides useful information about the image (even without the answer being available). For instance, the question “what is the breed of the dog?” informs the AI that the animal in the scene is a dog and that there is only one dog present. We make three contributions: (1) providing an extensive qualitative and quantitative analysis of the information contained in human visual questions, (2) proposing two simple but surprisingly effective modifications to the standard visual question answering models that allow them to make use of weak supervision in the form of unanswered questions associated with images and (3) demonstrating that a simple data augmentation strategy inspired by our insights results in a 7.1% improvement on the standard VQA benchmark.

Original languageEnglish (US)
Title of host publicationProceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages6422-6431
Number of pages10
ISBN (Electronic)9781538604571
DOIs
StatePublished - Nov 6 2017
Externally publishedYes
Event30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 - Honolulu, United States
Duration: Jul 21 2017Jul 26 2017

Publication series

NameProceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Volume2017-January

Other

Other30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
CountryUnited States
CityHonolulu
Period7/21/177/26/17

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Computer Vision and Pattern Recognition

Fingerprint Dive into the research topics of 'What's in a question: Using visual questions as a form of supervision'. Together they form a unique fingerprint.

  • Cite this

    Ganju, S., Russakovsky, O., & Gupta, A. (2017). What's in a question: Using visual questions as a form of supervision. In Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 (pp. 6422-6431). (Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017; Vol. 2017-January). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/CVPR.2017.680