LEARNING TO REASON OVER VISUAL OBJECTS

Shanka Subhra Mondal, Taylor W. Webb, Jonathan D. Cohen

Research output: Contribution to conference › Paper › peer-review

11 Scopus citations

Abstract

A core component of human intelligence is the ability to identify abstract patterns inherent in complex, high-dimensional perceptual data, as exemplified by visual reasoning tasks such as Raven's Progressive Matrices (RPM). Motivated by the goal of designing AI systems with this capacity, recent work has focused on evaluating whether neural networks can learn to solve RPM-like problems. Previous work has generally found that strong performance on these problems requires the incorporation of inductive biases that are specific to the RPM problem format, raising the question of whether such models might be more broadly useful. Here, we investigated the extent to which a general-purpose mechanism for processing visual scenes in terms of objects might help promote abstract visual reasoning. We found that a simple model, consisting only of an object-centric encoder and a transformer reasoning module, achieved state-of-the-art results on both of two challenging RPM-like benchmarks (PGM and I-RAVEN), as well as a novel benchmark with greater visual complexity (CLEVR-Matrices). These results suggest that an inductive bias for object-centric processing may be a key component of abstract visual reasoning, obviating the need for problem-specific inductive biases.
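The abstract describes the model only at a high level: an object-centric encoder that turns each panel into a set of object slots, and a transformer reasoning module that scores candidate answer panels. The PyTorch sketch below illustrates one way such a pipeline could be wired together. The CNN-based stand-in for the object-centric encoder, the slot count, the layer sizes, and the per-candidate scoring loop are illustrative assumptions, not the paper's implementation.

# Minimal sketch of the two-component design described in the abstract:
# an object-centric encoder producing per-panel object slots, followed by a
# transformer that reasons over the pooled slots and scores each candidate.
# All module names and hyperparameters here are assumed for illustration.
import torch
import torch.nn as nn


class ObjectCentricEncoder(nn.Module):
    """Encode one panel image into `num_slots` object-slot vectors (assumed design)."""

    def __init__(self, num_slots=6, slot_dim=64):
        super().__init__()
        self.num_slots = num_slots
        self.slot_dim = slot_dim
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Project pooled features into a set of slots (stand-in for a slot-based encoder).
        self.to_slots = nn.Linear(64, num_slots * slot_dim)

    def forward(self, panels):                        # panels: (B, 3, H, W)
        feats = self.cnn(panels).flatten(1)           # (B, 64)
        slots = self.to_slots(feats)                  # (B, num_slots * slot_dim)
        return slots.view(-1, self.num_slots, self.slot_dim)


class TransformerReasoner(nn.Module):
    """Score one candidate answer given the context panels' object slots."""

    def __init__(self, slot_dim=64, heads=4, layers=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(slot_dim, heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, layers)
        self.score = nn.Linear(slot_dim, 1)

    def forward(self, slot_seq):                      # (B, num_panels * num_slots, slot_dim)
        h = self.transformer(slot_seq)
        return self.score(h.mean(dim=1)).squeeze(-1)  # one scalar score per sequence


def score_candidates(encoder, reasoner, context, candidates):
    """context: (B, 8, 3, H, W) RPM context panels; candidates: (B, K, 3, H, W) answer choices."""
    B = context.shape[0]
    ctx_slots = encoder(context.flatten(0, 1)).view(B, -1, encoder.slot_dim)
    scores = []
    for k in range(candidates.shape[1]):
        cand_slots = encoder(candidates[:, k])
        seq = torch.cat([ctx_slots, cand_slots], dim=1)
        scores.append(reasoner(seq))
    return torch.stack(scores, dim=1)                 # (B, K); train with cross-entropy over candidates

A model of this shape would typically be trained with a cross-entropy loss over the candidate scores, taking the highest-scoring panel as the predicted answer.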

Original language: English (US)
State: Published - 2023
Event: 11th International Conference on Learning Representations, ICLR 2023 - Kigali, Rwanda
Duration: May 1 2023 - May 5 2023

Conference

Conference: 11th International Conference on Learning Representations, ICLR 2023
Country/Territory: Rwanda
City: Kigali
Period: 5/1/23 - 5/5/23

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Computer Science Applications
  • Education
  • Linguistics and Language
