Hearing Touch: Audio-Visual Pretraining for Contact-Rich Manipulation

Jared Mejia, Victoria Dean, Tess Hellebrekers, Abhinav Gupta

Research output: Chapter in Book/Report/Conference proceedingConference contribution

5 Scopus citations

Abstract

Although pre-training on a large amount of data is beneficial for robot learning, current paradigms only perform large-scale pretraining for visual representations, whereas representations for other modalities are trained from scratch. In contrast to the abundance of visual data, it is unclear what relevant internet-scale data may be used for pretraining other modalities such as tactile sensing. Such pretraining becomes increasingly crucial in the low-data regimes common in robotics applications. In this paper, we address this gap by using contact microphones as an alternative tactile sensor. Our key insight is that contact microphones capture inherently audio-based information, allowing us to leverage large-scale audio-visual pretraining to obtain representations that boost the performance of robotic manipulation. To the best of our knowledge, our method is the first approach leveraging large-scale multisensory pre-training for robotic manipulation. For supplementary information including videos of real robot experiments, please see https://sites.google.com/view/hearing-touch.

Original languageEnglish (US)
Title of host publication2024 IEEE International Conference on Robotics and Automation, ICRA 2024
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages6912-6919
Number of pages8
ISBN (Electronic)9798350384574
DOIs
StatePublished - 2024
Externally publishedYes
Event2024 IEEE International Conference on Robotics and Automation, ICRA 2024 - Yokohama, Japan
Duration: May 13 2024May 17 2024

Publication series

NameProceedings - IEEE International Conference on Robotics and Automation
ISSN (Print)1050-4729

Conference

Conference2024 IEEE International Conference on Robotics and Automation, ICRA 2024
Country/TerritoryJapan
CityYokohama
Period5/13/245/17/24

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Electrical and Electronic Engineering
  • Artificial Intelligence

Fingerprint

Dive into the research topics of 'Hearing Touch: Audio-Visual Pretraining for Contact-Rich Manipulation'. Together they form a unique fingerprint.

Cite this