TY - GEN
T1 - Hearing Touch
T2 - 2024 IEEE International Conference on Robotics and Automation, ICRA 2024
AU - Mejia, Jared
AU - Dean, Victoria
AU - Hellebrekers, Tess
AU - Gupta, Abhinav
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Although pre-training on a large amount of data is beneficial for robot learning, current paradigms only perform large-scale pretraining for visual representations, whereas representations for other modalities are trained from scratch. In contrast to the abundance of visual data, it is unclear what relevant internet-scale data may be used for pretraining other modalities such as tactile sensing. Such pretraining becomes increasingly crucial in the low-data regimes common in robotics applications. In this paper, we address this gap by using contact microphones as an alternative tactile sensor. Our key insight is that contact microphones capture inherently audio-based information, allowing us to leverage large-scale audio-visual pretraining to obtain representations that boost the performance of robotic manipulation. To the best of our knowledge, our method is the first approach leveraging large-scale multisensory pre-training for robotic manipulation. For supplementary information including videos of real robot experiments, please see https://sites.google.com/view/hearing-touch.
AB - Although pre-training on a large amount of data is beneficial for robot learning, current paradigms only perform large-scale pretraining for visual representations, whereas representations for other modalities are trained from scratch. In contrast to the abundance of visual data, it is unclear what relevant internet-scale data may be used for pretraining other modalities such as tactile sensing. Such pretraining becomes increasingly crucial in the low-data regimes common in robotics applications. In this paper, we address this gap by using contact microphones as an alternative tactile sensor. Our key insight is that contact microphones capture inherently audio-based information, allowing us to leverage large-scale audio-visual pretraining to obtain representations that boost the performance of robotic manipulation. To the best of our knowledge, our method is the first approach leveraging large-scale multisensory pre-training for robotic manipulation. For supplementary information including videos of real robot experiments, please see https://sites.google.com/view/hearing-touch.
UR - https://www.scopus.com/pages/publications/85198640739
UR - https://www.scopus.com/inward/citedby.url?scp=85198640739&partnerID=8YFLogxK
U2 - 10.1109/ICRA57147.2024.10611305
DO - 10.1109/ICRA57147.2024.10611305
M3 - Conference contribution
AN - SCOPUS:85198640739
T3 - Proceedings - IEEE International Conference on Robotics and Automation
SP - 6912
EP - 6919
BT - 2024 IEEE International Conference on Robotics and Automation, ICRA 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 13 May 2024 through 17 May 2024
ER -