TY - GEN
T1 - Efficient, Self-Supervised Human Pose Estimation with Inductive Prior Tuning
AU - Yoo, Nobline
AU - Russakovsky, Olga
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - The goal of 2D human pose estimation (HPE) is to localize anatomical landmarks, given an image of a person in a pose. SOTA techniques make use of thousands of labeled figures (finetuning transformers or training deep CNNs), acquired using labor-intensive crowdsourcing. On the other hand, self-supervised methods re-frame the HPE task as a reconstruction problem, enabling them to leverage the vast amount of unlabeled visual data, though at the present cost of accuracy. In this work, we explore ways to improve self-supervised HPE. We (1) analyze the relationship between reconstruction quality and pose estimation accuracy, (2) develop a model pipeline that outperforms the baseline which inspired our work, using less than one-third the amount of training data, and (3) offer a new metric suitable for self-supervised settings that measures the consistency of predicted body part length proportions. We show that a combination of well-engineered reconstruction losses and inductive priors can help coordinate pose learning alongside reconstruction in a self-supervised paradigm.
KW - Computer vision
KW - Human pose estimation
KW - Inductive prior
KW - Self supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85182952325&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85182952325&partnerID=8YFLogxK
U2 - 10.1109/ICCVW60793.2023.00351
DO - 10.1109/ICCVW60793.2023.00351
M3 - Conference contribution
AN - SCOPUS:85182952325
T3 - Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023
SP - 3263
EP - 3272
BT - Proceedings - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2023
Y2 - 2 October 2023 through 6 October 2023
ER -