TY - GEN
T1 - Self-supervised Neural Articulated Shape and Appearance Models
AU - Wei, Fangyin
AU - Chabra, Rohan
AU - Ma, Lingni
AU - Lassner, Christoph
AU - Zollhoefer, Michael
AU - Rusinkiewicz, Szymon
AU - Sweeney, Chris
AU - Newcombe, Richard
AU - Slavcheva, Mira
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Learning geometry, motion, and appearance priors of object classes is important for the solution of a large variety of computer vision problems. While the majority of approaches have focused on static objects, dynamic objects, especially with controllable articulation, are less explored. We propose a novel approach for learning a representation of the geometry, appearance, and motion of a class of articulated objects given only a set of color images as input. In a self-supervised manner, our novel representation learns shape, appearance, and articulation codes that enable independent control of these semantic dimensions. Our model is trained end-to-end without requiring any articulation annotations. Experiments show that our approach performs well for different joint types, such as revolute and prismatic joints, as well as different combinations of these joints. Compared to the state of the art that uses direct 3D supervision and does not output appearance, we recover more faithful geometry and appearance from 2D observations only. In addition, our representation enables a large variety of applications, such as few-shot reconstruction, the generation of novel articulations, and novel view synthesis. Project page: https://weify627.github.io/nasam/.
KW - 3D from multi-view and sensors
KW - Image and video synthesis and generation
UR - http://www.scopus.com/inward/record.url?scp=85141800491&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85141800491&partnerID=8YFLogxK
U2 - 10.1109/CVPR52688.2022.01536
DO - 10.1109/CVPR52688.2022.01536
M3 - Conference contribution
AN - SCOPUS:85141800491
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 15795
EP - 15805
BT - Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
PB - IEEE Computer Society
T2 - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Y2 - 19 June 2022 through 24 June 2022
ER -