TY - GEN
T1 - Text2Shape: Generating Shapes from Natural Language by Learning Joint Embeddings
T2 - 14th Asian Conference on Computer Vision, ACCV 2018
AU - Chen, Kevin
AU - Choy, Christopher B.
AU - Savva, Manolis
AU - Chang, Angel X.
AU - Funkhouser, Thomas
AU - Savarese, Silvio
N1 - Funding Information:
Acknowledgments. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-1147470. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. This work is also supported by Google, Intel, and the Technical University of Munich – Institute for Advanced Study, funded by the German Excellence Initiative and the European Union Seventh Framework Programme under grant agreement no. 291763.
Publisher Copyright:
© 2019, Springer Nature Switzerland AG.
PY - 2019
Y1 - 2019
N2 - We present a method for generating colored 3D shapes from natural language. To this end, we first learn joint embeddings of freeform text descriptions and colored 3D shapes. Our model combines and extends learning by association and metric learning approaches to learn implicit cross-modal connections, and produces a joint representation that captures the many-to-many relations between language and physical properties of 3D shapes such as color and shape. To evaluate our approach, we collect a large dataset of natural language descriptions for physical 3D objects in the ShapeNet dataset. With this learned joint embedding we demonstrate text-to-shape retrieval that outperforms baseline approaches. Using our embeddings with a novel conditional Wasserstein GAN framework, we generate colored 3D shapes from text. Our method is the first to connect natural language text with realistic 3D objects exhibiting rich variations in color, texture, and shape detail.
UR - http://www.scopus.com/inward/record.url?scp=85067243782&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85067243782&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-20893-6_7
DO - 10.1007/978-3-030-20893-6_7
M3 - Conference contribution
AN - SCOPUS:85067243782
SN - 9783030208929
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 100
EP - 116
BT - Computer Vision - ACCV 2018 - 14th Asian Conference on Computer Vision, Revised Selected Papers
A2 - Li, Hongdong
A2 - Jawahar, C.V.
A2 - Schindler, Konrad
A2 - Mori, Greg
PB - Springer Verlag
Y2 - 2 December 2018 through 6 December 2018
ER -