TY - GEN
T1 - The more you look, the more you see
T2 - 18th IEEE Winter Conference on Applications of Computer Vision, WACV 2018
AU - Wang, Jingyan
AU - Russakovsky, Olga
AU - Ramanan, Deva
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/5/3
Y1 - 2018/5/3
N2 - Comprehensive object understanding is a central challenge in visual recognition, yet most advances with deep neural networks reason about each aspect in isolation. In this work, we present a unified framework to tackle this broader object understanding problem. We formalize a refinement module that recursively develops understanding across space and semantics: 'the more it looks, the more it sees.' More concretely, we cluster the objects within each semantic category into fine-grained subcategories; our recursive model extracts features for each region of interest, recursively predicts the location and the content of the region, and selectively chooses a small subset of the regions to process in the next step. Our model can quickly determine if an object is present, followed by its class ('Is this a person?'), and finally report fine-grained predictions ('Is this person standing?'). Our experiments demonstrate the advantages of joint reasoning about spatial layout and fine-grained semantics. On the PASCAL VOC dataset, our proposed model simultaneously achieves strong performance on instance segmentation, part segmentation and keypoint detection in a single efficient pipeline that does not require explicit training for each task. One of the reasons for our strong performance is the ability to naturally leverage highly-engineered architectures, such as Faster R-CNN, within our pipeline. Source code is available at https://github.com/jingyanw/recursive-refinement.
UR - http://www.scopus.com/inward/record.url?scp=85050973590&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85050973590&partnerID=8YFLogxK
U2 - 10.1109/WACV.2018.00199
DO - 10.1109/WACV.2018.00199
M3 - Conference contribution
AN - SCOPUS:85050973590
T3 - Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018
SP - 1794
EP - 1803
BT - Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 12 March 2018 through 15 March 2018
ER -