TY - GEN
T1 - Towards fairer datasets
T2 - 3rd ACM Conference on Fairness, Accountability, and Transparency, FAT* 2020
AU - Yang, Kaiyu
AU - Qinami, Klint
AU - Fei-Fei, Li
AU - Deng, Jia
AU - Russakovsky, Olga
PY - 2020/1/27
Y1 - 2020/1/27
N2 - Computer vision technology is being used by many but remains representative of only a few. People have reported misbehavior of computer vision models, including offensive prediction results and lower performance for underrepresented groups. Current computer vision models are typically developed using datasets consisting of manually annotated images or videos; the data and label distributions in these datasets are critical to the models' behavior. In this paper, we examine ImageNet, a large-scale ontology of images that has spurred the development of many modern computer vision methods. We consider three key factors within the person subtree of ImageNet that may lead to problematic behavior in downstream computer vision technology: (1) the stagnant concept vocabulary of WordNet, (2) the attempt at exhaustive illustration of all categories with images, and (3) the inequality of representation in the images within concepts. We seek to illuminate the root causes of these concerns and take the first steps to mitigate them constructively.
AB - Computer vision technology is being used by many but remains representative of only a few. People have reported misbehavior of computer vision models, including offensive prediction results and lower performance for underrepresented groups. Current computer vision models are typically developed using datasets consisting of manually annotated images or videos; the data and label distributions in these datasets are critical to the models' behavior. In this paper, we examine ImageNet, a large-scale ontology of images that has spurred the development of many modern computer vision methods. We consider three key factors within the person subtree of ImageNet that may lead to problematic behavior in downstream computer vision technology: (1) the stagnant concept vocabulary of WordNet, (2) the attempt at exhaustive illustration of all categories with images, and (3) the inequality of representation in the images within concepts. We seek to illuminate the root causes of these concerns and take the first steps to mitigate them constructively.
KW - Computer vision
KW - Dataset construction
KW - Fairness
KW - Representative datasets
UR - http://www.scopus.com/inward/record.url?scp=85079661500&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85079661500&partnerID=8YFLogxK
U2 - 10.1145/3351095.3375709
DO - 10.1145/3351095.3375709
M3 - Conference contribution
T3 - FAT* 2020 - Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency
SP - 547
EP - 558
BT - FAT* 2020 - Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency
PB - Association for Computing Machinery, Inc
Y2 - 27 January 2020 through 30 January 2020
ER -