Towards fairer datasets: Filtering and balancing the distribution of the people subtree in the ImageNet hierarchy

Kaiyu Yang, Klint Qinami, Li Fei-Fei, Jia Deng, Olga Russakovsky

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Computer vision technology is being used by many but remains representative of only a few. People have reported misbehavior of computer vision models, including offensive prediction results and lower performance for underrepresented groups. Current computer vision models are typically developed using datasets consisting of manually annotated images or videos; the data and label distributions in these datasets are critical to the models' behavior. In this paper, we examine ImageNet, a large-scale ontology of images that has spurred the development of many modern computer vision methods. We consider three key factors within the person subtree of ImageNet that may lead to problematic behavior in downstream computer vision technology: (1) the stagnant concept vocabulary of WordNet, (2) the attempt at exhaustive illustration of all categories with images, and (3) the inequality of representation in the images within concepts. We seek to illuminate the root causes of these concerns and take the first steps to mitigate them constructively.

Original language: English (US)
Title of host publication: FAT* 2020 - Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency
Publisher: Association for Computing Machinery, Inc
Pages: 547-558
Number of pages: 12
ISBN (Electronic): 9781450369367
DOI: https://doi.org/10.1145/3351095.3375709
State: Published - Jan 27 2020
Event: 3rd ACM Conference on Fairness, Accountability, and Transparency, FAT* 2020 - Barcelona, Spain
Duration: Jan 27 2020 to Jan 30 2020

Publication series

Name: FAT* 2020 - Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency

Conference

Conference: 3rd ACM Conference on Fairness, Accountability, and Transparency, FAT* 2020
Country: Spain
City: Barcelona
Period: 1/27/20 to 1/30/20

All Science Journal Classification (ASJC) codes

  • Business, Management and Accounting(all)
  • Engineering(all)

Keywords

  • Computer vision
  • Dataset construction
  • Fairness
  • Representative datasets

Cite this

Yang, K., Qinami, K., Fei-Fei, L., Deng, J., & Russakovsky, O. (2020). Towards fairer datasets: Filtering and balancing the distribution of the people subtree in the ImageNet hierarchy. In FAT* 2020 - Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (pp. 547-558). (FAT* 2020 - Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency). Association for Computing Machinery, Inc. https://doi.org/10.1145/3351095.3375709