TY - CONF
T1 - Fine-grained car detection for visual census estimation
AU - Gebru, Timnit
AU - Krause, Jonathan
AU - Wang, Yilun
AU - Chen, Duyun
AU - Deng, Jia
AU - Li, Fei Fei
N1 - Funding Information:
We thank Stefano Ermon, Neal Jean, Oliver Groth, Serena Yeung, Alexandre Alahi, Kevin Tang and everyone in the Stanford Vision Lab for valuable feedback. This research is partially supported by an NSF grant (IIS-1115493), the Stanford DARE fellowship (to T.G.) and by NVIDIA (through donated GPUs).
Publisher Copyright:
Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2017
Y1 - 2017
N2 - Targeted socio-economic policies require an accurate understanding of a country's demographic makeup. To that end, the United States spends more than 1 billion dollars a year gathering census data such as race, gender, education, occupation and unemployment rates. Compared to the traditional method of collecting surveys across many years which is costly and labor intensive, data-driven, machine learningdriven approaches are cheaper and faster - with the potential ability to detect trends in close to real time. In this work, we leverage the ubiquity of Google Street View images and develop a computer vision pipeline to predict income, per capita carbon emission, crime rates and other city attributes from a single source of publicly available visual data. We first detect cars in 50 million images across 200 of the largest US cities and train a model to predict demographic attributes using the detected cars. To facilitate our work, we have collected the largest and most challenging fine-grained dataset reported to date consisting of over 2600 classes of cars comprised of images from Google Street View and other web sources, classified by car experts to account for even the most subtle of visual differences. We use this data to construct the largest scale fine-grained detection system reported to date. Our prediction results correlate well with ground truth income data (r=0.82), Massachusetts department of vehicle registration, and sources investigating crime rates, income segregation, per capita carbon emission, and other market research. Finally, we learn interesting relationships between cars and neighbourhoods allowing us to perform the first large scale sociological analysis of cities using computer vision techniques.
AB - Targeted socio-economic policies require an accurate understanding of a country's demographic makeup. To that end, the United States spends more than 1 billion dollars a year gathering census data such as race, gender, education, occupation and unemployment rates. Compared to the traditional method of collecting surveys across many years which is costly and labor intensive, data-driven, machine learningdriven approaches are cheaper and faster - with the potential ability to detect trends in close to real time. In this work, we leverage the ubiquity of Google Street View images and develop a computer vision pipeline to predict income, per capita carbon emission, crime rates and other city attributes from a single source of publicly available visual data. We first detect cars in 50 million images across 200 of the largest US cities and train a model to predict demographic attributes using the detected cars. To facilitate our work, we have collected the largest and most challenging fine-grained dataset reported to date consisting of over 2600 classes of cars comprised of images from Google Street View and other web sources, classified by car experts to account for even the most subtle of visual differences. We use this data to construct the largest scale fine-grained detection system reported to date. Our prediction results correlate well with ground truth income data (r=0.82), Massachusetts department of vehicle registration, and sources investigating crime rates, income segregation, per capita carbon emission, and other market research. Finally, we learn interesting relationships between cars and neighbourhoods allowing us to perform the first large scale sociological analysis of cities using computer vision techniques.
UR - http://www.scopus.com/inward/record.url?scp=85030464850&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85030464850&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85030464850
SP - 4502
EP - 4508
T2 - 31st AAAI Conference on Artificial Intelligence, AAAI 2017
Y2 - 4 February 2017 through 10 February 2017
ER -