Winning Models for Grade Point Average, Grit, and Layoff in the Fragile Families Challenge

  • Daniel E. Rigobon
  • Eaman Jahani
  • Yoshihiko Suhara
  • Khaled Alghoneim
  • Abdulaziz Alghunaim
  • Alex Sandy Pentland
  • Abdullah Almaatouq

Research output: Contribution to journal › Article › peer-review

Abstract

In this article, the authors discuss and analyze their approach to the Fragile Families Challenge. The data consisted of more than 12,000 features (covariates) about the children and their parents, schools, and overall environments from birth to age 9. The authors’ modular and collaborative approach parallelized prediction tasks and relied primarily on existing data science techniques, including (1) data preprocessing: elimination of low variance features, imputation of missing data, and construction of composite features; (2) feature selection through univariate mutual information and extraction of nonzero least absolute shrinkage and selection operator coefficients; (3) three machine learning models: random forest, elastic net, and gradient-boosted trees; and finally (4) prediction aggregation according to performance. The top-performing submissions produced winning out-of-sample predictions for three outcomes: grade point average, grit, and layoff. However, predictions were at most 20 percent better than a baseline that predicted the mean value of the training data for each outcome.
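The last two points of the abstract, aggregating predictions according to performance and comparing against a mean-value baseline, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the abstract does not state the weighting scheme or the error metric, so the sketch assumes mean squared error and weights each model by its improvement over the baseline on held-out data (clipped at zero). All function names and numbers below are hypothetical.

```python
from statistics import fmean


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def relative_improvement(y_true, y_pred, train_mean):
    """Fractional reduction in MSE relative to always predicting train_mean.

    0.0 means no better than the baseline; 0.20 would correspond to
    "20 percent better" under the MSE assumption made here.
    """
    baseline = mse(y_true, [train_mean] * len(y_true))
    if baseline == 0:
        raise ValueError("baseline error is zero; improvement is undefined")
    return 1.0 - mse(y_true, y_pred) / baseline


def aggregate(predictions, scores):
    """Blend per-model predictions, weighting each model by its score.

    Assumption: weights are proportional to the (non-negative) validation
    score; models scoring at or below zero get no weight. If every model
    scores at or below zero, fall back to a plain average.
    """
    weights = {name: max(scores[name], 0.0) for name in predictions}
    if sum(weights.values()) == 0:
        weights = {name: 1.0 for name in predictions}
    total = sum(weights.values())
    n = len(next(iter(predictions.values())))
    return [
        sum(weights[name] * preds[i] for name, preds in predictions.items()) / total
        for i in range(n)
    ]


# Toy, made-up numbers purely for illustration.
train_mean = 3.0
y_val = [2.5, 3.5, 3.0]
val_preds = {
    "random_forest": [2.75, 3.25, 3.0],
    "elastic_net": [3.0, 3.0, 3.0],
}
val_scores = {
    name: relative_improvement(y_val, preds, train_mean)
    for name, preds in val_preds.items()
}
blended = aggregate(val_preds, val_scores)
print(val_scores, blended)
```

In practice the weights would be estimated on data separate from the data used for the final evaluation, so that the blend is not tuned to the same observations it is scored on.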

Original language: English (US)
Article number: 2378023118820418
Journal: Socius
Volume: 5
State: Published - 2019

All Science Journal Classification (ASJC) codes

  • General Social Sciences

Keywords

  • data science
  • Fragile Families Challenge
  • machine learning
