TY - JOUR
T1 - Optimal Data-Driven Regression Discontinuity Plots
AU - Calonico, Sebastian
AU - Cattaneo, Matias D.
AU - Titiunik, Rocío
N1 - Funding Information:
Sebastian Calonico is Assistant Professor, Department of Economics, University of Miami, Coral Gables, FL 33124 (E-mail: scalonico@bus.miami.edu). Matias D. Cattaneo is Associate Professor, Department of Economics, University of Michigan, Ann Arbor, MI 48109 (E-mail: cattaneo@umich.edu). Rocío Titiunik is Assistant Professor, Department of Political Science, University of Michigan, Ann Arbor, MI 48109 (E-mail: titiunik@umich.edu). This article has benefited from the insightful suggestions of the co-editor, David Ruppert, an associate editor, and three reviewers. The authors also thank Andreas Hagemann, Guido Imbens, Michael Jansson, Zhuan Pei, and Andres Santos for their comments. Financial support from the National Science Foundation (SES 1357561) is gratefully acknowledged.
Publisher Copyright:
© 2015 American Statistical Association.
PY - 2015/10/2
Y1 - 2015/10/2
N2 - Exploratory data analysis plays a central role in applied statistics and econometrics. In the popular regression-discontinuity (RD) design, the use of graphical analysis has been strongly advocated because it provides both easy presentation and transparent validation of the design. RD plots are nowadays widely used in applications, despite their formal properties being unknown: these plots are typically presented employing ad hoc choices of tuning parameters, which makes these procedures less automatic and more subjective. In this article, we formally study the most common RD plot based on an evenly spaced binning of the data, and propose several (optimal) data-driven choices for the number of bins depending on the goal of the researcher. These RD plots are constructed either to approximate the underlying unknown regression functions without imposing smoothness in the estimator, or to approximate the underlying variability of the raw data while smoothing out the otherwise uninformative scatterplot of the data. In addition, we introduce an alternative RD plot based on quantile spaced binning, study its formal properties, and propose similar (optimal) data-driven choices for the number of bins. The main proposed data-driven selectors employ spacings estimators, which are simple and easy to implement in applications because they do not require additional choices of tuning parameters. Altogether, our results offer an array of alternative RD plots that are objective and automatic when implemented, providing a reliable benchmark for graphical analysis in RD designs. We illustrate the performance of our automatic RD plots using several empirical examples and a Monte Carlo study. All results are readily available in R and STATA using the software packages described in Calonico, Cattaneo, and Titiunik. Supplementary materials for this article are available online.
AB - Exploratory data analysis plays a central role in applied statistics and econometrics. In the popular regression-discontinuity (RD) design, the use of graphical analysis has been strongly advocated because it provides both easy presentation and transparent validation of the design. RD plots are nowadays widely used in applications, despite their formal properties being unknown: these plots are typically presented employing ad hoc choices of tuning parameters, which makes these procedures less automatic and more subjective. In this article, we formally study the most common RD plot based on an evenly spaced binning of the data, and propose several (optimal) data-driven choices for the number of bins depending on the goal of the researcher. These RD plots are constructed either to approximate the underlying unknown regression functions without imposing smoothness in the estimator, or to approximate the underlying variability of the raw data while smoothing out the otherwise uninformative scatterplot of the data. In addition, we introduce an alternative RD plot based on quantile spaced binning, study its formal properties, and propose similar (optimal) data-driven choices for the number of bins. The main proposed data-driven selectors employ spacings estimators, which are simple and easy to implement in applications because they do not require additional choices of tuning parameters. Altogether, our results offer an array of alternative RD plots that are objective and automatic when implemented, providing a reliable benchmark for graphical analysis in RD designs. We illustrate the performance of our automatic RD plots using several empirical examples and a Monte Carlo study. All results are readily available in R and STATA using the software packages described in Calonico, Cattaneo, and Titiunik. Supplementary materials for this article are available online.
KW - Binning
KW - Partitioning
KW - RD plots
KW - Tuning parameter selection
UR - http://www.scopus.com/inward/record.url?scp=84938914336&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84938914336&partnerID=8YFLogxK
U2 - 10.1080/01621459.2015.1017578
DO - 10.1080/01621459.2015.1017578
M3 - Article
AN - SCOPUS:84938914336
SN - 0162-1459
VL - 110
SP - 1753
EP - 1769
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
IS - 512
ER -