It is a standard practice in regression analyses to allow for clustering in the error covariance matrix if the explanatory variable of interest varies at a more aggregate level (e.g., the state level) than the units of observation (e.g., individuals). Often, however, the structure of the error covariance matrix is more complex, with correlations not vanishing for units in different clusters. Here, we explore the implications of such correlations for the actual and estimated precision of least squares estimators. Our main theoretical result is that with equal-sized clusters, if the covariate of interest is randomly assigned at the cluster level, only accounting for nonzero covariances at the cluster level, and ignoring correlations between clusters as well as differences in within-cluster correlations, leads to valid confidence intervals. However, in the absence of random assignment of the covariates, ignoring general correlation structures may lead to biases in standard errors. We illustrate our findings using the 5% public-use census data. Based on these results, we recommend that researchers, as a matter of routine, explore the extent of spatial correlations in explanatory variables beyond state-level clustering.
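The contrast between random and non-random assignment can be illustrated with a small Monte Carlo sketch. This is not the authors' code or data. The design is an assumption made for illustration: 10 hypothetical "regions" of 5 equal-sized clusters each, unit-variance region, cluster, and individual shocks, and a true slope of zero. The region shock induces correlation between clusters, which the usual cluster-robust (Liang-Zeger) variance ignores. The function names `cluster_robust_se` and `simulate` are likewise hypothetical.

```python
import numpy as np


def cluster_robust_se(y, w, cluster_ids):
    """OLS slope of y on (1, w) and its cluster-robust standard error.

    Plain Liang-Zeger sandwich estimator, no small-sample correction.
    cluster_ids must be integers 0..G-1.
    """
    X = np.column_stack([np.ones(len(w)), w])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    # Sum the scores x_i * e_i within each cluster.
    scores = np.zeros((cluster_ids.max() + 1, X.shape[1]))
    np.add.at(scores, cluster_ids, X * resid[:, None])
    meat = scores.T @ scores
    V = XtX_inv @ meat @ XtX_inv
    return beta[1], np.sqrt(V[1, 1])


def simulate(random_assignment, n_reps=2000, seed=0):
    """Return (mean cluster-robust SE) / (Monte Carlo SD of the slope).

    A ratio near 1 means the cluster-robust SE tracks the actual precision.
    All design constants below are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_regions, clusters_per_region, n_per_cluster = 10, 5, 20
    n_clusters = n_regions * clusters_per_region
    region_of_cluster = np.repeat(np.arange(n_regions), clusters_per_region)
    cluster_ids = np.repeat(np.arange(n_clusters), n_per_cluster)

    estimates, ses = [], []
    for _ in range(n_reps):
        # Region shock is shared by all clusters in a region, so errors
        # are correlated across clusters, not only within them.
        region_shock = rng.normal(size=n_regions)[region_of_cluster]
        cluster_shock = rng.normal(size=n_clusters)
        y = (region_shock + cluster_shock)[cluster_ids]
        y = y + rng.normal(size=cluster_ids.size)  # true slope is zero

        if random_assignment:
            # Exactly half of the clusters treated, chosen at random.
            w_cluster = rng.permutation(n_clusters) < n_clusters // 2
        else:
            # Treatment follows region boundaries (spatially correlated).
            w_cluster = region_of_cluster < n_regions // 2
        w = w_cluster[cluster_ids].astype(float)

        b, se = cluster_robust_se(y, w, cluster_ids)
        estimates.append(b)
        ses.append(se)
    return np.mean(ses) / np.std(estimates)


if __name__ == "__main__":
    print("SE ratio, random assignment:    ", simulate(True))
    print("SE ratio, region-level treatment:", simulate(False))
```

Under these assumptions, the ratio should sit close to one when clusters are assigned at random and fall clearly below one when treatment lines up with regions, which is the pattern the abstract describes. The exact values depend on the assumed shock variances and the number of clusters.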
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Statistics, Probability and Uncertainty
Keywords
- Clustered standard errors
- Confidence intervals
- Random assignment