Spatial aggregation effects on the performance of machine learning metamodels for predicting transit time to baseflow

Mario A. Soriano, Reed Maxwell

Research output: Contribution to journal › Article › peer-review


Water transit time is the duration between the entry and exit of a parcel of water across a hydrologic system. It is a fundamental characteristic that links hydrologic transport, biogeochemical processing, and water quality, and it has broad implications for resource vulnerability and sustainability. Physically based models can accurately describe transit time distributions but require significant computational resources when applied to large regions at high resolutions. In this study, we evaluate the potential of machine learning metamodels to emulate physically based models for computationally efficient prediction of key metrics from transit time distributions. Transit times are computed from a continental-scale integrated hydrologic model coupled with particle tracking. The metamodeling approach is illustrated in the 280,000 km² Upper Colorado River Basin, USA, a principal headwater basin that is under multiple stresses, including resource overallocation, water quality threats, and climate change impacts. We evaluate the effects of using different types of spatial aggregation in the metamodels, including regular grids, hydrologic units, and upstream watersheds. We found that metamodels using upstream watershed aggregation exhibited the best overall performance across our target predictions. Errors were more pronounced in metamodels that employed smaller spatial aggregation units than in those that employed larger units, suggesting that additional predictors that capture the heterogeneity of topographic, climatic, and geologic properties are needed at these scales. We also found that predictor importance and input-output relations were remarkably consistent across spatial aggregation types and agreed with previous findings documented from physically based models and tracer-based studies. Our results show the feasibility of developing machine learning metamodels for predicting transit times and demonstrate the necessity of multiscale analyses to probe the robustness of the findings.
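
The regular-grid aggregation mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' workflow: the function name `aggregate_to_grid`, the `(x, y, transit_time)` particle layout, the choice of mean and median as summary metrics, and all numeric values are assumptions made only for illustration. It shows how the same particle transit times yield different aggregation units, and therefore different prediction targets for a metamodel, as the grid block size changes.

```python
from collections import defaultdict
from statistics import mean, median


def aggregate_to_grid(particles, block_size):
    """Group particle transit times by regular grid block.

    particles: iterable of (x, y, transit_time) tuples (hypothetical layout),
    with x and y in the same length units as block_size.
    Returns {(col, row): {"n": count, "mean": ..., "median": ...}}.
    """
    groups = defaultdict(list)
    for x, y, t in particles:
        # Floor division assigns each particle to the block that contains it.
        key = (int(x // block_size), int(y // block_size))
        groups[key].append(t)
    return {
        key: {"n": len(times), "mean": mean(times), "median": median(times)}
        for key, times in groups.items()
    }


# Made-up particles: (x_km, y_km, transit_time_years). Not data from the study.
particles = [
    (0.5, 0.5, 2.0),
    (1.5, 0.5, 4.0),
    (0.5, 1.5, 6.0),
    (2.5, 2.5, 10.0),
    (3.5, 3.5, 30.0),
]

fine = aggregate_to_grid(particles, block_size=1.0)    # smaller units
coarse = aggregate_to_grid(particles, block_size=2.0)  # larger units

print(len(fine), "fine units;", len(coarse), "coarse units")
```

In this toy setup each fine unit holds a single particle, while the coarse units pool several, so the coarse summaries average over more of the underlying variability. Aggregation by hydrologic unit or upstream watershed would replace the block key with a unit identifier taken from a spatial dataset, which this sketch does not attempt.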

Original language: English (US)
Article number: 115002
Journal: Environmental Research Communications
Issue number: 11
State: Published - Nov 1 2023

All Science Journal Classification (ASJC) codes

  • Food Science
  • General Environmental Science
  • Agricultural and Biological Sciences (miscellaneous)
  • Geology
  • Earth-Surface Processes
  • Atmospheric Science


Keywords

  • hydrologic modeling
  • metamodels
  • random forest
  • residence time
  • scale
  • transit time distribution

