TY - JOUR
T1 - Linear and Machine Learning modelling for spatiotemporal disease predictions
T2 - Force-of-Infection of Chagas disease
AU - Ledien, Julia
AU - Cucunubá, Zulma M.
AU - Parra-Henao, Gabriel
AU - Rodríguez-Monguí, Eliana
AU - Dobson, Andrew P.
AU - Adamo, Susana B.
AU - Basáñez, María Gloria
AU - Nouvellet, Pierre
N1 - Publisher Copyright: © 2022 Ledien et al.
PY - 2022/7
Y1 - 2022/7
N2 - Background: Chagas disease is a long-lasting disease with a prolonged asymptomatic period. Cumulative indices of infection such as prevalence do not shed light on the current epidemiological situation, as they integrate infection over long periods. Instead, metrics such as the Force-of-Infection (FoI) provide information about the rate at which susceptible people become infected and permit sharper inference about temporal changes in infection rates. FoI is estimated by fitting (catalytic) models to available age-stratified serological (ground-truth) data. Predictive FoI modelling frameworks are then used to understand spatial and temporal trends indicative of heterogeneity in transmission and changes effected by control interventions. Ideally, these frameworks should be able to propagate uncertainty and handle spatio-temporal issues. Methodology/principal findings: We compare three methods in their ability to propagate uncertainty and provide reliable estimates of FoI for Chagas disease in Colombia as a case study: two Machine Learning (ML) methods (Boosted Regression Trees (BRT) and Random Forest (RF)), and a Linear Model (LM) framework that we had developed previously. Our analyses show consistent results between the three modelling methods under scrutiny. The predictors (explanatory variables) selected, as well as the location of the most uncertain FoI values, were coherent across frameworks. RF was faster than BRT and LM, and provided estimates with fewer extreme values when extrapolating to areas where no ground-truth data were available. However, BRT and RF were less efficient at propagating uncertainty. Conclusions/significance: The choice of FoI predictive models will depend on the objectives of the analysis. ML methods will help characterise the mean behaviour of the estimates, while LM will provide insight into the uncertainty surrounding such estimates. Our approach can be extended to the modelling of FoI patterns in other Chagas disease-endemic countries and to other infectious diseases for which serosurveys are regularly conducted for surveillance.
AB - Background: Chagas disease is a long-lasting disease with a prolonged asymptomatic period. Cumulative indices of infection such as prevalence do not shed light on the current epidemiological situation, as they integrate infection over long periods. Instead, metrics such as the Force-of-Infection (FoI) provide information about the rate at which susceptible people become infected and permit sharper inference about temporal changes in infection rates. FoI is estimated by fitting (catalytic) models to available age-stratified serological (ground-truth) data. Predictive FoI modelling frameworks are then used to understand spatial and temporal trends indicative of heterogeneity in transmission and changes effected by control interventions. Ideally, these frameworks should be able to propagate uncertainty and handle spatio-temporal issues. Methodology/principal findings: We compare three methods in their ability to propagate uncertainty and provide reliable estimates of FoI for Chagas disease in Colombia as a case study: two Machine Learning (ML) methods (Boosted Regression Trees (BRT) and Random Forest (RF)), and a Linear Model (LM) framework that we had developed previously. Our analyses show consistent results between the three modelling methods under scrutiny. The predictors (explanatory variables) selected, as well as the location of the most uncertain FoI values, were coherent across frameworks. RF was faster than BRT and LM, and provided estimates with fewer extreme values when extrapolating to areas where no ground-truth data were available. However, BRT and RF were less efficient at propagating uncertainty. Conclusions/significance: The choice of FoI predictive models will depend on the objectives of the analysis. ML methods will help characterise the mean behaviour of the estimates, while LM will provide insight into the uncertainty surrounding such estimates. Our approach can be extended to the modelling of FoI patterns in other Chagas disease-endemic countries and to other infectious diseases for which serosurveys are regularly conducted for surveillance.
UR - http://www.scopus.com/inward/record.url?scp=85135420126&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85135420126&partnerID=8YFLogxK
U2 - 10.1371/journal.pntd.0010594
DO - 10.1371/journal.pntd.0010594
M3 - Article
C2 - 35853042
AN - SCOPUS:85135420126
SN - 1935-2727
VL - 16
JO - PLoS neglected tropical diseases
JF - PLoS neglected tropical diseases
IS - 7
M1 - e0010594
ER -