TY - JOUR
T1 - Developing and validating natural language processing algorithms for radiology reports compared to ICD-10 codes for identifying venous thromboembolism in hospitalized medical patients
AU - Verma, Amol A.
AU - Masoom, Hassan
AU - Pou-Prom, Chloe
AU - Shin, Saeha
AU - Guerzhoy, Michael
AU - Fralick, Michael
AU - Mamdani, Muhammad
AU - Razak, Fahad
N1 - Publisher Copyright: © 2021 Elsevier Ltd
PY - 2022/1
Y1 - 2022/1
N2 - Background: Identifying venous thromboembolism (VTE) from large clinical and administrative databases is important for research and quality improvement. Objective: To develop and validate natural language processing (NLP) algorithms to identify VTE from radiology reports among general internal medicine (GIM) inpatients. Methods: This cross-sectional study included GIM hospitalizations between April 1, 2010 and March 31, 2017 at 5 hospitals in Toronto, Ontario, Canada. We developed NLP algorithms to identify pulmonary embolism (PE) and deep venous thrombosis (DVT) from radiologist reports of thoracic computed tomography (CT), extremity compression ultrasound (US), and nuclear ventilation-perfusion (VQ) scans in a training dataset of 1551 hospitalizations. We compared the accuracy of our NLP algorithms, the previously published “simpleNLP” tool, and administrative discharge diagnosis codes (ICD-10-CA) for PE and DVT against the “gold standard” of manual review in a separate random sample of 4000 GIM hospitalizations. Results: Our NLP algorithms were highly accurate for identifying DVT from US (sensitivity 0.94, positive predictive value [PPV] 0.90, area under the receiver operating characteristic curve [AUC] 0.96) and PE from CT (sensitivity 0.91, PPV 0.89, AUC 0.96). Administrative diagnosis codes and the simpleNLP tool were less accurate for DVT (ICD-10-CA sensitivity 0.63, PPV 0.43, AUC 0.81; simpleNLP sensitivity 0.41, PPV 0.36, AUC 0.66) and PE (ICD-10-CA sensitivity 0.83, PPV 0.70, AUC 0.91; simpleNLP sensitivity 0.89, PPV 0.62, AUC 0.92). Conclusions: Administrative diagnosis codes are unreliable for identifying VTE in hospitalized patients. We developed highly accurate NLP algorithms to identify VTE from radiology reports in a multicentre sample and have made them freely available to the academic community through a user-friendly tool (https://lks-chart.github.io/CHARTextract-docs/08-downloads/rulesets.html#venous-thromboembolism-vte-rulesets).
KW - Deep vein thrombosis
KW - ICD codes
KW - Natural language processing
KW - Pulmonary embolism
KW - Validity
UR - http://www.scopus.com/inward/record.url?scp=85122496056&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85122496056&partnerID=8YFLogxK
U2 - 10.1016/j.thromres.2021.11.020
DO - 10.1016/j.thromres.2021.11.020
M3 - Article
C2 - 34871982
AN - SCOPUS:85122496056
SN - 0049-3848
VL - 209
SP - 51
EP - 58
JO - Thrombosis Research
JF - Thrombosis Research
ER -