Developing and validating natural language processing algorithms for radiology reports compared to ICD-10 codes for identifying venous thromboembolism in hospitalized medical patients

Amol A. Verma, Hassan Masoom, Chloe Pou-Prom, Saeha Shin, Michael Guerzhoy, Michael Fralick, Muhammad Mamdani, Fahad Razak

Research output: Contribution to journal › Article › peer-review

19 Scopus citations

Abstract

Background: Identifying venous thromboembolism (VTE) from large clinical and administrative databases is important for research and quality improvement. Objective: To develop and validate natural language processing (NLP) algorithms to identify VTE from radiology reports among general internal medicine (GIM) inpatients. Methods: This cross-sectional study included GIM hospitalizations between April 1, 2010 and March 31, 2017 at 5 hospitals in Toronto, Ontario, Canada. We developed NLP algorithms to identify pulmonary embolism (PE) and deep venous thrombosis (DVT) from radiologist reports of thoracic computed tomography (CT), extremity compression ultrasound (US), and nuclear ventilation-perfusion (VQ) scans in a training dataset of 1551 hospitalizations. We compared the accuracy of our NLP algorithms, the previously published “simpleNLP” tool, and administrative discharge diagnosis codes (ICD-10-CA) for PE and DVT against the “gold standard” of manual review in a separate random sample of 4000 GIM hospitalizations. Results: Our NLP algorithms were highly accurate for identifying DVT from US, with sensitivity 0.94, positive predictive value (PPV) 0.90, and area under the receiver operating characteristic curve (AUC) 0.96; and for identifying PE from CT, with sensitivity 0.91, PPV 0.89, and AUC 0.96. Administrative diagnosis codes and the simpleNLP tool were less accurate for DVT (ICD-10-CA sensitivity 0.63, PPV 0.43, AUC 0.81; simpleNLP sensitivity 0.41, PPV 0.36, AUC 0.66) and PE (ICD-10-CA sensitivity 0.83, PPV 0.70, AUC 0.91; simpleNLP sensitivity 0.89, PPV 0.62, AUC 0.92). Conclusions: Administrative diagnosis codes are unreliable in identifying VTE in hospitalized patients. We developed highly accurate NLP algorithms to identify VTE from radiology reports in a multicentre sample and have made the algorithms freely available to the academic community with a user-friendly tool (https://lks-chart.github.io/CHARTextract-docs/08-downloads/rulesets.html#venous-thromboembolism-vte-rulesets).
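
The accuracy measures reported in the abstract (sensitivity, PPV, AUC) compare each method's binary output with the manual-review label for the same hospitalization. The Python sketch below illustrates how such measures can be computed from paired labels. It is not the authors' code: the function names, the `Counts` class, and the ten toy labels are hypothetical, and the single-point AUC formula (the mean of sensitivity and specificity) is an assumption that applies only when a method yields a yes/no result with no score. The abstract does not state how AUC was calculated.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Cells of a 2x2 table: method output versus manual review."""
    tp: int  # method positive, manual review positive
    fp: int  # method positive, manual review negative
    fn: int  # method negative, manual review positive
    tn: int  # method negative, manual review negative


def tally(predicted, reference):
    """Count agreement between binary predictions and reference labels."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    tp = fp = fn = tn = 0
    for p, r in zip(predicted, reference):
        if p and r:
            tp += 1
        elif p and not r:
            fp += 1
        elif r:
            fn += 1
        else:
            tn += 1
    return Counts(tp, fp, fn, tn)


def _ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def sensitivity(c):
    """Share of true cases that the method detected."""
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c):
    """Share of non-cases that the method correctly left unflagged."""
    return _ratio(c.tn, c.tn + c.fp)


def ppv(c):
    """Share of method-flagged cases that manual review confirmed."""
    return _ratio(c.tp, c.tp + c.fp)


def single_point_auc(c):
    """Area under the ROC curve through (0,0), (1-spec, sens), (1,1)."""
    return (sensitivity(c) + specificity(c)) / 2


# Toy labels for illustration only; these are NOT data from the study.
method_output = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
manual_review = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]

counts = tally(method_output, manual_review)
print(f"sensitivity={sensitivity(counts):.2f}")
print(f"PPV={ppv(counts):.2f}")
print(f"AUC={single_point_auc(counts):.2f}")
```

PPV depends on how common the condition is in the sample, so a method with good sensitivity can still show a low PPV when true cases are rare.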

Original language: English (US)
Pages (from-to): 51-58
Number of pages: 8
Journal: Thrombosis Research
Volume: 209
State: Published - Jan 2022
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • Hematology

Keywords

  • Deep vein thrombosis
  • ICD codes
  • Natural language processing
  • Pulmonary embolism
  • Validity
